An easy way to read and write binary files using Python
- 2020-05-10 18:28:57
- OfStack
Python has no built-in syntax for laying values out as raw binary, but it provides a module that fills the gap: the struct module.
In Python 2 there was no separate binary type; binary data was stored in an ordinary string (str), which works because a str there is simply a sequence of bytes. In Python 3 the equivalent type is bytes.
import struct
a = 12.34
# pack a into binary form; 'f' is a 4-byte float ('i' only accepts integers)
data = struct.pack('f', a)
Here, data is a bytes object that holds the same bytes a C float with this value would occupy in memory.
Now let's do the reverse.
Given existing binary data (a bytes object), convert it back to a Python value:
a, = struct.unpack('f', data)
Note that unpack always returns a tuple, even when the format describes only one value.
So if there is only one variable:
data = struct.pack('f', a)
then you need to decode it like this:
a, = struct.unpack('f', data)    or    (a,) = struct.unpack('f', data)
If you write a = struct.unpack('f', data) directly, then a is a one-element tuple, roughly (12.34,), rather than the original floating-point number. Because 'f' is single precision, the value comes back as approximately 12.34; use 'd' for a full-precision double.
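The single-value round trip described above can be sketched as follows. This is a minimal illustration, not the author's own code: the names value, packed, as_tuple and restored are made up for the example, and the 'd' (double) format is an assumption chosen so that the float comes back exactly.

```python
import struct

value = 12.34

# 'd' packs a C double (8 bytes), which round-trips a Python float exactly.
packed = struct.pack('d', value)

# unpack always returns a tuple, even for a single field.
as_tuple = struct.unpack('d', packed)

# The trailing comma unpacks the one-element tuple into a plain float.
restored, = struct.unpack('d', packed)

print(type(packed), len(packed))
print(as_tuple)
print(restored == value)
```

Forgetting the trailing comma is the usual mistake here: the result then compares unequal to the original number because it is a tuple.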
If the data consists of several values, you can pack them together like this (in Python 3 the 's' format needs bytes, hence the b'' literals):
a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)
Now data is binary and can be written straight to a file, for example binfile.write(data)
Later, when we need it, we can read it back with data = binfile.read()
and then decode it into Python variables with struct.unpack():
a, b, c, d = struct.unpack('5s6sif', data)
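Putting the write and read steps together, a small end-to-end sketch might look like this. The temporary file path and the names FMT, record and raw are assumptions for the demo; only the format string and the values come from the text above.

```python
import os
import struct
import tempfile

FMT = '5s6sif'  # 5-byte string, 6-byte string, int, float

record = struct.pack(FMT, b'hello', b'world!', 2, 45.123)

# A temporary path stands in for the article's filepath (hypothetical).
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# 'wb' writes the bytes exactly as they are.
with open(path, 'wb') as binfile:
    binfile.write(record)

# 'rb' reads them back unchanged.
with open(path, 'rb') as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(FMT, raw)
print(a, b, c)
# 'f' is single precision, so compare with a tolerance.
print(abs(d - 45.123) < 1e-5)
```

Using with blocks closes the file even if packing or unpacking raises, which is why the sketch prefers them over a bare open().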
'5s6sif' is called the format string (fmt). It is made up of numbers and format characters: 5s means a string of five bytes, 2i means two integers, and so on. The available format characters are listed below; each C type corresponds one-to-one to a Python type.
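Format | C Type | Python | Size (bytes) |
---|---|---|---|
x | pad byte | no value | 1 |
c | char | bytes of length 1 | 1 |
b | signed char | integer | 1 |
B | unsigned char | integer | 1 |
? | _Bool | bool | 1 |
h | short | integer | 2 |
H | unsigned short | integer | 2 |
i | int | integer | 4 |
I | unsigned int | integer | 4 |
l | long | integer | 4 |
L | unsigned long | integer | 4 |
q | long long | integer | 8 |
Q | unsigned long long | integer | 8 |
f | float | float | 4 |
d | double | float | 8 |
s | char[] | bytes | 1 per byte |
p | char[] | bytes | 1 per byte |
P | void * | integer | platform-dependent |
The sizes listed are the standard sizes; in native mode (the default) some types, such as l and L, can be larger on 64-bit platforms. The Python column follows Python 3; in Python 2 the string types were str and large integers were long.
The last one, P, represents a pointer type. It takes four bytes on a 32-bit system (eight on a 64-bit one) and is only available with native byte order.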
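As a rough check of the table, struct.calcsize reports how many bytes a format occupies. The sketch below uses the '<' prefix so that standard sizes apply regardless of platform; the variable names are illustrative only.

```python
import struct

# Standard sizes for a few format characters from the table.
sizes = {code: struct.calcsize('<' + code) for code in 'bhilqfd'}
print(sizes)

# A repeat count before most codes means "that many values" ...
two_ints = struct.pack('<2i', 7, -1)
print(struct.unpack('<2i', two_ints))

# ... but before 's' it is the length of a single bytes field.
one_field = struct.pack('<5s', b'hello')
print(struct.unpack('<5s', one_field))

# 's' pads short values with null bytes and truncates long ones.
padded = struct.pack('<5s', b'hi')
truncated = struct.pack('<5s', b'hello world')
print(padded, truncated)
```

Note the difference between '2i' (two separate integers) and '5s' (one five-byte field): unpacking the first yields two values, the second only one.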
To exchange data with C structs, also bear in mind that some C and C++ compilers align fields in memory, usually on four-byte boundaries on 32-bit systems. The struct module therefore provides the following prefix characters:
Character | Byte order | Size and alignment |
---|---|---|
@ | native | native (padded for alignment, e.g. up to 4 bytes) |
= | native | standard (original sizes, no padding) |
< | little-endian | standard (original sizes, no padding) |
> | big-endian | standard (original sizes, no padding) |
! | network (= big-endian) | standard (original sizes, no padding) |
Put the prefix at the first position of fmt, for example '@5s6sif'.
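The effect of these prefixes is easiest to see on a concrete value. The following sketch is illustrative only: the constant n and the variable names are made up, and the native size is left unasserted because it depends on the platform and compiler.

```python
import struct

n = 0x01020304

# Same integer, different byte orders.
little = struct.pack('<I', n)
big = struct.pack('>I', n)
network = struct.pack('!I', n)   # network order is big-endian
print(little, big, network)

# Standard mode: no padding, so a char plus an int is 1 + 4 bytes.
std_size = struct.calcsize('=ci')

# Native mode: the int is aligned as a C compiler would align it,
# so padding may be inserted after the char (platform-dependent).
native_size = struct.calcsize('@ci')
print(std_size, native_size)
```

When the file will be read by a C program on the same machine, '@' is usually the right choice; for files exchanged between machines, an explicit '<' or '>' avoids surprises.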
-- Problems encountered when processing binary files
When working with binary files, open them like this:
binfile = open(filepath, 'rb')    # read a binary file
or
binfile = open(filepath, 'wb')    # write a binary file
So how does this differ from binfile = open(filepath, 'r')?
There are two differences:
First, on Windows, when a file is opened with 'r' (text mode), a 0x1A byte is treated as the end of the file (EOF); 'rb' does not have this problem. In other words, if you write a file in binary mode and then read it in text mode, you will only read part of the file if it contains 0x1A, whereas 'rb' always reads through to the real end of the file. (This behavior comes from the Windows C runtime's text mode, which Python 2 uses for 'r'. Python 3 text mode does not stop at 0x1A, but it decodes the bytes into str, so 'rb' is still required for binary data.)
Second, for the string x = 'abc\ndef', len(x) gives 7. The \n is what we call a newline, and it is actually the single byte 0x0A. When we write in text mode with 'w', on Windows 0x0A is automatically converted into two bytes, 0x0D 0x0A, so the file ends up one byte longer. When the file is read in text mode with 'r', the pair is automatically converted back into the original newline. If you write in binary mode with 'wb', the byte is kept unchanged and is read back as it is. So if you write in text mode and read in binary mode, you have to account for that extra byte. 0x0D is also known as carriage return.
Linux performs no such conversion, because Linux uses only 0x0A for newlines.
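The extra carriage-return byte can be reproduced on any platform with Python 3 by asking open() for Windows-style newlines explicitly. This is a sketch under that assumption: the newline='\r\n' argument stands in for what Windows text mode does by default, and the temporary path is hypothetical.

```python
import os
import tempfile

text = 'abc\ndef'
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# newline='\r\n' forces the Windows-style translation on write.
with open(path, 'w', newline='\r\n') as f:
    f.write(text)

# Binary mode shows the bytes that are really in the file.
with open(path, 'rb') as f:
    raw = f.read()

# Default text mode converts '\r\n' back to '\n' on read.
with open(path, 'r') as f:
    round_trip = f.read()

print(len(text), len(raw))
print(raw)
print(round_trip == text)
```

The file holds one more byte than the string has characters, which is exactly the mismatch to watch for when mixing text-mode writes with binary-mode reads.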