An easy way to read and write binary files using Python

  • 2020-05-10 18:28:57
  • OfStack

Python has no built-in syntax for packing values into raw binary form, but the standard library provides a module for exactly that: the struct module.

Python has no separate type for a binary record, but it can store binary data in a byte string: a plain str in Python 2 or a bytes object in Python 3. That is all that is needed, because a byte string is just a sequence of bytes.

import struct

a = 12.34

# pack a into its binary representation ('d' is an 8-byte C double)

data = struct.pack('d', a)

Here data is just a byte string that holds the same bytes as the binary representation of a.

Now do the reverse.

Given existing binary data in data (really just a byte string), convert it back to a Python value:

a, = struct.unpack('d', data)

Note that unpack always returns a tuple, even when the format describes a single value.

So if only one variable was packed, it has to be unpacked like this when decoding:

a, = struct.unpack('d', data)    or    (a,) = struct.unpack('d', data)

If you write a = struct.unpack('d', data) directly, then a is the tuple (12.34,) instead of the original floating-point number.
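
As a minimal sketch of the round trip described above (assuming Python 3, where pack returns a bytes object; the names value, packed, result and restored are only illustrative), the following packs a float as an 8-byte double and shows that unpack hands back a one-element tuple:

```python
import struct

value = 12.34

# 'd' is an 8-byte C double, so a Python float round-trips exactly
packed = struct.pack('d', value)

# unpack always returns a tuple, even for a single value
result = struct.unpack('d', packed)

# the trailing comma unpacks the one-element tuple
restored, = struct.unpack('d', packed)

print(len(packed))   # 8
print(result)        # (12.34,)
print(restored)      # 12.34
```

Using 'f' instead of 'd' would also work, but the value would come back rounded to single precision.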

If the data consists of several values, they can be packed together as follows:


a = b'hello'

b = b'world!'

c = 2

d = 45.123

data = struct.pack('5s6sif', a, b, c, d)

Now data holds the packed binary data, so you can write it straight to a file with binfile.write(data)

Later, when the data is needed again, read it back with data = binfile.read()

and decode it into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)
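
The write and read steps above can be sketched end to end as follows. This is only an illustration under a few assumptions: Python 3, a scratch file named example.bin created in a temporary directory (a hypothetical path), and the same format string as in the article.

```python
import os
import struct
import tempfile

fmt = '5s6sif'
record = struct.pack(fmt, b'hello', b'world!', 2, 45.123)

# hypothetical file location, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.bin')

# write the packed bytes in binary mode
with open(path, 'wb') as binfile:
    binfile.write(record)

# read them back in binary mode
with open(path, 'rb') as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(fmt, raw)

print(a, b, c)   # b'hello' b'world!' 2
print(d)         # close to 45.123; 'f' keeps only single precision
```

The with statement closes the file even if an error occurs, which is why it is used here instead of a bare open() call.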

The string '5s6sif' is called fmt, the format string. It is made up of numbers and format characters: 5s means a string of five characters, 2i means two integers, and so on. The available format characters are listed below; each C type maps to a corresponding Python type.

Format  C Type              Python type         Standard size (bytes)
x       pad byte            no value            1
c       char                bytes of length 1   1
b       signed char         integer             1
B       unsigned char       integer             1
?       _Bool               bool                1
h       short               integer             2
H       unsigned short      integer             2
i       int                 integer             4
I       unsigned int        integer             4
l       long                integer             4
L       unsigned long       integer             4
q       long long           integer             8
Q       unsigned long long  integer             8
f       float               float               4
d       double              float               8
s       char[]              bytes               1 per character
p       char[]              bytes               1 per character
P       void *              integer             pointer size

The last one, P, can be used to represent a pointer type. It takes four bytes on a 32-bit system (eight on a 64-bit system) and is only available with native byte order.
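
The sizes in the table can be checked with struct.calcsize. The sketch below assumes Python 3; the '<' prefix, described in the next part of the article, is used here only so that the results are the standard sizes regardless of platform, and the variable names are illustrative.

```python
import struct

# a count before a format character repeats it: '2i' is the same as 'ii'
assert struct.calcsize('<2i') == struct.calcsize('<ii') == 8

# for 's' the count is the length of one bytes field, not a repeat
name, = struct.unpack('<5s', b'hello')

# standard sizes for a few format characters from the table
sizes = {code: struct.calcsize('<' + code) for code in 'bhiqfd'}

# 's' pads short values with null bytes up to the declared length
padded = struct.pack('<5s', b'hi')

print(name)    # b'hello'
print(sizes)   # {'b': 1, 'h': 2, 'i': 4, 'q': 8, 'f': 4, 'd': 8}
print(padded)  # b'hi\x00\x00\x00'
```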

To exchange data with structs in C, also keep in mind that some C and C++ compilers align struct members, typically on 4-byte boundaries on 32-bit systems. For that reason struct provides the following prefix characters:

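Character  Byte order              Size      Alignment
@          native                  native    native (padded, e.g. up to 4-byte boundaries)
=          native                  standard  none (original byte counts)
<          little-endian           standard  none (original byte counts)
>          big-endian              standard  none (original byte counts)
!          network (= big-endian)  standard  none (original byte counts)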

Put the character at the very start of fmt, for example '@5s6sif'.
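
A short sketch of what the prefix changes, assuming Python 3 (the names little and big are illustrative). The native size is printed without an expected value because it depends on the platform's alignment rules.

```python
import struct

# standard sizes, no alignment: 5 + 6 + 4 + 4 bytes
print(struct.calcsize('=5s6sif'))   # 19

# native alignment may insert padding before the int, so this can be larger
print(struct.calcsize('@5s6sif'))

# the same integer laid out in both byte orders
little = struct.pack('<I', 1)
big = struct.pack('>I', 1)
print(little)   # b'\x01\x00\x00\x00'
print(big)      # b'\x00\x00\x00\x01'

# '!' (network order) is big-endian
print(struct.pack('!I', 1) == big)   # True
```

When a file has to be read on another machine or by another program, an explicit '<' or '>' is usually the safer choice, because the layout no longer depends on where the code runs.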

-- Problems encountered when processing binary files

When working with binary files, open them in one of the following ways:

binfile = open(filepath, 'rb')       # open a binary file for reading

or

binfile = open(filepath, 'wb')       # open a binary file for writing

So how does this differ from opening the file in text mode with binfile=open(filepath,'r')?

There are two differences:

First, when a file is opened with 'r' and a '0X1A' byte is encountered, it is treated as the end of the file (EOF). 'rb' does not have this problem. In other words, if you write a file in binary mode and then read it in text mode, you will get only the first part of the file if it contains '0X1A', whereas 'rb' reads all the way to the end of the file. (This is the behavior of text mode on Windows under Python 2.)

Second, for the string x='abc\ndef', len(x) gives a length of 7. The \n is what we call a newline, and it is actually '0X0A'. When the string is written in text mode with 'w', Windows automatically converts '0X0A' into the two characters '0X0D' '0X0A', so the file ends up 8 bytes long. When the file is read in text mode with 'r', the pair is automatically converted back to the original newline character. If you write in binary mode with 'wb', the character is kept unchanged and is read back exactly as written. So if you write in text mode and read in binary mode, you have to account for that extra byte. '0X0D' is also known as carriage return.
Linux makes no such change, because Linux uses only '0X0A' for newlines.
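
The newline translation can be reproduced on any platform. The sketch below assumes Python 3 and uses the newline='\r\n' argument of open() to force the Windows-style conversion; the scratch file name is hypothetical. It also shows that binary mode reads straight past a '0X1A' byte.

```python
import os
import tempfile

# hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'newline.txt')

x = 'abc\ndef'

# newline='\r\n' forces the Windows-style translation on any platform
with open(path, 'w', newline='\r\n') as textfile:
    textfile.write(x)

# binary mode shows the bytes exactly as stored on disk
with open(path, 'rb') as binfile:
    raw = binfile.read()

print(len(x))    # 7
print(raw)       # b'abc\r\ndef'
print(len(raw))  # 8

# binary mode keeps a 0x1A byte and everything after it
with open(path, 'wb') as binfile:
    binfile.write(b'ab\x1acd')
with open(path, 'rb') as binfile:
    tail = binfile.read()

print(tail)      # b'ab\x1acd'
```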

