Detailed Explanation of Reading Binary Files in Python

  • 2021-07-09 08:34:07
  • OfStack

First import the required package: import struct

struct has the following main functions:


# Pack the values into a bytes object according to the given format (fmt), similar to the byte layout of a C struct
pack(fmt, v1, v2, ...)
# Parse a bytes object according to the given format (fmt) and return the parsed values as a tuple
unpack(fmt, buffer)
# Calculate how many bytes the given format (fmt) occupies
calcsize(fmt)
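
As a quick illustration of the three functions above, the following minimal sketch packs two values and unpacks them again. The format '<if' (a little-endian int followed by a float) and the sample values are chosen only for this example.

```python
import struct

# Pack an int and a single-precision float, little-endian, standard sizes.
packed = struct.pack('<if', 7, 1.5)

# The format occupies 4 bytes for the int plus 4 bytes for the float.
size = struct.calcsize('<if')

# unpack always returns a tuple, even for a single value.
values = struct.unpack('<if', packed)
print(size, values)
```

The buffer passed to unpack must be exactly calcsize(fmt) bytes long, otherwise struct.error is raised.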

For example, suppose a file named filename holds single-precision floating-point numbers with shape [100, 1025]. It can be read as follows:


import numpy as np
import struct
# Load the test data
f = open('filename', 'rb')
# 102500 (100 * 1025) is the number of values in the file; each single-precision float takes 4 bytes
data_raw = struct.unpack('f'*102500, f.read(4*102500))
f.close()
# Reshape the flat tuple into rows of 1025 values
verify_data = np.asarray(data_raw).reshape(-1, 1025)
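
When the number of values is not known in advance, it can be derived from the file size instead of being hard-coded. The sketch below is one possible way to do that using only the standard library; the helper name read_float32_rows and the tiny 2 x 3 test file are hypothetical and exist only for this illustration.

```python
import os
import struct
import tempfile

def read_float32_rows(path, cols):
    """Read native single-precision floats and split them into rows of `cols` values."""
    item_size = struct.calcsize('f')
    with open(path, 'rb') as f:
        raw = f.read()
    count, leftover = divmod(len(raw), item_size)
    if leftover or count % cols:
        raise ValueError('file size does not match the expected layout')
    flat = struct.unpack('%df' % count, raw)
    return [flat[i:i + cols] for i in range(0, count, cols)]

# Write a small test file with 6 floats, then read it back as 2 rows of 3.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('6f', 0.0, 0.5, 1.0, 1.5, 2.0, 2.5))
rows = read_float32_rows(tmp.name, cols=3)
os.remove(tmp.name)
print(rows)
```

Checking the size before unpacking gives a clearer error than the struct.error raised when the buffer length and the format disagree.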

Similarly, to read the binary data as double-precision values:


import numpy as np
import struct
f = open('data8.dat','rb')
d_str = f.read()
f.close()
d_len = len(d_str)
d_len2 = d_len//8
# Sometimes the byte order also matters. For big-endian single-precision data, for example, the statement below becomes data = struct.unpack('>'+str(d_len//4)+'f',d_str)
data = struct.unpack(d_len2*'d',d_str)
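
The same idea with an explicit byte order can be sketched as follows. Here the bytes are produced in memory with struct.pack as a stand-in for f.read(), and the three sample values are assumptions made for the example.

```python
import struct

# Stand-in for the contents of a big-endian file of doubles.
d_str = struct.pack('>3d', 1.0, 2.0, 3.0)

# Derive the number of values from the byte length (8 bytes per double).
count = len(d_str) // struct.calcsize('>d')

# A repeat count in the format ('>3d') is equivalent to '>ddd'.
data = struct.unpack('>%dd' % count, d_str)
print(data)
```

The byte order character appears once at the start of the format string and applies to every value that follows.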

Note: The number multiplied with the format character must be an int, i.e. int * 'd'; otherwise the error "can't multiply sequence by non-int of type 'float'" is raised. This is why d_len is divided with // rather than /.
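
The difference between the two division operators can be seen in this small sketch; the byte count of 24 is an arbitrary example value.

```python
n_bytes = 24

# True division returns a float (3.0), which cannot repeat a string.
try:
    fmt = (n_bytes / 8) * 'd'
except TypeError as exc:
    message = str(exc)

# Floor division returns an int (3), so the repetition works.
fmt = (n_bytes // 8) * 'd'
print(message)
print(fmt)
```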

See the official struct documentation for details of each function.

The format characters (fmt) are listed below:

FORMAT   PYTHON TYPE          STANDARD SIZE
x        no value
c        bytes of length 1    1
b        integer              1
B        integer              1
?        bool                 1
h        integer              2
H        integer              2
i        integer              4
I        integer              4
l        integer              4
L        integer              4
q        integer              8
Q        integer              8
f        float                4
d        float                8
s        bytes
p        bytes
P        integer
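
The standard sizes in the table can be checked with calcsize. The sketch below uses the '=' prefix so that standard sizes are reported rather than native ones; the 10-byte string length is an arbitrary example.

```python
import struct

# Query the standard size of each fixed-size format character.
codes = 'cbB?hHiIlLqQfd'
standard_sizes = {code: struct.calcsize('=' + code) for code in codes}

# 's' has no fixed size: the count in front of it is the length in bytes.
name_size = struct.calcsize('=10s')
print(standard_sizes, name_size)
```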

When exchanging data with C structs, keep in mind that C and C++ compilers usually align struct members, for example on 4-byte boundaries on a 32-bit system. By default, struct therefore uses the local machine's native byte order, sizes, and alignment. This behavior can be changed with the first character of the format string, defined as follows:

CHARACTER   BYTE ORDER                SIZE       ALIGNMENT
@           native                    native     native
=           native                    standard   none
<           little-endian             standard   none
>           big-endian                standard   none
!           network (= big-endian)    standard   none
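
The effect of alignment can be observed with calcsize. In this sketch the format 'ci' (a char followed by an int) is only an example; the native result depends on the platform and compiler, so no fixed value is assumed for it.

```python
import struct

# Native mode may insert padding after the char so the int is aligned.
native_size = struct.calcsize('@ci')

# Standard mode never pads: 1 byte for the char plus 4 for the int.
standard_size = struct.calcsize('=ci')
print(native_size, standard_size)
```

On many common platforms native_size is 8 because three padding bytes are added, which is what a C compiler would typically do for the equivalent struct.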

< Little-endian means that the least significant byte is stored at the lowest memory address and the most significant byte at the highest address.

> Big-endian means that the most significant byte is stored at the lowest memory address and the least significant byte at the highest address.

! Network byte order: the TCP/IP protocols define their byte order as big-endian, so big-endian is commonly called network byte order.
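
The three byte orders can be compared by packing the same value with each prefix. The value 0x01020304 is chosen for this sketch only because its four bytes are easy to tell apart.

```python
import struct

value = 0x01020304

little = struct.pack('<I', value)   # least significant byte first
big = struct.pack('>I', value)      # most significant byte first
network = struct.pack('!I', value)  # same layout as big-endian
print(little, big, network)
```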

