An article on common serialization operations in Python

  • 2021-06-28 13:34:28
  • OfStack

0x00 marshal

marshal reads and writes Python objects in a binary format that is specific to the Python language but independent of the machine architecture. The format also depends on the Python version, so data serialized by marshal is not guaranteed to be compatible across different versions of Python.

marshal is generally used to serialize Python's internal objects. The supported types generally include:

  • Basic types: booleans, integers, floating point numbers, complex numbers
  • Sequence and collection types: strings, bytes, bytearray, tuple, list, set, frozenset, dictionary
  • Code objects
  • Other types: None, Ellipsis, StopIteration

The main purpose of marshal is to support reading and writing the .pyc files "compiled" by Python. This is why the marshal format is not compatible across Python versions. Developers who need general-purpose serialization and deserialization should use the pickle module instead.
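As a rough illustration of the .pyc use case, the sketch below compiles a small source string into a code object, round-trips it through marshal, and executes the result. The source string, the `<demo>` filename, and the variable names are made up for this example; real .pyc files also carry a header that is not shown here.

```python
import marshal

# Compile a tiny piece of source into a code object (hypothetical example source)
source = "result = 6 * 7"
code = compile(source, "<demo>", "exec")

# Serialize the code object to bytes, then restore it
payload = marshal.dumps(code)
restored = marshal.loads(payload)

# Execute the restored code object in a fresh namespace
namespace = {}
exec(restored, namespace)
print(namespace["result"])
```

The payload produced here should only be read back by the same Python version that wrote it.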

Common methods


marshal.dump(value, file[, version])

Serialize an object to a file


marshal.dumps(value[, version])

Serialize an object and return a bytes object


marshal.load(file)

Deserialize an object from a file


marshal.loads(bytes)

Deserialize an object from a bytes object
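The four functions above can be sketched as follows. The sample value and the variable names are assumptions chosen for illustration, and a temporary file stands in for a real file on disk.

```python
import marshal
import tempfile

# A value built only from types that marshal supports
value = {"ints": [1, 2, 3], "pair": (1.5, 2 + 3j), "flag": True, "nothing": None}

# In-memory round trip with dumps()/loads()
payload = marshal.dumps(value)
from_bytes = marshal.loads(payload)

# File-based round trip with dump()/load(); the file must be in binary mode
with tempfile.TemporaryFile() as f:
    marshal.dump(value, f)
    f.seek(0)  # rewind before reading the data back
    from_file = marshal.load(f)

print(from_bytes == value, from_file == value)
```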

0x01 pickle

The pickle module can also read and write Python objects in a binary format. Compared with marshal, which provides only basic serialization capabilities, pickle covers a much wider range of use cases.

Data serialized by pickle is also specific to the Python language; other languages such as Java cannot read binary data that Python has serialized with pickle. If we need language-independent serialization, we should use json, which is explained below.

The data types that can be serialized by pickle are:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries that contain only objects pickle can serialize
  • functions defined at the top level of a module (using def, not lambda expressions)
  • built-in functions defined at the top level of a module
  • classes defined at the top level of a module
  • instances of such classes whose __dict__ contains serializable objects, or whose __getstate__() method returns an object that can be serialized

A PicklingError is raised if pickle does not support serializing an object.
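A small sketch of how unsupported objects show up in practice. The helper name can_pickle is hypothetical; it also catches AttributeError and TypeError, on the assumption that some unsupported objects surface as those exceptions rather than PicklingError, depending on the object and the Python version.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Unsupported objects may surface as any of these exceptions,
        # depending on the object and the Python version
        return False

# A dictionary of supported types can be pickled
print(can_pickle({"a": [1, 2.0], "b": (None, True)}))

# A lambda expression is not a top-level def function, so it is rejected
print(can_pickle(lambda x: x + 1))
```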

Common methods


pickle.dump(obj, file, protocol=None, *, fix_imports=True)

Serialize obj and write it to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).


pickle.dumps(obj, protocol=None, *, fix_imports=True)

Serialize obj and return the result as a bytes object.


pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")

Deserialize an object from the open file object file. This is equivalent to Unpickler(file).load().


pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")

Deserialize an object from the binary data bytes_object.
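To illustrate the protocol parameter, the sketch below serializes the same data with every protocol the running interpreter supports and reads each payload back. The sample data and the results dictionary are assumptions made for this example.

```python
import pickle

data = {"a": [1, 2.0, 3, 4 + 6j], "b": ("text", b"bytes"), "c": {None, True, False}}

# Map each protocol number to the object restored from its payload
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=protocol)
    # loads() needs no protocol argument; it is detected from the payload
    results[protocol] = pickle.loads(payload)
    print(protocol, len(payload))

print(all(restored == data for restored in results.values()))
```

Higher protocols are generally more compact, but older Python versions may not be able to read them.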

Serialization example


import pickle

# Define a dictionary containing objects that can be serialized
data = {
    'a': [1, 2.0, 3, 4 + 6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Serialize the object into the file data.pickle,
    # using the serialization format version pickle.HIGHEST_PROTOCOL
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

After execution, there is a new data.pickle file in the folder:

serialization
- data.pickle
- pickles.py
- unpickles.py

Deserialization example


import pickle

with open('data.pickle', 'rb') as f:
    # Deserialize the object from the file data.pickle.
    # pickle detects the protocol version of the serialized data automatically,
    # so no version number is needed here
    data = pickle.load(f)

    print(data)

# Output:
# {'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, True, None}}

0x02 json

json is a language-independent and very common data interchange format. In Python, the json module has an API similar to those of marshal and pickle.

Common methods


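json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj as a JSON document and write it to the file-like object fp


json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj into a JSON-formatted str


json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserialize a JSON document read from the file-like object fp into a Python object


json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserialize s, a str containing a JSON document, into a Python object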

Conversion Table of json and Python Objects

Python                                    JSON
dict                                      object
list, tuple                               array
str                                       string
int, float, int- & float-derived Enums    number
True                                      true
False                                     false
None                                      null
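The conversion table can be observed with a short round trip. The sample dictionary is made up for illustration; note that a tuple comes back as a list, and that a non-string dictionary key such as the integer 1 is written as a JSON string.

```python
import json

original = {"point": (1, 2), 1: "int key", "flag": True, "missing": None}

# Python -> JSON: tuple becomes array, True becomes true, None becomes null
text = json.dumps(original)
print(text)

# JSON -> Python: array becomes list, and the key "1" stays a str
restored = json.loads(text)
print(restored)
```

Because of these conversions, the restored object is not always equal to the original one.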

json handles basic types well, as well as sequences and collection types that contain only basic types.

Serialization example


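A minimal sketch of serializing basic types with json. The sample data and the names text and buffer are assumptions for illustration, and an in-memory io.StringIO stands in for a real file.

```python
import io
import json

data = {"name": "jack", "uid": 124, "tags": ["a", "b"], "active": True, "nickname": None}

# dumps() returns the JSON document as a str
text = json.dumps(data, sort_keys=True)
print(text)

# dump() writes the same document to a file-like object
buffer = io.StringIO()
json.dump(data, buffer, sort_keys=True)
print(buffer.getvalue() == text)
```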

Deserialization example


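A minimal sketch of deserializing a JSON document made of basic types. The document text and variable names are assumptions for illustration, and io.StringIO again stands in for a real file.

```python
import io
import json

text = '{"name": "jack", "uid": 124, "scores": [90.5, 88], "active": true, "nickname": null}'

# loads() parses a str and returns the corresponding Python object
obj = json.loads(text)
print(type(obj))
print(obj)

# load() reads the document from a file-like object instead
from_file = json.load(io.StringIO(text))
print(from_file == obj)
```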

The case of JSON objects is a bit more complicated.

For example, consider a JSON document that describes a complex number.

complex_data.json


{"__complex__": true, "real": 42, "imaginary": 36}

To deserialize this JSON document into a Python complex object, you need to define a conversion function and pass it to the decoder.


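One possible conversion function is sketched below. The name as_complex is an assumption, and the document is given inline rather than read from complex_data.json so that the example is self-contained.

```python
import json

def as_complex(dct):
    # Convert dicts tagged with "__complex__" into complex numbers;
    # every other dict is returned unchanged
    if dct.get("__complex__"):
        return complex(dct["real"], dct["imaginary"])
    return dct

document = '{"__complex__": true, "real": 42, "imaginary": 36}'

# object_hook is called with every JSON object that the decoder produces
z = json.loads(document, object_hook=as_complex)
print(type(z))
print(z)
```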

If object_hook is not specified, an object in the JSON document is converted to a dict by default.


# coding=utf-8
import json

if __name__ == '__main__':

    with open("complex_data.json") as complex_data:
        # object_hook is not specified here
        z2 = json.loads(complex_data.read())
        print(type(z2))
        print(z2)

# Output:
# <class 'dict'>
# {'__complex__': True, 'real': 42, 'imaginary': 36}

You can see that the object in the JSON document has been converted to a dict.

Normally this is fine, but in scenarios where types matter a great deal, you need to define the conversion explicitly.

Besides the object_hook parameter, which handles decoding, you can subclass json.JSONEncoder to handle the encoding direction.


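A sketch of a JSONEncoder subclass for complex numbers. The class name ComplexEncoder and the tagged layout of the output are assumptions chosen to mirror the complex_data.json document above.

```python
import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot serialize itself
        if isinstance(obj, complex):
            return {"__complex__": True, "real": obj.real, "imaginary": obj.imag}
        # Let the base class raise TypeError for anything else
        return super().default(obj)

text = json.dumps({"z": 42 + 36j}, cls=ComplexEncoder)
print(text)
```

Since the real and imaginary parts of a complex number are floats, they are written as 42.0 and 36.0.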

The json module does not serialize every type automatically; for unsupported types it raises a TypeError directly.


>>> import datetime
>>> d = datetime.datetime.now()
>>> dct = {'birthday':d,'uid':124,'name':'jack'}
>>> dct
{'birthday': datetime.datetime(2019, 6, 14, 11, 16, 17, 434361), 'uid': 124, 'name': 'jack'}
>>> json.dumps(dct)
Traceback (most recent call last):
 File "<pyshell#19>", line 1, in <module>
 json.dumps(dct)
 File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 231, in dumps
 return _default_encoder.encode(obj)
 File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 199, in encode
 chunks = self.iterencode(o, _one_shot=True)
 File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 257, in iterencode
 return _iterencode(o, 0)
 File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
 raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

For types that json cannot serialize by default, such as datetime and custom types, you need to subclass JSONEncoder to define the conversion logic.


import json
import datetime

# Define a JSONEncoder that handles date types
class DatetimeEncoder(json.JSONEncoder):

    def default(self, obj):
        # Check datetime first, because datetime is a subclass of date
        if isinstance(obj, datetime.datetime):
            return obj.strftime('%Y-%m-%d %H:%M:%S')
        elif isinstance(obj, datetime.date):
            return obj.strftime('%Y-%m-%d')
        else:
            # Fall back to the base class, which raises TypeError
            return super().default(obj)

if __name__ == '__main__':
    d = datetime.date.today()
    dct = {"birthday": d, "name": "jack"}
    data = json.dumps(dct, cls=DatetimeEncoder)
    print(data)

# Output:
# {"birthday": "2019-06-14", "name": "jack"}
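As an alternative sketch, the default parameter of json.dumps accepts a plain function, which avoids defining a subclass. The function name encode_extra is hypothetical, and isoformat() is used here instead of strftime, so a datetime value would be formatted differently from the encoder class above.

```python
import datetime
import json

def encode_extra(obj):
    # Called only for objects json cannot serialize on its own
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    # Anything else is still reported as unsupported
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

dct = {"birthday": datetime.date(2019, 6, 14), "name": "jack"}
data = json.dumps(dct, default=encode_extra)
print(data)
```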

Now we want to convert the date strings in the JSON document back into datetime.date objects when deserializing, so we need to use json.JSONDecoder.


# coding=utf-8
import json
import datetime

# Define a decoder that parses date strings in JSON
# (DatetimeEncoder is the class defined in the previous example)
class DatetimeDecoder(json.JSONDecoder):

    # Constructor: register dict2obj as the object_hook
    def __init__(self):
        super().__init__(object_hook=self.dict2obj)

    def dict2obj(self, d):
        for k, v in d.items():
            if isinstance(v, str):
                try:
                    # Parse "YYYY-MM-DD" strings into datetime.date objects
                    d[k] = datetime.datetime.strptime(v, '%Y-%m-%d').date()
                except ValueError:
                    # Not a date string; leave the value unchanged
                    pass
        return d

if __name__ == '__main__':
    d = datetime.date.today()
    dct = {"birthday": d, "name": "jack"}
    data = json.dumps(dct, cls=DatetimeEncoder)
    print(data)

    obj = json.loads(data, cls=DatetimeDecoder)
    print(type(obj))
    print(obj)

# Output:
# {"birthday": "2019-06-14", "name": "jack"}
# <class 'dict'>
# {'birthday': datetime.date(2019, 6, 14), 'name': 'jack'}

0x03 Summary

Common serialization tools in Python include marshal, pickle, and json. marshal is mainly used for Python's .pyc files, and its format depends on the Python version. It cannot serialize instances of user-defined classes.
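The limitation on user-defined classes can be checked with a short sketch. The User class and the outcome variable are made up for illustration.

```python
import marshal

class User:
    def __init__(self, name):
        self.name = name

user = User("jack")

# marshal rejects instances of user-defined classes with a ValueError
try:
    marshal.dumps(user)
    outcome = "serialized"
except ValueError as exc:
    outcome = f"failed: {exc}"
print(outcome)

# The instance's attribute dict holds only basic types, so it can be marshalled
attributes = marshal.loads(marshal.dumps(vars(user)))
print(attributes)
```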

pickle is a more general serialization tool for Python objects than marshal, and its format stays backward compatible across Python versions as long as both sides support the chosen protocol. json is a language-independent data format that is widely used for data exchange in all kinds of network applications, especially REST API services.

0x04 Learning Materials

  • docs.python.org/3/library/m...
  • docs.python.org/3/library/p...
  • docs.python.org/3/library/j...
