Using Python pickle to store and read large list and dictionary data

  • 2021-07-10 20:03:11
  • OfStack

First, let's look at how to store and read large list and dictionary data with Python's pickle module.

Lists and dictionaries that hold a large amount of data can be serialized into a binary file and loaded back whenever they are needed, and the resulting file is often smaller than a plain-text dump of the same data.


import pickle

# List
# Store
list1 = [123, 'xiaopingguo', 54, [90, 78]]
with open('list1.pickle', 'wb') as list_file:
    pickle.dump(list1, list_file)

# Read
with open('list1.pickle', 'rb') as list_file:
    list2 = pickle.load(list_file)
print(list2)  # [123, 'xiaopingguo', 54, [90, 78]]

# Dictionary
# Store
dict1 = {'12': 123, '23': 'xiaopingguo', '34': 54, '45': [90, 78]}
with open('dict1.pickle', 'wb') as dict_file:
    pickle.dump(dict1, dict_file)

# Read
with open('dict1.pickle', 'rb') as dict_file:
    dict2 = pickle.load(dict_file)
print(dict2)
print(dict2['23'])  # xiaopingguo
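If the list and the dictionary are always used together, one option is to pickle them as a single container object so that one file holds everything. This is a minimal sketch; the file name `bundle.pickle` and the keys `'list'` and `'dict'` are illustrative choices, not part of the original example.

```python
import pickle

list1 = [123, 'xiaopingguo', 54, [90, 78]]
dict1 = {'12': 123, '23': 'xiaopingguo', '34': 54, '45': [90, 78]}

# Wrap both objects in one dict so a single dump/load round-trips them together
bundle = {'list': list1, 'dict': dict1}
with open('bundle.pickle', 'wb') as f:
    pickle.dump(bundle, f)

# Reading the file back restores both objects at once
with open('bundle.pickle', 'rb') as f:
    restored = pickle.load(f)

print(restored['list'])        # [123, 'xiaopingguo', 54, [90, 78]]
print(restored['dict']['23'])  # xiaopingguo
```

Bundling keeps related data in sync on disk; the trade-off is that you always load the whole bundle, even if you only need one part of it.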

PS: Next, let's look at using pickle in Python to store big data.

Recently, while processing some data, I ended up with a huge intermediate variable. Because I will need it often and for a long time, I wanted to save it in a format similar to MATLAB's .mat files, so that it can be read back at any time.

Pickle is the natural choice, since it is the most common and simplest way to persist data in a Python environment. Python offers other ways to store data as well, such as JSON or plain text files; pandas, HDF5 (h5) and similar formats will be covered separately.

Introduction to the pickle module

The pickle module implements binary protocols for serializing and deserializing a Python object structure. Serialization ("pickling") converts a Python object hierarchy into a byte stream, and deserialization ("unpickling") converts the byte stream back into an object hierarchy.
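The two directions can be seen without touching the disk by using `pickle.dumps()` and `pickle.loads()`, the in-memory counterparts of `dump()` and `load()`. This small sketch uses a made-up dictionary as the object hierarchy.

```python
import pickle

data = {'scores': [90, 78], 'name': 'xiaopingguo'}

# "Pickling": convert the object hierarchy into a byte stream (a bytes object)
payload = pickle.dumps(data)
print(type(payload))  # <class 'bytes'>

# "Unpickling": rebuild an equal object hierarchy from those bytes
restored = pickle.loads(payload)
print(restored == data)  # True
print(restored is data)  # False: a brand-new object was built
```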

Note that the pickle format is specific to Python, so programs written in other languages may not be able to reconstruct pickled objects. I ran into exactly this problem at work: after I saved a machine learning model trained with sklearn using pickle, my engineering colleagues could not load it from Java. As a stopgap, one colleague read the pickle source code and reimplemented the deserialization step by step in Java. Impressive.
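When the data consists only of plain lists, dictionaries with string keys, numbers and strings (as in the first example), a language-neutral text format such as JSON, via Python's standard `json` module, is one way around this problem, since Java and most other languages can parse JSON. This is a swapped-in alternative, not what the author's colleagues did, and it cannot store arbitrary Python objects such as a trained sklearn model. The file name `dict1.json` is illustrative.

```python
import json

dict1 = {'12': 123, '23': 'xiaopingguo', '34': 54, '45': [90, 78]}

# Write UTF-8 JSON text that non-Python programs can parse
with open('dict1.json', 'w', encoding='utf-8') as f:
    json.dump(dict1, f)

# Read it back into an equal dictionary
with open('dict1.json', 'r', encoding='utf-8') as f:
    restored = json.load(f)

print(restored == dict1)  # True
```

Be aware that JSON is lossier than pickle: tuples come back as lists, and non-string dictionary keys are converted to strings.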

A pickle tip

For the simplest cases, the dump() and load() functions are all you need.


import pickle

data = 1
# Save
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)
# Read
with open('data.pickle', 'rb') as f:
    b = pickle.load(f)

However, if you read the pickle documentation, you will find a parameter called protocol. It selects the serialization format (the pickle protocol). The default is 0 in Python 2.x, 3 in Python 3.0 to 3.7, and 4 from Python 3.8 onward. Different Python versions also support different highest protocols (2 in Python 2.x, 3 in Python 3.0 to 3.3, 4 in Python 3.4 to 3.7, and 5 from Python 3.8), and a larger value means a newer protocol version, as shown in the figure.
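To see which protocols your own interpreter offers, you can print the module constants `pickle.DEFAULT_PROTOCOL` and `pickle.HIGHEST_PROTOCOL`. The loop below is a sketch that dumps the same list with every available protocol and reports the byte count; the exact numbers depend on your Python version, so none are listed here.

```python
import pickle

# Protocol used when none is given, and the newest one this interpreter supports
print(pickle.DEFAULT_PROTOCOL)
print(pickle.HIGHEST_PROTOCOL)

data = [123, 'xiaopingguo', 54, [90, 78]]
# Every protocol from 0 up to HIGHEST_PROTOCOL can serialize this list.
# loads() detects the protocol from the byte stream, so reading needs no argument.
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=proto)
    assert pickle.loads(payload) == data
    print(proto, len(payload))
```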

So what does changing protocol affect? Generally, the higher the protocol, the faster dump() runs, the more object types it supports, and the smaller the saved file, along with some other optimizations. For example, protocol 4, added in Python 3.4, supports very large objects. So where possible, pass the highest protocol version as the protocol argument, that is, protocol=pickle.HIGHEST_PROTOCOL. Keep in mind that a file written with a newer protocol cannot be read by a Python version that does not support that protocol.
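The file-size part of this claim is easy to check yourself. This sketch builds a hypothetical dictionary of 10,000 entries, shaped like the earlier example, and compares the size of its pickle under protocol 0 and under the highest protocol. Speed is not measured here because timings vary from machine to machine.

```python
import pickle

# Hypothetical large-ish dictionary, similar in shape to the earlier example
data = {str(i): [i, i * 2] for i in range(10000)}

# Serialize the same object with the oldest and the newest protocol
payload_p0 = pickle.dumps(data, protocol=0)
payload_high = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
size_p0 = len(payload_p0)
size_high = len(payload_high)

print(size_p0, size_high)  # the second number is noticeably smaller

# Both byte streams rebuild the same dictionary
assert pickle.loads(payload_p0) == data
assert pickle.loads(payload_high) == data
```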

The code above then becomes:


import pickle

data = 1
# Save
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
# Read
with open('data.pickle', 'rb') as f:
    b = pickle.load(f)
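If you save and load pickles in many places, the save and read steps can be wrapped in two small helper functions. `save_pickle` and `load_pickle` are hypothetical names for this sketch, not part of the pickle module, and `big_list.pickle` is just an example path.

```python
import pickle
from pathlib import Path

def save_pickle(obj, path):
    """Serialize obj to path using the highest protocol this Python supports."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickle(path):
    """Read back an object written by save_pickle (the protocol is detected automatically)."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Usage: round-trip a list of one million integers
big_list = list(range(1_000_000))
save_pickle(big_list, Path('big_list.pickle'))
restored = load_pickle(Path('big_list.pickle'))
print(restored == big_list)  # True
```

Only load pickle files you created yourself or otherwise trust: the pickle documentation warns that unpickling malicious data can execute arbitrary code.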

For small data, the difference will probably be negligible.

But when you need to serialize big data, remember this pickle tip.

Summary

That covers how to store and read large list and dictionary data with Python's pickle module. We hope it serves as a useful reference.

