Using Python's pickle module to store and read large lists and dictionaries
- 2021-07-10 20:03:11
- OfStack
First, we introduce how to use Python's pickle module to store and read large lists and dictionaries.
Lists and dictionaries that hold a large amount of data can be serialized into a pickle file and loaded back whenever they are needed. The binary file is often smaller than an equivalent text representation.
# List
import pickle
# Storage: serialize the list into a binary file
list1 = [123, 'xiaopingguo', 54, [90, 78]]
with open('list1.pickle', 'wb') as list_file:
    pickle.dump(list1, list_file)
# Read: load the list back from the file
with open('list1.pickle', 'rb') as list_file:
    list2 = pickle.load(list_file)
print(list2)
# Dictionary
import pickle
# Storage: serialize the dictionary into a binary file
list3 = {'12': 123, '23': 'xiaopingguo', '34': 54, '45': [90, 78]}
with open('list3.pickle', 'wb') as list3_file:
    pickle.dump(list3, list3_file)
# Read: load the dictionary back and look up one key
with open('list3.pickle', 'rb') as list3_file:
    list3 = pickle.load(list3_file)
print(list3)
print(list3['23'])
PS: Below is a closer look at using pickle to store large data in Python.
Recently, while processing some data, I ended up with a very large intermediate variable. Because it would be used long-term and frequently, I wanted to save it in a format similar to MATLAB's .mat files, so that it could be read conveniently at any time.
Using pickle came to mind naturally, because it is the most common and simplest way to store data in a Python environment.
Python offers many ways to store data. The pickle module is the most commonly used, and there are others, such as saving to JSON, txt, and similar formats. pandas, HDF5 (h5), and the like will be covered separately.
Introduction to the pickle Module
The pickle module implements binary protocols for serializing and deserializing a Python object structure. Serialization ("pickling") converts a Python object hierarchy into a byte stream, and deserialization ("unpickling") converts the byte stream back into an object hierarchy.
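As a minimal sketch of that round trip, pickle.dumps() and pickle.loads() do the same work on in-memory bytes instead of a file. The record value below is only an illustrative sample, not data from the original article.

```python
import pickle

# Illustrative nested object (an assumption for this sketch).
record = {"name": "xiaopingguo", "scores": [90, 78], "id": 123}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(record)
print(type(payload))       # <class 'bytes'>

# Unpickling: byte stream -> a new object hierarchy with equal contents.
restored = pickle.loads(payload)
print(restored == record)  # True: same contents
print(restored is record)  # False: it is a reconstructed copy
```

The last two lines show that unpickling rebuilds an equal copy of the object rather than returning the original one.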
It must be mentioned that pickle is specific to Python, so non-Python programs may not be able to reconstruct pickled objects. I ran into this problem at work: after I trained a machine learning model with sklearn and saved it with pickle, my engineering colleagues could not load the model from Java. As a temporary workaround, one colleague read the pickle source code and reimplemented the deserialization step by step in Java. I admire that.
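When the data is only plain lists and dictionaries and has to be read outside Python, a text format such as JSON (one of the alternatives mentioned earlier) is one possible option. The sketch below only illustrates that idea and is not what the author's team did. The file name list3.json is hypothetical, and the approach does not carry over to objects such as sklearn models, which JSON cannot represent directly.

```python
import json
import os
import tempfile

# Same sample dictionary as in the pickle example above.
data = {'12': 123, '23': 'xiaopingguo', '34': 54, '45': [90, 78]}

# A temporary directory keeps the sketch from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'list3.json')  # hypothetical file name

    # Save as JSON text, which most languages can parse.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)

    # Read it back.
    with open(path, 'r', encoding='utf-8') as f:
        loaded = json.load(f)

print(loaded == data)  # True
```

JSON handles only basic types (strings, numbers, lists, dictionaries with string keys, booleans, and null), which is the price paid for portability.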
Tips for Using pickle
For the simplest code, the dump() and load() functions are sufficient.
import pickle
a = 1
# Save: write the object to a binary file
with open('data.pickle', 'wb') as f:
    pickle.dump(a, f)
# Read: load the object back from the file
with open('data.pickle', 'rb') as f:
    b = pickle.load(f)
But if you read the pickle documentation, you will find a parameter called protocol. It selects the serialization format (the pickle protocol version). The default is 0 in Python 2.x and 3 in Python 3.0 through 3.7. From Python 3.8 the default is 4, and later releases may raise it again. In short, each Python version supports protocols up to its own pickle.HIGHEST_PROTOCOL, and newer Python versions generally support higher protocol versions.
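To see which protocols a given interpreter uses, the pickle module exposes the constants DEFAULT_PROTOCOL and HIGHEST_PROTOCOL. The following is a small sketch, reusing the sample list from earlier, that prints both and checks that every supported protocol round-trips. The printed numbers depend on the Python version, so none are shown here.

```python
import pickle

# Both values depend on the Python version running this code.
print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

data = [123, 'xiaopingguo', 54, [90, 78]]

# Serialize with every protocol this interpreter supports.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=protocol)
    # loads() detects the protocol from the data itself,
    # so no protocol argument is needed when reading.
    results[protocol] = pickle.loads(blob)

print(all(value == data for value in results.values()))  # True
```

Because the protocol is detected automatically on load, only the writing side has to choose one.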
So what changes when you raise protocol? In general, higher protocols dump faster, support more data types, and produce smaller files, along with other optimizations. For example, protocol 4, introduced in Python 3.4, supports serialization of very large objects. Therefore, as long as every Python version that needs to read the file supports it, choose the highest available protocol as the value of the protocol parameter, that is, set
protocol=pickle.HIGHEST_PROTOCOL
That's enough.
Then, the above code can be changed to:
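import pickle
a = 1
# Save: write the object with the highest protocol this Python supports
with open('data.pickle', 'wb') as f:
    pickle.dump(a, f, protocol=pickle.HIGHEST_PROTOCOL)
# Read: the protocol is detected automatically
with open('data.pickle', 'rb') as f:
    b = pickle.load(f)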
For small data, the difference will probably be minor.
But when you need to serialize large data, keep this pickle tip in mind.
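One way to check the size effect on your own data is to serialize the same object with each protocol and compare the byte counts. The sketch below uses a made-up list of 100,000 small dictionaries as a stand-in for a large variable. The exact sizes depend on the data and the Python version, so no figures are quoted here.

```python
import pickle

# Illustrative "large" object (an assumption, not the author's data).
big = [{"id": i, "value": i * 0.5, "tag": "item"} for i in range(100_000)]

# Serialized size in bytes for every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    sizes[protocol] = len(pickle.dumps(big, protocol=protocol))
    print(f"protocol {protocol}: {sizes[protocol]:,} bytes")

# Compare the text-based protocol 0 with the highest binary protocol.
ratio = sizes[pickle.HIGHEST_PROTOCOL] / sizes[0]
print(f"highest protocol output is {ratio:.0%} of the protocol 0 size")
```

For data like this, the text-based protocol 0 produces noticeably larger output than the binary protocols. Timing the dump calls with time.perf_counter() in the same loop would give a similar comparison for speed.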
Summary
This article showed how to use Python's pickle module to store and read large lists and dictionaries, and how the protocol parameter affects the result. We hope it serves as a useful reference for readers who need it.