In-depth explanation of the json, pickle, and shelve libraries for Python crawling
- 2021-09-20 21:00:13
- OfStack
Preface
When you use Python for network programming or to crawl content you are interested in, you inevitably run into questions of data transmission and storage. Python's file objects and extension libraries already handle reading and writing text and binary data, such as web page content, pictures, audio, and video, but that data is stored in its final form. Is there a way to store Python's own object data and load it straight back into Python objects when it is needed? This article explains three commonly used solutions for storing and transmitting Python object data: pickle, shelve, and json.
The content is fairly basic, but it is worth mastering because it applies to a very wide range of scenarios.
1. pickle
The pickle library serializes Python objects (most built-in types, plus functions and classes that can be imported by name) so that they can be stored locally and reloaded later. After loading, the result is an ordinary Python object that can be used directly.
pickle has the following features:
- Most types of Python objects can be serialized and saved to a file
- A file is normally used to hold a single top-level object; to keep several values together, bundle them in a list or dict
1.1 Temporary Conversion
Python objects can be temporarily converted into pickle byte strings (held in variables instead of files) and loaded back later.
import pickle
a = [1, 2, 3, 4]
# Serialize a into a pickle byte string
p_a = pickle.dumps(a)
# Deserialize the pickle byte string back into a Python object
a = pickle.loads(p_a)
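To make the round trip more concrete, here is a minimal sketch using a nested structure of built-in types. The `record` contents are hypothetical sample data, not something from the original example.

```python
import pickle

# Hypothetical sample data: a nested structure of built-in types.
record = {"url": "https://example.com", "tags": ["a", "b"], "size": (640, 480)}

# dumps() returns a bytes object, not text.
payload = pickle.dumps(record)

# loads() rebuilds an equal but distinct object from those bytes.
restored = pickle.loads(payload)

is_bytes = isinstance(payload, bytes)
same_value = restored == record
same_object = restored is record
```

Note that the tuple under `"size"` comes back as a tuple, because pickle preserves Python types. Also keep in mind that unpickling can execute arbitrary code, so only load pickle data from sources you trust.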
1.2 Persistent Storage
A Python object can be stored in a local file so that it can be loaded and used again later.
import pickle
a = [1, 2, 3, 4]
# Serialize a and write it to a local file (binary mode is required)
with open('file.pkl', 'wb') as f:
    pickle.dump(a, f)
# Read the pickle data back from the file and rebuild the Python object
with open('file.pkl', 'rb') as f:
    a = pickle.load(f)
The code above first opens a file. Because pickle data is a binary format, the file mode must include 'b'.
The Python object is then serialized and written to the local file.
Later, opening the file for reading and loading it restores the data as a Python object.
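In a crawler, a typical use is saving progress between runs. The sketch below bundles several values into one dict so that a single dump/load pair covers all of them. The file name, the `state` keys, and the use of a temporary directory are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical file name inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "crawl_state.pkl")

# Bundle everything worth keeping into one top-level object.
state = {
    "seen_urls": {"https://example.com/a", "https://example.com/b"},
    "page_count": 2,
}

# The with statement closes the file even if dump() raises.
with open(path, "wb") as f:
    pickle.dump(state, f)

with open(path, "rb") as f:
    loaded_state = pickle.load(f)
```

The set of URLs survives the round trip as a set, which would not be the case with JSON.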
2. shelve
The shelve library builds on pickle. With pickle alone, the objects in a file cannot be looked up by name, and every access needs an explicit dump or load call, which is cumbersome. shelve improves on this in two ways:
- It creates a lightweight key-value database that can store multiple Python objects in one file
- Instead of calling load every time, you access the data with standard dictionary syntax
Here's the demo code:
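import shelve
# Open (or create) the database file
db = shelve.open('obj_db')
class A: ...
a = [1, 2, 3]
b = dict(name='dennis')
c = A  # the class itself, which pickle stores by reference
# Store the objects under string keys
db['a'] = a
db['b'] = b
db['c'] = c
# Read them back by key
print(db['a'])
print(db['b'])
print(db['c'])
db.close()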
The code above first calls shelve's open method to create a db; you can specify where the db file is stored.
The db can then hold Python objects (any object that pickle can serialize) as key-value pairs, just like a dictionary. Keys must be strings.
You can then retrieve the stored objects just like reading from a dictionary. Finally, don't forget to close the db.
To traverse or inspect the keys and values in a db, use its keys() and values() methods; the db also supports Python's iteration protocol.
Compared with using pickle directly, this is considerably more convenient.
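The dictionary-style methods mentioned above can be sketched as follows. This version opens the shelf in a with statement so it is closed automatically; the temporary directory and the stored values are assumptions for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical database path inside a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "obj_db")

# The with statement closes the shelf automatically.
with shelve.open(db_path) as db:
    db["a"] = [1, 2, 3]
    db["b"] = {"name": "dennis"}

# Reopen the shelf and inspect it like a dictionary.
with shelve.open(db_path) as db:
    keys = sorted(db.keys())
    values = [db[k] for k in keys]
    has_a = "a" in db
    count = len(db)
```

One caveat: by default, mutating a retrieved object in place (for example `db["a"].append(4)`) is not written back. Reassign the key, or open the shelf with `writeback=True`, if you need that behavior.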
3. json
JSON is one of the most widely used data formats for transmitting data over the network. Python's json library converts certain Python data types into a JSON string, which is convenient for storage and network transmission, and converts a JSON string back into Python objects.
The general flow is Python → JSON → Python, which makes data transmission and communication between client and server possible.
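As a rough illustration of the Python → JSON → Python flow, the sketch below simulates both ends in one process without a real socket. The function names `encode_message` and `decode_message` and the `request` contents are hypothetical.

```python
import json


def encode_message(payload):
    """Sender side: Python object -> JSON text -> UTF-8 bytes."""
    return json.dumps(payload).encode("utf-8")


def decode_message(raw):
    """Receiver side: UTF-8 bytes -> JSON text -> Python object."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical request that a client might send to a server.
request = {"action": "fetch", "url": "https://example.com", "retries": 3}

wire_bytes = encode_message(request)   # what would travel over the network
received = decode_message(wire_bytes)  # what the other side reconstructs
```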
The following table shows how JSON and Python data types map to each other:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true, false | True, False |
null | None |
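
The mapping in the table can be checked directly. The sketch below parses a small, made-up JSON document that exercises every row and records the resulting Python type names.

```python
import json

# A hypothetical JSON document that uses every row of the table.
text = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "s": "hi", '
    '"i": 7, "r": 1.5, "t": true, "n": null}'
)

data = json.loads(text)

# Map each key to the name of the Python type it was decoded into.
type_names = {key: type(value).__name__ for key, value in data.items()}
```

Here `type_names` maps `"obj"` to `dict`, `"arr"` to `list`, `"s"` to `str`, `"i"` to `int`, `"r"` to `float`, `"t"` to `bool`, and `"n"` to `NoneType`.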
3.1 Temporary Conversion
You can temporarily convert a Python object to a JSON string, assign it to a variable, and later convert it back to a Python object.
This is generally used for network transmission, especially when calling an interface (API).
import json
mylist = [1, 2, 3]
mydict = {
    'name': 'dennis'
}
# Convert the Python objects to JSON strings
a = json.dumps(mydict)
b = json.dumps(mylist)
# Convert the JSON strings back to Python objects
mylist = json.loads(b)
mydict = json.loads(a)
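Unlike pickle, a JSON round trip does not always return exactly what went in, because JSON has fewer types than Python. The following sketch, using made-up sample values, shows three common surprises: tuples come back as lists, non-string dictionary keys come back as strings, and sets cannot be serialized at all.

```python
import json

# Hypothetical data with a tuple value and an integer key.
original = {"point": (1, 2), 1: "one"}

encoded = json.dumps(original)
restored = json.loads(encoded)

# Sets have no JSON equivalent, so dumps() raises TypeError.
try:
    json.dumps({"tags": {"a", "b"}})
    set_error = None
except TypeError as exc:
    set_error = str(exc)
```

After running this, `restored` is `{"point": [1, 2], "1": "one"}`, so code that depends on tuples or integer keys needs to convert them back explicitly.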
3.2 Persistent Storage
You can convert a Python object to a JSON string and store it persistently in a local file for later reloading.
import json
mylist = [1, 2, 3]
mydict = {
    'name': 'dennis'
}
# Convert the Python object to a JSON string and store it in the file
with open('myjson.json', 'w') as f:
    json.dump(mydict, f)
# Load the JSON string stored in the file and convert it to a Python object
with open('myjson.json', 'r') as f:
    mydict = json.load(f)
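Crawled data often contains non-ASCII text, so a slightly more defensive variant may be useful. The sketch below writes human-readable JSON with an explicit UTF-8 encoding; the temporary directory and the `city` value are assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical file path inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "myjson.json")

mydict = {"name": "dennis", "city": "北京"}

# ensure_ascii=False keeps non-ASCII characters readable in the file;
# indent=2 pretty-prints the output.
with open(path, "w", encoding="utf-8") as f:
    json.dump(mydict, f, ensure_ascii=False, indent=2)

# Read the raw text once to see what was actually written.
with open(path, "r", encoding="utf-8") as f:
    raw_text = f.read()

# Load the file back into a Python object.
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)
```

Without `ensure_ascii=False`, the file would contain `\uXXXX` escapes instead of the original characters, which still loads correctly but is harder to read by eye.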