In-depth explanation of the json, pickle and shelve libraries for Python crawling

  • 2021-09-20 21:00:13
  • OfStack

Preface

When you use Python for network programming or for crawling content you are interested in, you inevitably run into questions of data transmission and storage. Python's file objects and extension libraries already handle reading and writing text and binary data, such as web page content, images, audio and video, but that data is usually stored in its final form. Is there a way to store Python's own objects and load them straight back into Python objects when they are needed? This article explains three commonly used solutions for storing and transmitting Python object data: pickle, shelve and json.

The content is fairly basic, but it is worth mastering because its potential applications are so broad.

1. pickle

The pickle library serializes Python objects to local files, from which they can be reloaded later. Almost any object can be pickled; a few, such as open files, sockets and lambda functions, cannot. After loading, the result is an ordinary Python object that can be used directly.

pickle has the following features:

  • Almost all types of Python objects can be serialized and saved to a file
  • Each dump call writes exactly one Python object, so a file usually holds a single object
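As a rough illustration of the first point, the sketch below round-trips a nested structure containing types that plain text formats cannot represent directly (a set, a tuple, a datetime and bytes). The record and its field names are made up for this example.

```python
import pickle
from datetime import datetime

# Hypothetical record, e.g. the state of a crawl job
record = {
    "visited": {"a.html", "b.html"},              # set
    "size": (800, 600),                           # tuple
    "started": datetime(2021, 9, 20, 21, 0, 13),  # datetime
    "raw": b"\x00\x01",                           # bytes
}

restored = pickle.loads(pickle.dumps(record))

# Both the values and their types survive the round trip
print(restored == record)
print(type(restored["visited"]).__name__, type(restored["size"]).__name__)
```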

1.1 Temporary Conversion

A Python object can be temporarily converted into a pickle byte string (held in a variable instead of a file) and loaded back later.


import pickle
a = [1, 2, 3, 4]
# Convert a to a pickle byte string
p_a = pickle.dumps(a)

# Convert the pickle byte string back into a Python object
a = pickle.loads(p_a)
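A small sketch of what the conversion actually produces, using made-up variable names: dumps returns a bytes object, and loads builds a new, independent copy rather than handing back the original object.

```python
import pickle

a = [1, 2, [3, 4]]
p_a = pickle.dumps(a)

# The pickled form is a bytes object
print(type(p_a).__name__)

b = pickle.loads(p_a)
b[2].append(5)   # changing the loaded copy...
print(a)         # ...leaves the original untouched
print(b)
```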

1.2 Persistent Storage

A Python object can also be stored in a local file, so that it can be loaded and used again later.


import pickle
a = [1, 2, 3, 4]

# Serialize a and write it to a local file
with open('file.pkl', 'wb') as f:
    pickle.dump(a, f)

# Read the pickled data back from the file as a Python object
with open('file.pkl', 'rb') as f:
    a = pickle.load(f)

The code first opens a file. Because pickle data is a binary format, the file mode must include 'b'.

The Python object is then serialized and written to the local file.

Finally, the file is opened again and its contents are loaded back as a Python object.
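The save and load steps can be wrapped in two small helpers. This is only a sketch: save_obj and load_obj are hypothetical names, and the file is written to a temporary directory so the example cleans up after itself.

```python
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Serialize obj to the file at path (hypothetical helper)."""
    with open(path, "wb") as f:   # binary mode is required
        pickle.dump(obj, f)


def load_obj(path):
    """Load the object stored in the file at path (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.pkl")
    save_obj([1, 2, 3, 4], path)
    loaded = load_obj(path)

print(loaded)
```

The with statement closes each file even if an error occurs. Keep in mind that loading a pickle can execute arbitrary code, so only load files that come from a source you trust.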

2. shelve

The shelve library is essentially a convenience layer built on pickle. Storing each object in its own pickle file and calling dump and load for every access is cumbersome, so shelve makes the following improvements:

  • It creates a lightweight key-value database that can store multiple Python objects in one file
  • Instead of calling load every time, you access the stored data with standard dictionary syntax

Here's the demo code:


import shelve

class A: ...

db = shelve.open('obj_db')
a = [1, 2, 3]
b = dict(name='dennis')
c = A

# Store objects under string keys, like a dictionary
db['a'] = a
db['b'] = b
db['c'] = c

# Read them back by key
db['a']
db['b']
db['c']

db.close()

The code above first calls shelve's open method to create a db; the argument specifies where the db file is stored.

Python objects can then be stored in the db as key-value pairs, just as with a dictionary. Keys must be strings, and values can be any object that pickle can handle.

The stored objects can then be retrieved with ordinary dictionary access. Finally, don't forget to close the db.

To iterate over or inspect the keys and values stored in a db, use its keys() and values() methods; the db also supports Python's iteration protocol.

Compared with using pickle directly, shelve is therefore much more convenient and powerful.
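The following sketch shows a shelf being reopened and iterated, plus one common pitfall. It assumes a throwaway path inside a temporary directory; the key names are made up. With the default settings, changing a fetched object in place is not written back to the file, so the value has to be reassigned (or the shelf opened with writeback=True).

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obj_db")   # hypothetical location

    # The with statement closes the shelf automatically
    with shelve.open(path) as db:
        db["a"] = [1, 2, 3]
        db["b"] = {"name": "dennis"}

    # Reopen later: the data is still there
    with shelve.open(path) as db:
        keys = sorted(db.keys())
        items = {k: db[k] for k in db}   # iterating a shelf yields its keys

        # Pitfall: this mutates a temporary copy, nothing is saved
        db["a"].append(4)
        unchanged = db["a"]

        # Fetch, modify, then assign back to persist the change
        values = db["a"]
        values.append(4)
        db["a"] = values
        updated = db["a"]

print(keys, unchanged, updated)
```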

3. json

JSON is the most widely used data format for transmitting data over the network. Python's json library can convert certain Python data types into a JSON string, which is convenient for storage and network transmission, and can convert a serialized JSON string back into a Python object.

The general process is Python → JSON → Python, which is what makes client-server (CS) data transmission and communication possible.

The following table shows the mapping between JSON and Python data types:

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true, false     True, False
null            None
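The mapping is not perfectly symmetric, which the sketch below tries to show with made-up data: a tuple comes back as a list, and a non-string dictionary key comes back as a string.

```python
import json

data = {
    "point": (1, 2),    # tuple -> JSON array -> list
    1: "int key",       # non-string keys become strings
    "ok": True,         # True -> true
    "missing": None,    # None -> null
    "price": 9.5,       # float -> number (real)
}

text = json.dumps(data)
print(text)

back = json.loads(text)
print(back)
```

Types outside the table, such as set or datetime, raise a TypeError unless a default function is passed to json.dumps.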

3.1 Temporary Conversion

You can temporarily convert a Python object to a JSON string held in a variable, and later convert it back to a Python object.

This is generally used for network transmission, especially for passing data when calling an API.


import json
mylist = [1, 2, 3]
mydict = {
    'name': 'dennis'
}
# Temporary conversion to JSON strings
a = json.dumps(mydict)
b = json.dumps(mylist)
# Convert the JSON strings back into Python objects
mylist = json.loads(b)
mydict = json.loads(a)
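For network transmission, the JSON string is normally encoded to bytes before it is sent. The sketch below simulates both ends in one script without any real network call; the message contents and URL are hypothetical.

```python
import json

# Hypothetical message a client might send to an API
message = {"action": "fetch", "urls": ["https://example.com/a"], "retry": 3}

# Sender: Python object -> JSON text -> UTF-8 bytes for the wire
payload = json.dumps(message).encode("utf-8")

# Receiver: bytes -> JSON text -> Python object
received = json.loads(payload.decode("utf-8"))

print(type(payload).__name__)
print(received == message)
```

Since Python 3.6, json.loads also accepts bytes directly, so the explicit decode step is optional for UTF-8 data.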

3.2 Persistent Storage

You can convert a Python object to a JSON string and store it persistently in a local file for subsequent reloading.


import json
mylist = [1, 2, 3]
mydict = {
    'name': 'dennis'
}

# Convert the Python object to a JSON string and store it in a file
with open('myjson.json', 'w') as f:
    json.dump(mydict, f)

# Load the JSON string stored in the file and convert it to a Python object
with open('myjson.json', 'r') as f:
    mydict = json.load(f)
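Crawled data often contains non-ASCII text. The sketch below, using a made-up record and a temporary directory, stores such data readably by opening the file with an explicit UTF-8 encoding and passing ensure_ascii=False; indent=2 is optional and only makes the file easier to read.

```python
import json
import os
import tempfile

# Hypothetical scraped record containing non-ASCII text
page = {"title": "Café menu", "tags": ["json", "pickle"], "views": 120}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.json")

    # ensure_ascii=False writes characters as-is instead of \uXXXX escapes
    with open(path, "w", encoding="utf-8") as f:
        json.dump(page, f, ensure_ascii=False, indent=2)

    with open(path, "r", encoding="utf-8") as f:
        raw_text = f.read()   # the text as stored on disk
        f.seek(0)
        loaded = json.load(f)

print(loaded == page)
```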
