Classification of Python crawler data and summary of json data use

  • 2021-10-16 02:06:58
  • OfStack

Contents
Structured classification of data
json data

Structured classification of data

Generally speaking, what we crawl is the content of a website or an application, from which we extract something of value. That content generally falls into three categories: structured data, semi-structured data, and unstructured data.
1. Structured data:
Data that can be represented by a uniform structure. It can be expressed and stored in a relational database as a two-dimensional table. Its general characteristic is that data is organized in rows: one row represents the information of one entity, and every row has the same attributes.
2. Semi-structured data:
A form of structured data that does not conform to the formal data model used by relational databases or other data tables, but that still contains tags or markers to separate semantic elements and to organize records and fields into a hierarchy. It is therefore also called a self-describing structure. Common semi-structured formats are HTML, XML, and JSON, which are in practice stored as trees or graphs.
In semi-structured data, the order of attributes within a node does not matter, and the number of attributes may differ from one record to the next. Such a format can freely express a great deal of useful information, including self-describing information. Semi-structured data is therefore highly extensible and well suited to large-scale exchange over the Internet.
3. Unstructured data:
Data that has no fixed structure. Documents, pictures, videos, and audio are all unstructured data. This kind of data is generally stored directly as a whole, usually in binary form.
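
The three categories above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the second name "lisi", the "tags" field, and the placeholder bytes are assumptions made up for the example, not data from a real site.

```python
import json

# Structured: every row has the same attributes, like rows in a relational table.
columns = ("id", "name", "age")
rows = [
    (1, "zhangsan", 7),
    (2, "lisi", 8),
]
all_rows_same_width = all(len(row) == len(columns) for row in rows)

# Semi-structured: self-describing records whose attributes may differ
# from one record to the next.
records = json.loads(
    '[{"name": "zhangsan", "age": 7},'
    ' {"age": 8, "name": "lisi", "tags": ["student"]}]'
)
attribute_sets = [set(record) for record in records]

# The order of attributes inside a node carries no meaning.
same_regardless_of_order = (
    json.loads('{"a": 1, "b": 2}') == json.loads('{"b": 2, "a": 1}')
)

# Unstructured: stored as a whole, typically as raw bytes (placeholder bytes here).
blob = b"\x89PNG\r\n\x1a\n"

print(all_rows_same_width)                     # True
print(attribute_sets[0] == attribute_sets[1])  # False
print(same_regardless_of_order)                # True
print(type(blob).__name__)                     # bytes
```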

json data

JSON (JavaScript Object Notation) is a lightweight data exchange format. It is based on a subset of ECMAScript (the JavaScript specification standardized by Ecma International) and uses a text format that is completely independent of any programming language to store and represent data. Its conciseness and clear hierarchy make JSON an ideal data exchange language.
Features: easy for people to read and write, easy for machines to parse and generate, and efficient to transmit over the network.
JSON syntax rules: in JS, everything is an object. Therefore, any supported type can be represented in JSON, for example strings, numbers, objects, and arrays.
Objects and arrays are two special and commonly used types in JS:
1. Objects are written as key-value pairs: {name: 'zhangsan', age: '7'}
2. Items are separated by commas: [1, 2, 3, 4, 5]
3. Curly braces hold objects.
4. Square brackets hold arrays.
A JS object corresponds to a dict in Python.
A JS array corresponds to a list in Python.
Because JSON is used to store JS objects or arrays, we can convert JSON to a list or a dict in Python.
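
As a minimal sketch of this correspondence, the snippet below parses a JSON string with the standard-library json module and inspects the resulting Python types. The fields "scores", "graduated", and "nickname" are invented for the example.

```python
import json

# A JSON object containing a string, a number, an array, a boolean and null.
text = (
    '{"name": "zhangsan", "age": 7, "scores": [1, 2, 3, 4, 5],'
    ' "graduated": false, "nickname": null}'
)
data = json.loads(text)

print(type(data).__name__)                  # dict
print(type(data["scores"]).__name__)        # list
print(data["graduated"], data["nickname"])  # False None
```

Besides object to dict and array to list, JSON true/false become Python True/False and null becomes None.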

The json package for parsing JSON:

json.dumps(Python list or dict) -> returns a JSON string.
json.loads(JSON string) -> returns a Python list or dict.
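
A short round trip through these two functions might look like the following sketch. The sample dict and the Chinese name are assumptions for illustration; ensure_ascii is a real parameter of json.dumps that is often useful when crawled text is not ASCII.

```python
import json

person = {"name": "zhangsan", "age": 7, "hobbies": ["reading", "chess"]}

encoded = json.dumps(person)   # Python dict -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python dict

print(type(encoded).__name__)  # str
print(decoded == person)       # True

# ensure_ascii=False keeps non-ASCII characters readable instead of \uXXXX escapes.
readable = json.dumps({"name": "张三"}, ensure_ascii=False)
print(readable)                # {"name": "张三"}
```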

json.dump(list/dict, fp) -> writes the list or dict to a JSON file.
json.load(fp) -> list/dict: reads JSON data from a JSON file.
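
The file-based pair can be sketched the same way. The file name items.json and the sample records are hypothetical; a temporary directory is used so the example leaves nothing behind.

```python
import json
import os
import tempfile

items = [
    {"title": "first page", "views": 10},
    {"title": "second page", "views": 25},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "items.json")

    # Write the Python list to a JSON file.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(items, fp, ensure_ascii=False, indent=2)

    # Read the JSON file back into a Python list.
    with open(path, "r", encoding="utf-8") as fp:
        restored = json.load(fp)

print(restored == items)  # True
```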

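A JSON key-value pair is a way of saving a JS object, and it is written in much the same way as a JS object literal. For example:
{"firstName": "Json", "Class": "aid1111"} is equivalent to the following JS statement: {firstName: "Json", Class: "aid1111"}. Note that JSON itself requires the keys to be enclosed in double quotes, while JS allows them to be unquoted.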
Many people are unclear about the relationship between JSON and JS objects, or even confuse the two. It can be understood as follows: "JSON is a string representation of a JS object. It uses text to represent the information of a JS object, and is essentially a string."
For example, var obj = {a: "hello", b: "World"} is a JS object (note that the key names may also be enclosed in quotation marks), whereas var json = '{"a": "hello", "b": "World"}' is a JSON string, which is essentially a string.
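
The same distinction shows up in Python, where the counterpart of the JS object is a dict and the JSON text is a str. The sketch below also checks that JS-style unquoted keys are rejected by json.loads, since they are not valid JSON.

```python
import json

obj = {"a": "hello", "b": "World"}     # Python dict, analogous to the JS object
text = '{"a": "hello", "b": "World"}'  # JSON string, essentially a string

print(type(obj).__name__, type(text).__name__)  # dict str
print(json.loads(text) == obj)                  # True

# JS object-literal syntax with unquoted keys is not valid JSON.
try:
    json.loads('{a: "hello", b: "World"}')
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)  # True
```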
JSON is more efficient as a data transmission format because it does not require strictly paired closing tags as XML does. This raises the ratio of useful data to total packet size, and so reduces the load on the network for the same amount of data.
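
A rough way to see this overhead is to serialize the same record both ways and compare lengths. This is a sketch under assumptions: the element name "record" and the one-element-per-field XML layout are choices made for the example, and the actual difference depends on the data and on how the XML is designed.

```python
import json
import xml.etree.ElementTree as ET

record = {"firstName": "Json", "Class": "aid1111"}

# Compact JSON: no spaces after separators.
json_text = json.dumps(record, separators=(",", ":"))

# One possible XML encoding: one child element per field.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
print(len(json_text) < len(xml_text))  # True
```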
