Advanced usage of the Python third-party library Requests

  • 2020-05-27 05:56:27
  • OfStack

1. Installing the Requests library

Install Requests with pip. If you already have pip (a Python package manager) or an integrated distribution such as Python(x,y) or Anaconda, you can install Python libraries directly with pip:


$ pip install requests

Once the installation is complete, take a look at the following basic methods:


# GET request with HTTP Basic authentication
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
# Status code of the response
>>> r.status_code
200
# Content type of the response: JSON, encoded as UTF-8
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
# Response body as text
>>> r.text
'{"type":"User"...'
# Response body parsed as JSON
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

Here's a small example:


# Small example
import requests

r = requests.get('http://www.baidu.com')
print(type(r))
print(r.status_code)
print(r.encoding)
print(r.text)
print(r.cookies)
# This requests Baidu's home page and prints the type, status code, encoding,
# body and cookies of the response. Output (the page HTML from r.text is omitted):
<class 'requests.models.Response'>
200
UTF-8
<RequestsCookieJar[]>

2. Basic HTTP requests

Requests provides a function for every basic HTTP method, for example:


import requests

r = requests.post("http://httpbin.org/post")
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")

Basic GET request


import requests

r = requests.get("http://httpbin.org/get")
# To add query-string parameters, use the params argument:
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)
print(r.url)

# Output: http://httpbin.org/get?key1=value1&key2=value2

If the response is JSON, you can parse it with the json() method. For example, create a file named a.json with the following content:


["foo", "bar", {
"foo": "bar"
}]
# Requests only fetches http:// and https:// URLs, so serve a.json over HTTP first,
# e.g. by running "python -m http.server" in its directory (default port 8000):
import requests
r = requests.get("http://localhost:8000/a.json")
print(r.text)
print(r.json())
# Output: first the raw text, then the Python object parsed by json():
["foo", "bar", {
"foo": "bar"
}]
['foo', 'bar', {'foo': 'bar'}]
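
If you would rather not run a separate server, the following self-contained sketch does the same thing in one script. It assumes Python 3.7+ and serves a hypothetical in-memory copy of a.json from a throwaway local server built on the standard-library http.server module. It then compares r.text with r.json():

```python
import json
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import requests

# Hypothetical content of a.json, served from memory instead of from disk.
A_JSON = json.dumps(["foo", "bar", {"foo": "bar"}]).encode("utf-8")


class JSONFileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every path with the JSON document; a local stand-in only.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(A_JSON)))
        self.end_headers()
        self.wfile.write(A_JSON)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), JSONFileHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/a.json" % server.server_port

r = requests.get(url)
text = r.text    # str: the raw JSON text
data = r.json()  # list: the parsed Python object
print(type(text).__name__, text)
print(type(data).__name__, data)

server.shutdown()
server.server_close()
```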

If you want the raw socket response from the server, use r.raw. For this to work, you must set stream=True in the initial request:


>>> r = requests.get('https://github.com/timeline.json', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

This reads the raw bytes straight from the socket. They are still gzip-compressed, which is why they start with the gzip signature \x1f\x8b.

If you want to add headers, you can pass the headers parameter:


import requests

payload = {'key1': 'value1', 'key2': 'value2'}
headers = {'content-type': 'application/json'}
# The headers argument adds these fields to the request headers
r = requests.get("http://httpbin.org/get", params=payload, headers=headers)
print(r.url)
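
Requests merges the headers you pass with its own defaults, such as User-Agent and Accept. You can check what would actually be sent without any network traffic: build a Request and prepare it through a Session. This is a sketch, and the User-Agent value my-app/0.1 is hypothetical:

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
headers = {'User-Agent': 'my-app/0.1'}  # hypothetical custom header value

# Build the request without sending it, so we can inspect what would go out.
req = requests.Request('GET', 'http://httpbin.org/get', params=payload, headers=headers)
prepared = requests.Session().prepare_request(req)  # merges session defaults

print(prepared.url)
print(prepared.headers['User-Agent'])  # the per-request value replaces the default
print(prepared.headers['Accept'])      # default header contributed by the session
```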

3. Basic POST request

A POST request usually carries some parameters. The most basic way to pass them is the data parameter.


import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)
# The results are as follows: 
{
"args": {}, 
"data": "", 
"files": {}, 
"form": {
"key1": "value1", 
"key2": "value2"
}, 
"headers": {
"Accept": "*/*", 
"Accept-Encoding": "gzip, deflate", 
"Content-Length": "23", 
"Content-Type": "application/x-www-form-urlencoded", 
"Host": "httpbin.org", 
"User-Agent": "python-requests/2.9.1"
}, 
"json": null, 
"url": "http://httpbin.org/post"
}

The parameters were sent as form fields, and httpbin echoes them back under "form".

Sometimes the data must be sent as JSON rather than as a form. In that case, serialize it with json.dumps() and pass the resulting string as data:


import json
import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)

# Operation results: 
{
"args": {}, 
"data": "{\"some\": \"data\"}", 
"files": {}, 
"form": {}, 
"headers": {
"Accept": "*/*", 
"Accept-Encoding": "gzip, deflate", 
"Content-Length": "16", 
"Host": "httpbin.org", 
"User-Agent": "python-requests/2.9.1"
}, 
"json": {
"some": "data"
}, 
"url": "http://httpbin.org/post"
}

This is how to POST data in JSON format.
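
Requests 2.4.2 and later also accept a json parameter. It serializes the object for you and sets the Content-Type: application/json header. The sketch below prepares both variants without sending them, so you can compare what would go out:

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}

# Manual serialization, as above: the body is a plain string, no Content-Type is set.
manual = requests.Request('POST', url, data=json.dumps(payload)).prepare()

# json= serializes the dict and sets Content-Type: application/json.
auto = requests.Request('POST', url, json=payload).prepare()

print(manual.headers.get('Content-Type'))  # None
print(auto.headers.get('Content-Type'))    # application/json
print(auto.body)
```

In practice you would call requests.post(url, json=payload) directly. Preparing the request is only a way to look at it here.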

To upload a file, pass it through the files parameter.
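
Here is a minimal sketch of a file upload. It assumes a hypothetical local file report.txt (created in a temporary directory here) and a form field named file. The request is only prepared, not sent, so you can inspect the multipart/form-data body that Requests builds. requests.post(url, files=...) would send the same thing:

```python
import os
import tempfile

import requests

# Hypothetical file to upload; any readable file works the same way.
path = os.path.join(tempfile.mkdtemp(), 'report.txt')
with open(path, 'w') as f:
    f.write('hello, upload')

with open(path, 'rb') as f:
    # files maps a form field name ('file' here, an assumption) to a file object.
    req = requests.Request('POST', 'http://httpbin.org/post', files={'file': f})
    prepared = req.prepare()  # the multipart body is built (and the file read) here

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
print(b'filename="report.txt"' in prepared.body)
```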

In this way, we successfully completed the upload of a file.

Requests supports streaming uploads, which let you send large data streams or files without reading them into memory first. To stream an upload, simply pass a file-like object as the request body.
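
Here is a sketch of a streaming upload. It assumes a hypothetical 1 MiB file massive-body.bin created in a temporary directory. The request is prepared rather than sent, so you can see that the body is the open file object itself, with Content-Length taken from the file size. requests.post(url, data=f) would stream it:

```python
import os
import tempfile

import requests

# Hypothetical large file; here just 1 MiB of zero bytes in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'massive-body.bin')
with open(path, 'wb') as f:
    f.write(b'\0' * (1024 * 1024))

with open(path, 'rb') as f:
    # Passing the open file as data: Requests keeps the file object as the body
    # and reads from it while sending, instead of loading it into memory first.
    prepared = requests.Request('POST', 'http://httpbin.org/post', data=f).prepare()
    print(prepared.body is f)                  # True
    print(prepared.headers['Content-Length'])  # 1048576
```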

4. Cookies

If a response contains cookies, you can read them from its cookies attribute.
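
Here is a minimal sketch that uses a throwaway local server (standard-library http.server) in place of a real site. The server sets a hypothetical cookie named sessionid, which the script then reads from r.cookies:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import requests


class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A stand-in for a real site: always set one (hypothetical) cookie.
        self.send_response(200)
        self.send_header('Set-Cookie', 'sessionid=abc123; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), CookieHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

r = requests.get('http://127.0.0.1:%d/' % server.server_port)
print(r.cookies['sessionid'])  # abc123
print(r.cookies.get_dict())    # {'sessionid': 'abc123'}

server.shutdown()
server.server_close()
```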

The cookies attribute therefore gives you the cookies that the site set.

Conversely, the cookies parameter sends cookies to the server:


import requests

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
print(r.text)
# Output:
{"cookies": {"cookies_are": "working"}}

5. Timeout configuration

Use the timeout parameter to limit how long Requests waits for the server to respond.
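
The sketch below uses a throwaway local server that deliberately waits 1 second before answering. The delay is an assumption chosen to provoke a timeout. With timeout=0.2 the request fails with requests.exceptions.ReadTimeout; with a generous timeout it succeeds. ThreadingHTTPServer requires Python 3.7+:

```python
import threading
import time
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

import requests


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(1)  # simulate a server that takes 1 s before answering
        try:
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass  # the client may already have given up and closed the connection

    def log_message(self, *args):
        pass  # keep the console quiet


server = ThreadingHTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

try:
    requests.get(url, timeout=0.2)  # wait at most 0.2 s for the server to respond
    outcome = 'ok'
except requests.exceptions.Timeout as exc:
    # ReadTimeout: the connection succeeded, but no response arrived in time.
    outcome = type(exc).__name__

print(outcome)  # ReadTimeout

r = requests.get(url, timeout=5)  # a generous timeout lets the slow response through
print(r.text)  # ok

server.shutdown()
server.server_close()
```

The timeout parameter also accepts a (connect, read) tuple, for example timeout=(3.05, 27), which sets the two limits separately.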

Note: timeout is not a limit on the total time taken to download the response. Requests raises a timeout exception only if the connection cannot be established, or if the server sends no bytes at all, within timeout seconds.

In other words, a large response that keeps arriving steadily can take longer than timeout to download in full.

6. Session objects

In all of the examples above, every call is an independent request, as if each one were opened in a different browser. No session is shared between them, even when they go to the same URL.
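
Here is an example with a throwaway local server. It sets a cookie on /set and echoes back any Cookie header it receives on /show (both paths are hypothetical). Two plain requests.get() calls do not share the cookie:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import requests


class EchoCookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = (self.headers.get('Cookie') or '').encode()  # echo the Cookie header
        self.send_response(200)
        if self.path == '/set':
            self.send_header('Set-Cookie', 'sessioncookie=123456789; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), EchoCookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

requests.get(base + '/set')       # the server sets a cookie...
r = requests.get(base + '/show')  # ...but this separate request does not send it back
print(repr(r.text))  # ''

server.shutdown()
server.server_close()
```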

Because separate requests do not share a session, cookies set by one response are not sent with the next request. What if we need a persistent session with a site? Think of browsing Taobao in a single browser: moving between tabs keeps you in the same session.

The solution is a Session object, which persists cookies across all requests made through it.
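
Here is a sketch using the same kind of throwaway local server: /set sets a hypothetical cookie and /show echoes the Cookie header it receives. Requests made through one requests.Session send the cookie back automatically:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import requests


class EchoCookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = (self.headers.get('Cookie') or '').encode()  # echo the Cookie header
        self.send_response(200)
        if self.path == '/set':
            self.send_header('Set-Cookie', 'sessioncookie=123456789; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), EchoCookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

s = requests.Session()
s.get(base + '/set')       # the session stores the cookie from this response
r = s.get(base + '/show')  # and sends it with the next request
print(r.text)  # sessioncookie=123456789

server.shutdown()
server.server_close()
```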

With a session, cookies set by earlier responses are sent automatically on later requests. That is how to set up a session.

Because a session persists across requests, it can also hold configuration that applies to every request made through it.
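
For example, headers set on session.headers are merged into every request. The sketch prepares a request through the session instead of sending it, so the merged headers can be inspected offline. The x-test and x-test2 header names are only examples:

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})  # sent with every request from this session

# Prepare (but do not send) a request that adds a header of its own.
req = requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test2': 'true'})
prepared = s.prepare_request(req)

print(prepared.headers['x-test'])   # true  (from the session)
print(prepared.headers['x-test2'])  # true  (from this request)
```

Sending it with s.get('http://httpbin.org/headers', headers={'x-test2': 'true'}) would transmit both headers.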

What if an individual request also sets the x-test header?


import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})
r = s.get('http://httpbin.org/headers', headers={'x-test': 'true'})
print(r.text)

# The per-request header takes precedence over the session-level one:
{
"headers": {
"Accept": "*/*", 
"Accept-Encoding": "gzip, deflate", 
"Host": "httpbin.org", 
"User-Agent": "python-requests/2.9.1", 
"X-Test": "true"
}
}

What if you want to leave out one of the session-level settings for a single request? Simply set it to None in that request.
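
Here is a sketch, again preparing requests offline instead of sending them. Passing x-test=None for one request removes the session-level header from that request only, while the session itself keeps it:

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})

# Setting a session-level key to None in one request drops it from that request only.
req = requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test': None})
prepared = s.prepare_request(req)
print('x-test' in prepared.headers)  # False

# The session itself is unchanged, so later requests still carry x-test.
later = s.prepare_request(requests.Request('GET', 'http://httpbin.org/headers'))
print(later.headers['x-test'])  # true
```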

That covers the basic usage of sessions.

7. SSL certificate verification

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify parameter. The certificate of the 12306 site was invalid for a while, so let's test it:


import requests

r = requests.get('https://kyfw.12306.cn/otn/', verify=True)
print(r.text)
# Results: 
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

Let's try GitHub:


import requests

r = requests.get('https://github.com', verify=True)
print(r.text)

The request succeeds normally. I won't paste the output because it is too long.

To skip certificate verification for 12306, set verify to False:


import requests

r = requests.get('https://kyfw.12306.cn/otn/', verify=False)
print(r.text)

Now the request goes through. verify defaults to True, so you only need to set it when you want to change that behavior.

8. Proxies

If you need a proxy, you can configure it for an individual request by passing the proxies parameter to any request method.
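
Here is a minimal sketch with a throwaway local server standing in for the proxy. In real use you would put your proxy's address in the proxies dict instead. For plain-HTTP URLs, a client talking to a proxy puts the full target URL in the request line. The fake proxy simply echoes that URL back, so example.com is never contacted:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import requests


class FakeProxyHandler(BaseHTTPRequestHandler):
    """A stand-in for a proxy: it answers with the URL it was asked to fetch."""

    def do_GET(self):
        body = self.path.encode()  # via a proxy, the request line carries the full URL
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), FakeProxyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Route plain-HTTP traffic through the local "proxy"; a real proxy URL would go here.
proxies = {'http': 'http://127.0.0.1:%d' % server.server_port}
r = requests.get('http://example.com/hello', proxies=proxies)
print(r.text)  # http://example.com/hello

server.shutdown()
server.server_close()
```

A proxies dict can have separate entries for 'http' and 'https'. HTTPS URLs are tunneled through the proxy with CONNECT, which this simple stand-in does not implement.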
