Installing Python requests on Windows and Linux systems


On Windows, you only need to enter the command pip install requests to install it.

On Linux, you only need to enter the command sudo pip install requests to install it.

Alternatively:

=================

Windows

1. Download the requests package manually

Open the URL http://www.lfd.uci.edu/~gohlke/pythonlibs/. This site hosts many third-party Python libraries; press Ctrl+F to find requests and download it.

Once the .whl file is downloaded, change the suffix from .whl to .zip, then unzip the file and you get two folders.

Copy the requests folder into the Lib directory under the Python installation directory.

requests is now installed. Enter import requests to check whether the installation succeeded.

If import requests raises no error, requests has been installed successfully.
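A minimal sketch of that check, which also shows the installed version:

import requests
print(requests.__version__)  # no ImportError means the install worked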

2. Quick guide

2.1 send request
Sending a request is easy. First, import the requests module:

>>> import requests

Next, let's get a web page, such as the home page of my personal blog:

>>> r = requests.get('http://www.zhidaow.com')

Now we can use the various methods and attributes of the response object r.
In addition, there are many other types of HTTP request, such as POST, PUT, DELETE, HEAD, and OPTIONS, and they can all be sent in the same way:


>>> r = requests.post("http://httpbin.org/post")
>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")

I haven't looked into these further because I haven't needed them yet.
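Still, here is a minimal sketch of one of them, a POST with form data against httpbin.org (the same echo service as above); the key/value payload is only an illustration, not from the original example:

import requests

# httpbin.org echoes back whatever you send, handy for testing.
r = requests.post("http://httpbin.org/post", data={"key": "value"})
print(r.status_code)     # 200 on success
print(r.json()["form"])  # {'key': 'value'} echoed back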

2.2 pass parameters in URLs
Sometimes we need to pass parameters in the URL. For example, when scraping Baidu search results, there is the wd parameter (the search term) and the rn parameter (the number of results). You could compose the URL manually, but requests can do it for you:


>>> payload = {'wd': '张亚楠', 'rn': '100'}
>>> r = requests.get("http://www.baidu.com/s", params=payload)
>>> print r.url
u'http://www.baidu.com/s?rn=100&wd=%E5%BC%A0%E4%BA%9A%E6%A5%A0'

The encoded wd= value above is the URL-encoded form of '张亚楠' (zhang yanan). (It looks like the parameters are sorted alphabetically.)
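If you want to see the composed URL without actually hitting Baidu, a small sketch using requests.Request works; prepare() builds the request (and percent-encodes the params) without sending anything:

import requests

req = requests.Request('GET', 'http://www.baidu.com/s',
                       params={'wd': '张亚楠', 'rn': '100'})
prepared = req.prepare()
print(prepared.url)  # the fully percent-encoded URL, nothing sent yet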

2.3 get the response content
You can get the content of the page through r.text.


>>> r = requests.get('https://www.zhidaow.com')
>>> r.text
u'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

According to the documentation, requests decodes the content automatically, and most Unicode text comes through seamlessly. However, I always get UnicodeEncodeError errors when using cygwin, which is frustrating; in Python's IDLE everything is completely normal.
You can also get the page content via r.content.


>>> r = requests.get('https://www.zhidaow.com')
>>> r.content
b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

The documentation says r.content holds the response as bytes, which is why IDLE shows the b prefix. In cygwin it gave me no trouble, and it's just fine for downloading pages, so it has replaced urllib2.urlopen(url).read() for me. (That's basically the feature I use the most.)
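To make the text/content distinction concrete, a small sketch: r.content is the raw byte payload, and r.text is those same bytes decoded with r.encoding (assuming the encoding was detected, as it is for this page):

import requests

r = requests.get('https://www.zhidaow.com')
print(type(r.content))  # bytes (str on Python 2)
print(type(r.text))     # unicode text
# Decoding the bytes ourselves normally reproduces r.text.
print(r.content.decode(r.encoding) == r.text)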

2.4 get the page encoding

You can use r.encoding to get the page encoding.


>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'

When you send a request, requests guesses the page encoding from the HTTP headers, and r.text uses that encoding when decoding. You can also change the encoding requests uses:


>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

As in the example above, once you change r.encoding, subsequent reads of r.text decode the content with the new encoding.
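Since r.text is decoded fresh on each access, a quick sketch shows the same bytes being reinterpreted after the change:

import requests

r = requests.get('http://www.zhidaow.com')
utf8_text = r.text               # decoded with the detected 'utf-8'
r.encoding = 'ISO-8859-1'
latin1_text = r.text             # the same bytes, reinterpreted as Latin-1
print(utf8_text == latin1_text)  # False for any page with non-ASCII text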

2.5 json

With urllib and urllib2, handling JSON means pulling in an extra module such as json or simplejson, but requests has a built-in method, r.json(). Take an IP-lookup API as an example:


>>> r = requests.get('http://ip.taobao.com/service/getIpInfo.php?ip=122.88.60.28')
>>> r.json()['data']['country']
'China'
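The Taobao API above may not always be reachable; the same pattern can be sketched against httpbin.org, which echoes the query arguments back as JSON (r.json() raises an error if the body isn't valid JSON):

import requests

r = requests.get('http://httpbin.org/get', params={'q': 'test'})
data = r.json()           # dict parsed from the JSON response body
print(data['args']['q'])  # 'test', echoed back by httpbin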

2.6 page status code
We can use r.status_code to check the status code of the page.


>>> r = requests.get('http://www.mengtiankong.com')
>>> r.status_code
200
>>> r = requests.get('http://www.mengtiankong.com/123123/')
>>> r.status_code
404
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
>>> r.url
u'http://www.zhidaow.com/'
>>> r.status_code
200

The first two examples are normal: pages that open fine return 200, and pages that don't return 404. But the third one is a bit strange: that Baidu link is a 302 redirect address, yet the status code shows 200. So I used a trick to make it reveal its true colors:


>>> r.history
(<Response [302]>,)

You can see here that it went through a 302 redirect. You might think you have to dig out the redirect status code with conditionals and regular expressions, but there is a simpler way:


>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
>>> r.status_code
302

Just add the allow_redirects=False parameter to forbid the redirect, and the redirect status code shows up directly. Neat, isn't it? I also used this at the end, in the simple application, to build a small tool that checks page status codes; the principle is exactly this.

2.7 response header content

You can get the response header content via r.headers.


>>> r = requests.get('http://www.zhidaow.com')
>>> r.headers
{'content-encoding': 'gzip', 'transfer-encoding': 'chunked',
 'content-type': 'text/html; charset=utf-8', ...}  # output abbreviated

You can see that the headers come back as a dictionary, and we can also access individual entries:


>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r.headers.get('content-type')  # lookups are case-insensitive
'text/html; charset=utf-8'

2.8 set the timeout

We can use the timeout parameter to set a timeout; if no response is received within that time, an error is raised.

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
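In a real script you would usually catch the timeout instead of letting it crash. A minimal sketch, reusing the deliberately tiny 0.001s value from above:

import requests

try:
    r = requests.get('http://github.com', timeout=0.001)
except requests.exceptions.Timeout:
    print('request timed out')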

2.9 proxy access
To avoid getting your IP blocked while scraping, proxies are often used. requests has a corresponding proxies parameter:


import requests

# placeholder proxy addresses, in the style of the requests documentation
proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

If the proxy requires a username and password, do this:


proxies = {
 "http": "http://user:pass@10.10.1.10:3128/",
}
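And a hedged usage sketch for that dict (10.10.1.10:3128 is the placeholder address from the example, not a real proxy):

import requests

# The user:pass@host:port form carries the credentials in the proxy URL.
proxies = {"http": "http://user:pass@10.10.1.10:3128/"}
r = requests.get("http://example.org", proxies=proxies)
print(r.status_code)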

2.10 request header content
The request header content can be retrieved using r.request.headers.


>>> r = requests.get('http://www.zhidaow.com')
>>> r.request.headers
{'Accept-Encoding': 'identity, deflate, compress, gzip',
 'Accept': '*/*', 'User-Agent': 'python-requests/1.2.3'}  # values vary by version

2.11 customize the request header
Disguising the request headers is common when scraping; we can hide behind a custom User-Agent like this:


>>> headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36'}
>>> r = requests.get('http://www.zhidaow.com', headers=headers)
>>> r.request.headers['User-Agent']
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36'

2.12 persistent connection keep-alive

requests's keep-alive is based on urllib3, and persistent connections within the same session are completely automatic: every request in the same session reuses the appropriate connection.

In other words, requests automatically implements keep-alive without any setup.
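A minimal sketch of what that means in practice: requests made through one Session share a connection pool, so repeated requests to the same host reuse the same TCP connection:

import requests

s = requests.Session()
r1 = s.get('http://www.zhidaow.com')  # opens a connection
r2 = s.get('http://www.zhidaow.com')  # reuses the pooled keep-alive connection
print(r2.status_code)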

3. Simple application

Get the status code of a page. A minimal sketch based on the allow_redirects trick from section 2.6:


import requests

# A small status-code checker (see section 2.6): redirects are disabled
# so 301/302 pages report their real status code.
def get_status_code(url):
    try:
        r = requests.get(url, allow_redirects=False, timeout=10)
        return r.status_code
    except requests.exceptions.RequestException:
        return None

print(get_status_code('http://www.zhidaow.com'))  # e.g. 200

