Summary of common Python commands for fetching web pages

Below is a summary of the Python commands commonly used to fetch web pages.

Simple fetching of a web page:


import urllib.request

url = "http://google.cn/"
response = urllib.request.urlopen(url)  # Returns a file-like response object
page = response.read()                  # Read the raw body as bytes
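
read() returns the body as raw bytes; a minimal sketch, assuming the page is encoded in UTF-8, of turning it into a string:

text = page.decode("utf-8")  # bytes -> str; substitute the page's actual charset if it is not UTF-8
print(text[:200])            # Show the first 200 characters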

Saving a URL directly to a local file:


import urllib.request

url = "http://google.cn/"
# urlretrieve downloads the resource straight into a local file
urllib.request.urlretrieve(url, "google.html")

Making a POST request:


import urllib.parse
import urllib.request

url = "http://liuxin-blog.appspot.com/messageboard/add"

values = {"content": "command-line web request test"}
# urlopen expects the POST body as bytes, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode("utf-8")

# Create the request object; passing data makes this a POST request
req = urllib.request.Request(url, data)
# Get the data returned by the server
response = urllib.request.urlopen(req)
# Process the data
page = response.read()
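
A Request can also carry custom headers, which is often necessary because some servers reject urllib's default User-Agent; a minimal sketch, with a placeholder User-Agent string:

headers = {"User-Agent": "Mozilla/5.0"}  # Placeholder value; use whatever your target expects
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)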

Making a GET request:


import urllib.parse
import urllib.request

url = "http://www.google.cn/webhp"

values = {"rls": "ig"}
data = urllib.parse.urlencode(values)

# Append the encoded query string to the URL
theurl = url + "?" + data
# Create the request object
req = urllib.request.Request(theurl)
# Get the data returned by the server
response = urllib.request.urlopen(req)
# Process the data
page = response.read()
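
urlencode() also percent-encodes non-ASCII values, which matters for Chinese query strings; a small sketch, where the parameter name "q" is made up for illustration:

query = urllib.parse.urlencode({"q": "你好"})
print(query)  # q=%E4%BD%A0%E5%A5%BD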

The response object also provides two commonly used methods: geturl() and info().

geturl() returns the URL that was actually retrieved, which tells you whether the server redirected the request, while info() returns the response headers (meta-information such as Content-Type).
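
A minimal sketch of both methods, reusing the simple fetch from above:

response = urllib.request.urlopen("http://google.cn/")
print(response.geturl())                 # The final URL, after any redirect
print(response.info())                   # All response headers
print(response.info()["Content-Type"])   # Look up a single header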

When handling Chinese text, you will need encode() to convert str to bytes and decode() to convert back:
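
A minimal round-trip sketch, assuming UTF-8:

s = "中文测试"
b = s.encode("utf-8")    # str -> bytes
print(b)                 # b'\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95'
s2 = b.decode("utf-8")   # bytes -> str
print(s2)                # 中文测试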

Thank you for reading. I hope this article helps you, and thank you for your support of this site!
