Summary of common Python commands for fetching web pages
- 2020-05-27 06:32:41
- OfStack
Below is a summary of common urllib commands in Python 3 for fetching and crawling web pages.
Simple crawling of web pages:
import urllib.request
url = "http://google.cn/"
response = urllib.request.urlopen(url)  # returns a file-like response object
page = response.read()  # bytes of the page body
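Some sites reject requests that lack a browser-like User-Agent header. A minimal sketch of attaching custom headers with urllib.request.Request; the URL and User-Agent string here are illustrative assumptions, and no request is actually sent:

```python
import urllib.request

# Hypothetical URL, used only for illustration
url = "http://example.com/"

# Attach a browser-like User-Agent header to the request object
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

# urlopen(req) would fetch the page with this header set;
# here we only inspect the request object itself.
# Note: urllib normalizes header names with str.capitalize().
print(req.get_header("User-agent"))
```

Passing the headers at construction time keeps the request reusable; you can also call req.add_header() later to the same effect.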
Save a URL directly to a local file:
import urllib.request
url = "http://google.cn/"
# urlretrieve downloads the URL and writes it to the given local path
urllib.request.urlretrieve(url, "page.html")
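Equivalently, you can read the bytes yourself and write them with open(). In this sketch the sample bytes stand in for a real response.read(), so it runs without network access:

```python
import os
import tempfile

# Stand-in for the bytes returned by response.read() (assumption for illustration)
page = b"<html><body>hello</body></html>"

# Write the raw bytes to a local file; "wb" (binary mode) matches read() output
path = os.path.join(tempfile.gettempdir(), "page.html")
with open(path, "wb") as f:
    f.write(page)

# Read the file back to confirm the save
with open(path, "rb") as f:
    saved = f.read()
```

Writing in binary mode avoids any encoding step; decoding can be deferred until the text is actually needed.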
POST request:
import urllib.parse
import urllib.request
url = "http://liuxin-blog.appspot.com/messageboard/add"
values = {"content": "A web request test issued from the command line"}
# urlencode returns a str; urlopen needs bytes for the POST body
data = urllib.parse.urlencode(values).encode("utf-8")
# Create the request object (passing data makes it a POST)
req = urllib.request.Request(url, data)
# Get the data returned by the server
response = urllib.request.urlopen(req)
# Process the data
page = response.read()
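Note that in Python 3 the POST body must be bytes, and constructing a Request with a data argument automatically turns it into a POST. A small offline sketch; the URL is an illustrative assumption and nothing is sent:

```python
import urllib.parse
import urllib.request

values = {"content": "test message"}
# urlencode produces a str; encode it to bytes before sending
data = urllib.parse.urlencode(values).encode("utf-8")

# A Request constructed with a data argument reports itself as POST
req = urllib.request.Request("http://example.com/add", data)
print(req.get_method())   # POST
print(type(req.data))     # <class 'bytes'>
```

Forgetting the .encode() step raises a TypeError at urlopen() time, which is a common stumbling block when porting Python 2 examples.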
GET request:
import urllib.parse
import urllib.request
url = "http://www.google.cn/webhp"
values = {"rls": "ig"}
data = urllib.parse.urlencode(values)
theurl = url + "?" + data  # query string appended to the URL
# Create the request object
req = urllib.request.Request(theurl)
# Get the data returned by the server
response = urllib.request.urlopen(req)
# Process the data
page = response.read()
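urlencode() also takes care of escaping spaces and other special characters in query parameters, which matters when you assemble GET URLs by hand. A short sketch with made-up parameters:

```python
import urllib.parse

# Hypothetical search parameters; urlencode escapes each key=value pair
values = {"q": "hello world", "lang": "zh"}
query = urllib.parse.urlencode(values)

# Spaces become '+', and pairs are joined with '&'
theurl = "http://example.com/search" + "?" + query
print(theurl)
```

Building the query this way is safer than string concatenation of raw values, since user input may contain characters that would otherwise break the URL.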
The response object has two commonly used methods: geturl() and info(). geturl() returns the final URL, which tells you whether the server redirected the request; info() returns the response metadata, such as the HTTP headers.
When handling Chinese text, encode() and decode() are used to convert between str and bytes.
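Since read() returns bytes, Chinese content must be decoded with the page's charset before use, and encoded back to bytes when written out. A minimal round-trip sketch:

```python
# A str containing Chinese text
text = "网页抓取"

# encode(): str -> bytes, e.g. for writing to a file or a POST body
raw = text.encode("utf-8")

# decode(): bytes -> str, e.g. for the bytes returned by response.read()
restored = raw.decode("utf-8")
print(restored == text)  # True
```

If the page is not UTF-8, check the charset in the Content-Type header (via info()) and pass that encoding to decode() instead.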
Thank you for reading, I hope to help you, thank you for your support of this site!