Python urllib module: urlopen() and urlretrieve()

  • 2020-04-02 13:03:33
  • OfStack

1. The urlopen() method
urllib.urlopen(url[, data[, proxies]]): creates a file-like object that represents the remote URL; you then read the remote data from this object just as you would read a local file.
The parameter url is the location of the remote data, usually a web URL.
The parameter data holds data to submit to the URL via POST. It must be a string in application/x-www-form-urlencoded format, such as the output of urllib.urlencode(). (Anyone who has done some web development knows that data can be submitted in two ways, POST and GET. If you don't, don't worry: this parameter is rarely used.)
The parameter proxies sets the proxies to use, as a dictionary mapping URL schemes to proxy URLs.
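For readers on Python 3, where these functions moved to urllib.request and urllib.parse, the sketch below shows roughly how the data and proxies parameters map onto the newer API. It is an illustration under stated assumptions: the endpoint, the form fields and the proxy address are hypothetical placeholders, and nothing is sent over the network. Passing data turns the request into a POST, and because Python 3's urlopen() has no proxies argument, a proxy is configured on an opener built with ProxyHandler instead.

```python
from urllib.parse import urlencode
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical form fields and endpoint, for illustration only.
form = {"q": "python", "page": 2}
url = "http://www.example.com/search"

# data must be bytes in application/x-www-form-urlencoded format.
body = urlencode(form).encode("ascii")

get_req = Request(url)              # no data: the request is a GET
post_req = Request(url, data=body)  # with data: the request is a POST

# Python 3's urlopen() has no proxies argument; proxies are set on an
# opener instead. The proxy address is a placeholder, not a real server.
proxy = ProxyHandler({"http": "http://127.0.0.1:8080"})
opener = build_opener(proxy)

print(body)                   # b'q=python&page=2'
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
# opener.open(post_req) would send the POST through the proxy.
```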
urlopen() returns a file-like object that provides the following methods:
read(), readline(), readlines(), fileno(), close(): used exactly like the methods of a regular file object;
info(): returns an httplib.HTTPMessage object holding the headers returned by the remote server;
getcode(): returns the HTTP status code. For an HTTP request, 200 means the request completed successfully and 404 means the address was not found;
geturl(): returns the URL that was actually retrieved, which can differ from the requested URL if the server redirected the request.
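These methods can be tried without network access, since urlopen() also accepts file: URLs. The following is a minimal sketch assuming Python 3 (urllib.request.urlopen rather than the article's Python 2 urllib.urlopen); it writes a small hypothetical page to a temporary directory and reads it back through the file-like object. getcode() is left out because a file: URL carries no HTTP status. Note also that in Python 3 a 404 response makes urlopen() raise urllib.error.HTTPError (whose code attribute is 404) instead of returning an object.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Write a small local page so the example runs without network access.
workdir = Path(tempfile.mkdtemp())
page = workdir / "page.html"
page.write_bytes(b"<html>\n<body>hello</body>\n</html>\n")

# urlopen() accepts file: URLs too; the result behaves like a file.
with urlopen(page.as_uri()) as resp:
    first_line = resp.readline()  # like file.readline()
    rest = resp.read()            # like file.read(): everything left
    headers = resp.info()         # header information as a message object
    final_url = resp.geturl()     # the URL that was actually retrieved

print(first_line)                 # b'<html>\n'
print(headers["Content-length"])  # size of page.html in bytes, as a string
print(final_url)
```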
Code example:


import urllib

url = "http://www.baidu.com/"

# urlopen(): read the page into memory, then write it to a local file
sock = urllib.urlopen(url)
htmlCode = sock.read()
sock.close()
fp = open("e:/1.html", "wb")
fp.write(htmlCode)
fp.close()

# urlretrieve(): download the page straight to a local file
urllib.urlretrieve(url, "e:/2.html")
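In the urlopen() half of the example above, sock.read() loads the whole page into memory before it is written out. As an alternative, assuming Python 3, the response can be streamed to disk in chunks with shutil.copyfileobj, a standard-library helper swapped in here rather than something the original example uses. The save_url function name and the local file: URL standing in for the Baidu home page are illustrative assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen

def save_url(url, path, chunk_size=64 * 1024):
    """Stream the resource at url into path without holding it all in memory."""
    # Both the response and the output file are closed automatically,
    # even if an error occurs while copying.
    with urlopen(url) as sock, open(path, "wb") as fp:
        shutil.copyfileobj(sock, fp, chunk_size)

# Offline usage: a local file: URL stands in for http://www.baidu.com/.
workdir = Path(tempfile.mkdtemp())
source = workdir / "index.html"
source.write_bytes(b"<html>hello</html>")
target = workdir / "1.html"
save_url(source.as_uri(), target)
print(target.read_bytes())  # b'<html>hello</html>'
```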

2. The urlretrieve() method
urlretrieve() downloads remote data directly to a local file.


urllib.urlretrieve(url[, filename[, reporthook[, data]]])
Parameter description:
url: a remote or local URL;
filename: the local path to save the data to (if it is omitted, urllib generates a temporary file to hold the data);
reporthook: a callback function that is called once when the connection to the server is established and again after each data block is transferred. It receives three arguments (the number of blocks transferred so far, the block size, and the total size of the file, which is -1 if the server does not report it) and can be used to display the current download progress;
data: data to POST to the server.
This method returns a tuple of two elements, (filename, headers): filename is the local path the data was saved to, and headers holds the server's response headers.
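The return value described above can be checked offline, because urlretrieve() also accepts local file: URLs. This is a minimal sketch assuming Python 3, where the function lives in urllib.request and is documented as part of a legacy interface that might become deprecated in the future; the file names are hypothetical. When filename is given, the data is copied there and the same path comes back as the first element of the tuple.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# A local source file stands in for remote data, so no network is needed.
workdir = Path(tempfile.mkdtemp())
source = workdir / "remote.html"
source.write_bytes(b"<html>hello</html>")
dest = workdir / "copy.html"

# With filename given, the data is copied there and the returned
# tuple is (filename, headers).
filename, headers = urlretrieve(source.as_uri(), str(dest))

print(filename)                   # the path passed in as filename
print(headers["Content-length"])  # size reported for the source
print(dest.read_bytes())          # b'<html>hello</html>'
```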

The following example demonstrates this method: it downloads the HTML of the Sina home page, saves it to D:/sina.html, and displays the download progress.

import urllib

def callbackfunc(blocknum, blocksize, totalsize):
    '''Progress callback for urlretrieve
    @blocknum:  number of data blocks transferred so far
    @blocksize: size of each data block, in bytes
    @totalsize: size of the remote file, in bytes (-1 if unknown)
    '''
    if totalsize <= 0:
        # The server did not report a size, so no percentage can be shown
        print "%d blocks received" % blocknum
        return
    percent = 100.0 * blocknum * blocksize / totalsize
    if percent > 100:
        percent = 100
    print "%.2f%%" % percent

url = 'http://www.sina.com.cn'
local = 'd:/sina.html'
urllib.urlretrieve(url, local, callbackfunc)
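urlretrieve() passes -1 as totalsize when the server does not send a Content-Length header, so a progress callback should not divide by it blindly. Below is a minimal Python 3 sketch of a reporthook that handles that case and records the percentages in a list (instead of printing them) so they can be inspected. The make_progress_hook helper is hypothetical, and a 20,000-byte local file stands in for the Sina home page so no network is needed.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

def make_progress_hook(log):
    """Return a reporthook that appends a progress percentage to log.

    When totalsize is unknown (-1), None is recorded instead of a
    percentage, so the hook never divides by it.
    """
    def hook(blocknum, blocksize, totalsize):
        if totalsize <= 0:
            log.append(None)
            return
        # blocknum * blocksize can overshoot on the last block, so cap at 100.
        percent = min(100.0, 100.0 * blocknum * blocksize / totalsize)
        log.append(percent)
    return hook

# A 20,000-byte local file stands in for the Sina home page.
workdir = Path(tempfile.mkdtemp())
source = workdir / "big.html"
source.write_bytes(b"x" * 20000)

progress = []
urlretrieve(source.as_uri(), str(workdir / "sina.html"),
            make_progress_hook(progress))
for p in progress:
    print("%.2f%%" % p)
```

Because the hook is called once with blocknum 0 before any data arrives, the first recorded value is 0.00%, and the last one reaches 100.00% once the final block has been written.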

