Additional notes on Python's urllib urlopen() object methods and proxies

  • 2020-06-07 04:43:55
  • OfStack

python urllib urlopen() object methods and proxies

urllib is a module that ships with Python (Python 2 here) for fetching web page content. Its main function is urlopen(), which is modeled on Python's built-in open(). The main points are explained below:


urllib.urlopen('url')

The URL passed to urlopen() has one special requirement: it must follow a network protocol such as http or ftp. In other words, the address must begin with a scheme prefix like http://, for example: urllib.urlopen('http://www.baidu.com').
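
For instance, a minimal sketch of an http fetch (any reachable http URL works here):


import urllib

# The scheme prefix is mandatory: without 'http://' the string
# would be interpreted as a local file path instead.
f = urllib.urlopen('http://www.baidu.com')
print f.read(200)  # first 200 bytes of the page
f.close()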

urllib.urlopen('file:nowamagic.py') opens a local file. Note that nowamagic.py here is resolved relative to the current working directory. If there is any doubt about where nowamagic.py lives, give an absolute path, e.g. urllib.urlopen('file:F:\pythontest\nowamagic.py').

urllib.urlopen('ftp://username:password@ftp-address/'), and so on.

Sample program:


import urllib

# Use a raw string (or double backslashes): in a plain string,
# '\n' in the Windows path would be parsed as a newline escape.
f = urllib.urlopen(r'file:F:\pythontest\nowamagic.py')
a = f.read()
print a

If the argument passed in is valid, i.e. the site is reachable and there are no special circumstances (such as needing a proxy, or being blocked), urlopen() returns a file-like object, the f in the code above. That object carries a number of methods, which dir(f) lists:


['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close', 'fileno', 'fp', 'geturl', 'headers', 'info', 'next', 'read', 'readline', 'readlines', 'url']

The read() method reads everything out in one go, and the object behaves like a stream: its data can be consumed only once, so calling f.read() a second time returns nothing. If you want to process the content later, bind the result to a name, as a is in the example above. The info() and geturl() methods likewise operate on the file-like object f:


>>> f.geturl()
'F://pythontest//nowamagic.py'
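
A quick sketch of the one-shot stream behavior and of info(), reusing the same hypothetical file path as above:


import urllib

f = urllib.urlopen(r'file:F:\pythontest\nowamagic.py')
a = f.read()   # first read() consumes the entire stream
b = f.read()   # a second read() now returns an empty string
print len(a), len(b)
print f.info()    # headers describing the resource (a mimetools.Message)
print f.geturl()  # the URL that was actually opened
f.close()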

Next come urllib's proxy settings:


import urllib

proxies = {'http': 'http://***.***.***.***:1984'}  # proxy IP elided
# Placeholder for a site that is only reachable through the proxy.
filehandle = urllib.urlopen('http://www.example.com/', proxies=proxies)
a = filehandle.read()
print a
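
The proxies dictionary maps scheme names to proxy URLs, so one call can cover several protocols; and, per the Python 2 documentation, an empty dictionary disables proxies altogether. A sketch with placeholder addresses:


import urllib

# Placeholder proxy addresses -- substitute real ones.
proxies = {'http': 'http://proxy.example.com:1984',
           'ftp': 'http://proxy.example.com:1984'}
f = urllib.urlopen('http://www.example.com/', proxies=proxies)
print f.read()

# An empty dictionary means: use no proxy at all,
# not even one configured in the environment.
direct = urllib.urlopen('http://www.example.com/', proxies={})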

The above is the most basic proxy usage: requests go through the proxy, and the site's content can still be retrieved. But what about sites that require a login, or a cookie?

Look at the source code of urllib's urlopen():


def urlopen(url, data=None, proxies=None):
  """urlopen(url [, data]) -> open file-like object"""
  global _urlopener
  if proxies is not None:
    opener = FancyURLopener(proxies=proxies)
  elif not _urlopener:
    opener = FancyURLopener()
    _urlopener = opener
  else:
    opener = _urlopener
  if data is None:
    return opener.open(url)
  else:
    return opener.open(url, data)

According to the urlopen() source above, we can also pass in a data parameter. Note that data is not a dictionary: it must be a URL-encoded string, typically built from a dictionary with urllib.urlencode(), since what a browser sends to the server is a set of key/value pairs encoded into a single string. When data is supplied, the request is sent as a POST.
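
A sketch of a login-style POST built this way (the URL and field names are made up for illustration):


import urllib

# urlencode() turns a dictionary into the 'key=value&key=value'
# string that urlopen() expects as its data argument.
params = urllib.urlencode({'username': 'nowamagic', 'password': 'secret'})
f = urllib.urlopen('http://www.example.com/login', params)
print f.read()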

One more point: proxy support was added after Python 2.3.

Thank you for reading. I hope this helps, and thanks for your support of this site!

