Logging in to a website with Python: details and examples

  • 2020-05-30 20:24:04
  • OfStack

On most forums, posts are visible only to logged-in users, so a crawler that wants to fetch and analyze posts has to log in first.

Logging in needs special handling because HTTP is a stateless protocol. How does the server know whether the user making the current request is logged in? There are two ways:

  • Pass a session ID explicitly in the URI.
  • Use a cookie. Roughly: after you log in to a website, a cookie is stored locally, and as you continue to browse the site the browser sends that cookie along with each request.
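The cookie round trip can be sketched offline with the `http.cookies` module from the Python 3 standard library. The cookie name `sid` and its value are made up for illustration; a real forum sets its own.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value a server might send after a login.
set_cookie_header = "sid=abc123; Path=/"

# The client parses the header and remembers the cookie ...
jar = SimpleCookie()
jar.load(set_cookie_header)

# ... and sends the name=value pairs back in the Cookie header of later requests.
cookie_header = "; ".join(
    "{}={}".format(name, morsel.value) for name, morsel in jar.items()
)
print(cookie_header)
```

A browser does this bookkeeping automatically; a script has to do it itself or delegate it to a cookie-aware HTTP client.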

Python provides a fairly rich set of modules, so this kind of network operation takes only a few lines. I use logging in to the QZZN forum as an example; the program below should in fact work for almost all PHPWind forums.


# -*- coding: GB2312 -*-
# Python 2 code: urllib2 and cookielib are Python 2 module names.

from urllib import urlencode
import cookielib, urllib2

# Build an opener that stores cookies and sends them back
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

# Log in by POSTing the form fields
user_data = {'pwuser': 'Your username',
             'pwpwd': 'Your password',
             'step': '2'
            }
url_data = urlencode(user_data)
login_r = opener.open("http://bbs.qzzn.com/login.php", url_data)

Some notes:

  • urllib2 is a somewhat more advanced module than urllib, and it includes support for cookies.
  • In urllib2, each client can be represented by an opener, and each opener can be extended with multiple handlers.
  • HTTPCookieProcessor is passed as a handler when the opener is built, so this opener supports cookies.
  • After install_opener is called, this opener is also used whenever urlopen is called.
  • If you do not need to save the cookies, the cj argument can be omitted.
  • user_data holds the information needed to log in; it is simply passed along when logging in to the forum.
  • urlencode encodes the dictionary user_data as a string of the form "pwuser=username&pwpwd=password". Keeping the fields in a dictionary makes the program easier to read.
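For readers on Python 3, the same idea can be sketched as follows. In Python 3, urllib2 became `urllib.request`, cookielib became `http.cookiejar`, and `urlencode` moved to `urllib.parse`. The helper name `make_login_request` and the sample credentials are invented for this sketch, and the actual network call is left commented out because it needs a live forum.

```python
from http.cookiejar import LWPCookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_login_request(username, password):
    """Return (opener, cookie jar, POST body). Performs no network I/O."""
    cj = LWPCookieJar()
    # The cookie handler stores cookies from responses and resends them.
    opener = build_opener(HTTPCookieProcessor(cj))
    fields = {"pwuser": username, "pwpwd": password, "step": "2"}
    # Python 3 requires the POST body to be bytes, not str.
    body = urlencode(fields).encode("ascii")
    return opener, cj, body


opener, cj, body = make_login_request("alice", "secret")
print(body)  # b'pwuser=alice&pwpwd=secret&step=2'

# To actually log in (needs network access and a reachable forum):
# response = opener.open("http://bbs.qzzn.com/login.php", body)
```

Any later `opener.open(...)` call on the same opener would reuse the cookies collected in `cj`.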

The last question is where names like pwuser and pwpwd come from. They come from analyzing the login page. A login page is generally a form; the following is an excerpt:


<form action="login.php?" method="post" name="login" onSubmit="this.submit.disabled = true;"> 
<input type="hidden" value="" name="forward" /> 
<input type="hidden" value="http://bbs.qzzn.com/index.php" name="jumpurl" /> 
<input type="hidden" value="2" name="step" /> 
... 
<td width="20%" onclick="document.login.pwuser.focus();"><input type="radio" name="lgt" value="0" checked /> The user name  <input type="radio" name="lgt" value="1" />UID</td> 
<td><input class="input" type="text" maxLength="20" name="pwuser" size="40" tabindex="1" /> <a href="reg1ster.php" rel="external nofollow" > Immediately registered </a></td> 
<td> password </td> 
<td><input class="input" type="password" maxLength="20" name="pwpwd" size="40" tabindex="2" /> <a href="sendpwd.php" rel="external nofollow" target="_blank"> Retrieve password </a></td> 
... 
</form>

As you can see, the username and password we need to enter correspond to pwuser and pwpwd, while step appears to correspond to the login action (this is a guess).
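Instead of reading the HTML by eye, the field names can be listed with the standard library's `html.parser` (Python 3). This is only a sketch: the class name `FormFieldCollector` is invented here, and `sample` is a shortened version of the excerpt above rather than the full page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect (name, type, value) for every named <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields.append(
                    (attr["name"], attr.get("type", "text"), attr.get("value", ""))
                )


sample = """<form action="login.php?" method="post" name="login">
<input type="hidden" value="2" name="step" />
<input class="input" type="text" name="pwuser" />
<input class="input" type="password" name="pwpwd" />
</form>"""

collector = FormFieldCollector()
collector.feed(sample)
for name, kind, value in collector.fields:
    print(name, kind, repr(value))
```

Hidden inputs such as step are easy to miss when reading the page by eye, but they often have to be sent back with the login request.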

Note that this forum's form uses the POST method. If it used GET, the approach in this article would need a small change: instead of calling open directly, you would first build a Request and then open it. See the urllib2 documentation for details.
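The difference between the two methods can be sketched with `urllib.request.Request` in Python 3. The URL and parameters below are hypothetical. With GET the encoded fields go into the URL's query string, while with POST they are sent as the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://example.com/search.php"        # hypothetical URL
query = urlencode({"q": "python", "page": "1"})   # hypothetical parameters

# GET: append the encoded fields to the URL and send no body.
get_req = Request(base_url + "?" + query)

# POST: keep the URL clean and send the encoded fields as the body (bytes).
post_req = Request(base_url, data=query.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Either request object can then be passed to an opener's `open` method in place of a plain URL string.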


