Simulating Sina Weibo login in Python for a Sina Weibo crawler
- 2020-04-02 13:17:02
- OfStack
1. Main function (WeiboMain.py):
import urllib2
import cookielib
import WeiboEncode
import WeiboSearch

if __name__ == '__main__':
    # Email address (account name) and password
    weiboLogin = WeiboLogin('XXX@gmail.com', 'xxxx')
    if weiboLogin.Login():
        print "Login successful!"
The first two imports load Python's network modules (urllib2 and cookielib). The next two load the other files of this program, WeiboEncode.py and WeiboSearch.py (described later). The main function creates a login object and then logs in.
2. WeiboLogin class (WeiboMain.py):
class WeiboLogin:
    def __init__(self, user, pwd, enableProxy = False):
        "Initialize WeiboLogin. enableProxy indicates whether a proxy server is used; it is off by default."
        print "Initializing WeiboLogin..."
        self.userName = user
        self.passWord = pwd
        self.enableProxy = enableProxy
        self.serverUrl = "http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=&rsakt=mod&client=ssologin.js(v1.4.11)&_=1379834957683"
        self.loginUrl = "http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.11)"
        self.postHeader = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0'}
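The prelogin URL stored in self.serverUrl packs several query parameters into one long string. As a rough illustration only, the sketch below splits that URL into its parameters. It uses the Python 3 standard library (urllib.parse) rather than the Python 2 modules used in this article, and the helper name prelogin_params is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# The prelogin URL exactly as assigned to self.serverUrl in the class above.
SERVER_URL = ("http://login.sina.com.cn/sso/prelogin.php?entry=weibo"
              "&callback=sinaSSOController.preloginCallBack&su=&rsakt=mod"
              "&client=ssologin.js(v1.4.11)&_=1379834957683")


def prelogin_params(url):
    """Return the query parameters of a URL as a dict (first value of each)."""
    query = urlsplit(url).query
    # keep_blank_values=True so that the empty "su" parameter is not dropped.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


if __name__ == "__main__":
    for name, value in prelogin_params(SERVER_URL).items():
        print(name, "=", repr(value))
```

The callback parameter names the JavaScript function that wraps the JSON reply, which is presumably why sServerData (shown later) extracts the text between parentheses before parsing it.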
The initializer defines two key URL members. self.serverUrl is used for the first login step (fetching servertime, nonce, and so on); this step covers items 1 and 2 of the login process described in the analysis of the Sina Weibo login process. self.loginUrl is used for the second step (POSTing the encrypted user name and password to that URL, with self.postHeader as the request headers), which corresponds to item 3 of that process. The class has three more functions:
    def Login(self):
        "Log in to Sina Weibo"
        self.EnableCookie(self.enableProxy)  # Configure cookies and, optionally, the proxy server
        serverInfo = self.GetServerTime()  # Step 1 of the login
        if serverInfo is None:  # GetServerTime returns None on failure
            return False
        serverTime, nonce, pubkey, rsakv = serverInfo
        postData = WeiboEncode.PostEncode(self.userName, self.passWord, serverTime, nonce, pubkey, rsakv)  # Encode the user name and encrypt the password
        print "Post data length:\n", len(postData)
        req = urllib2.Request(self.loginUrl, postData, self.postHeader)
        print "Posting request..."
        result = urllib2.urlopen(req)  # Step 2 of the login (item 3 of the Sina Weibo login process)
        text = result.read()
        try:
            loginUrl = WeiboSearch.sRedirectData(text)  # Parse the redirect URL from the response
            urllib2.urlopen(loginUrl)
        except Exception:
            print 'Login error!'
            return False
        print 'Login sucess!'
        return True
self.EnableCookie sets up cookie handling and the proxy server (many free proxy servers are available online). Login then performs the first step: it visits the Sina server to get serverTime and related values, uses them to encrypt the user name and password, and builds the POST request. The second step sends the user name and password to self.loginUrl. The response contains redirect information, from which the final URL is parsed. Once that URL is opened, the server writes the user's login information into the cookie and the login is complete.
    def EnableCookie(self, enableProxy):
        "Enable cookie & proxy (if needed)."
        cookiejar = cookielib.LWPCookieJar()  # Create the cookie jar
        cookie_support = urllib2.HTTPCookieProcessor(cookiejar)
        if enableProxy:
            proxy_support = urllib2.ProxyHandler({'http':'http://xxxxx.pac'})  # Use a proxy
            opener = urllib2.build_opener(proxy_support, cookie_support, urllib2.HTTPHandler)
            print "Proxy enabled"
        else:
            opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
        urllib2.install_opener(opener)  # Install the opener so that urlopen uses the cookie jar
The EnableCookie function is fairly simple: it installs a global opener that keeps cookies and, if requested, goes through a proxy.
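For readers on Python 3, where urllib2 and cookielib were renamed to urllib.request and http.cookiejar, the same idea might look like the sketch below. It is an illustration, not the article's code: the function name build_cookie_opener and the proxy_url parameter are assumptions made for this example.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(proxy_url=None):
    """Build an opener that stores cookies and optionally uses an HTTP proxy."""
    cookie_jar = http.cookiejar.LWPCookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(cookie_jar)]
    if proxy_url:
        # Route plain-HTTP requests through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    return opener, cookie_jar


opener, cookie_jar = build_cookie_opener()
# urllib.request.install_opener(opener) would make this opener the default
# for urllib.request.urlopen(), mirroring urllib2.install_opener above.
```

Returning the cookie jar alongside the opener makes it possible to inspect or save the cookies after login, which the installed global opener alone does not expose.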
    def GetServerTime(self):
        "Get server time and nonce, which are used to encode the password"
        print "Getting server time and nonce..."
        serverData = urllib2.urlopen(self.serverUrl).read()  # Fetch the prelogin response
        print serverData
        try:
            serverTime, nonce, pubkey, rsakv = WeiboSearch.sServerData(serverData)  # Parse serverTime, nonce, etc.
            return serverTime, nonce, pubkey, rsakv
        except Exception:
            print 'Get server time & nonce error!'
            return None
The functions in the WeiboSearch file parse the data returned by the server and are relatively simple.
3. sServerData function (WeiboSearch.py):
import re
import json

def sServerData(serverData):
    "Search the server time & nonce from server data"
    p = re.compile(r'\((.*)\)')  # The JSON payload sits between the parentheses of the callback
    jsonData = p.search(serverData).group(1)
    data = json.loads(jsonData)
    serverTime = str(data['servertime'])
    nonce = data['nonce']
    pubkey = data['pubkey']
    rsakv = data['rsakv']
    print "Server time is:", serverTime
    print "Nonce is:", nonce
    return serverTime, nonce, pubkey, rsakv
The parsing relies mainly on regular expressions and JSON, both of which are easy to follow. The function that Login uses to parse the redirect result is also in this file:
def sRedirectData(text):
    p = re.compile(r"location\.replace\(['\"](.*?)['\"]\)")  # URL passed to location.replace(...)
    loginUrl = p.search(text).group(1)
    print 'loginUrl:', loginUrl
    return loginUrl
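To see what the two regular expressions extract, the following Python 3 sketch runs them against sample strings. The sample replies, including the nonce, pubkey and rsakv values, are invented for illustration and are not real server output; the function names parse_prelogin and parse_redirect are likewise hypothetical.

```python
import json
import re


def parse_prelogin(server_data):
    """Extract servertime, nonce, pubkey and rsakv from a callback-wrapped reply."""
    match = re.search(r"\((.*)\)", server_data)
    if match is None:
        raise ValueError("no parenthesised JSON payload found")
    data = json.loads(match.group(1))
    return str(data["servertime"]), data["nonce"], data["pubkey"], data["rsakv"]


def parse_redirect(text):
    """Extract the URL passed to location.replace(...) in the login reply."""
    match = re.search(r"location\.replace\(['\"](.*?)['\"]\)", text)
    if match is None:
        raise ValueError("no location.replace(...) call found")
    return match.group(1)


# Hypothetical sample replies (made-up values, for illustration only).
sample_prelogin = ('sinaSSOController.preloginCallBack({"retcode":0,'
                   '"servertime":1379834957,"nonce":"ABC123",'
                   '"pubkey":"ABCDEF0123456789","rsakv":"1234567890"})')
sample_redirect = ('<script>location.replace('
                   '"http://weibo.com/ajaxlogin.php?retcode=0");</script>')

print(parse_prelogin(sample_prelogin))
print(parse_redirect(sample_redirect))
```

Raising ValueError when nothing matches is a design choice of this sketch; the article's functions instead let the AttributeError from a failed search propagate to the caller's try/except.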
4. Between the first and second steps, the user name and password must be encrypted (WeiboEncode.py):
import urllib
import base64
import rsa
import binascii

def PostEncode(userName, passWord, serverTime, nonce, pubkey, rsakv):
    "Used to generate POST data"
    encodedUserName = GetUserName(userName)  # The user name is Base64-encoded
    encodedPassWord = get_pwd(passWord, serverTime, nonce, pubkey)  # The password is currently RSA-encrypted
    postPara = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': encodedUserName,
        'service': 'miniblog',
        'servertime': serverTime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': encodedPassWord,
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': rsakv,
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
    }
    postData = urllib.urlencode(postPara)  # URL-encode the form fields
    return postData
The PostEncode function builds the body of the POST request, which must carry the same fields as a real browser login. The difficult part is encoding the user name and encrypting the password:
def GetUserName(userName):
    "Used to encode user name"
    userNameTemp = urllib.quote(userName)
    userNameEncoded = base64.encodestring(userNameTemp)[:-1]  # Strip the trailing newline added by encodestring
    return userNameEncoded

def get_pwd(password, servertime, nonce, pubkey):
    rsaPublickey = int(pubkey, 16)
    key = rsa.PublicKey(rsaPublickey, 65537)  # Create the public key
    message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)  # Plaintext layout, taken from the site's JS encryption file
    passwd = rsa.encrypt(message, key)  # Encrypt
    passwd = binascii.b2a_hex(passwd)  # Convert the ciphertext to hexadecimal
    return passwd
Sina's login process used to protect the password with SHA1 and now uses RSA, and it may change again. Python has implementations of all the common encryption algorithms, so once the scheme in use has been identified, the program is relatively easy to write.
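The two inputs to the login form can be looked at in isolation. The sketch below, written for Python 3 with only the standard library, shows how the user name is quoted and Base64-encoded and how the password plaintext is assembled before RSA encryption. The function names and sample values are assumptions for illustration, and the RSA step itself is left out because it depends on the third-party rsa package.

```python
import base64
from urllib.parse import quote


def encode_user_name(user_name):
    """URL-quote the user name, then Base64-encode it (the 'su' field)."""
    quoted = quote(user_name)
    # b64encode adds no trailing newline, so no [:-1] slice is needed here.
    return base64.b64encode(quoted.encode("ascii")).decode("ascii")


def build_password_plaintext(password, servertime, nonce):
    """Join servertime, nonce and password the way get_pwd does before RSA."""
    message = str(servertime) + "\t" + str(nonce) + "\n" + str(password)
    return message.encode("utf-8")


# Made-up sample values, for illustration only.
print(encode_user_name("user@example.com"))
print(build_password_plaintext("secret", 1379834957, "ABC123"))
```

In a Python 3 port, the bytes returned by build_password_plaintext would be what gets passed to rsa.encrypt together with the public key, and the result would then be hex-encoded as in get_pwd.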
At this point the simulated login to Sina Weibo succeeds. Running the program prints:
loginUrl: http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&ssosavestate=1390390056&ticket=ST-MzQ4NzQ5NTYyMA==-1387798056-xd-284624BFC19FE242BBAE2C39FB3A8CA8&retcode=0
Login sucess!
To crawl information from Weibo, add fetching and parsing modules after the main function. For example, to read the contents of a Weibo page:
htmlContent = urllib2.urlopen(myurl).read()  # Fetch the full HTML of the page at myurl
Different crawler modules can be designed for different requirements; the code above covers only the simulated login.