Using Python's rsa encryption module to simulate Sina Weibo login

  • 2020-04-02 13:22:03
  • OfStack

When you log in to Sina Weibo from a PC, the user name and password are encrypted in advance by js on the client side, and a set of parameters must be fetched before the POST; these parameters also become part of POST_DATA. As a result, the login cannot be simulated with a plain POST in the usual simple way (as it can for renren, for example).

To collect Sina Weibo data with a crawler, you therefore have to simulate the login.

1. Before submitting the POST request, you need to GET four parameters (servertime, nonce, pubkey and rsakv), not just servertime and nonce as was the case previously. This is mainly because the js has changed how the user name and password are encrypted.

1.1 Because of the changed encryption method, we use the rsa module here. For an introduction to the RSA public key encryption algorithm, refer to the relevant material online. Download and install the rsa module:

Download: https://pypi.python.org/pypi/rsa/3.1.1

rsa module documentation: http://stuvel.eu/files/python-rsa-doc/index.html

Select the rsa installation package (.egg) that matches your Python version and install it on Windows with easy_install.exe (provided by setuptools, for example setuptools-0.6c11.win32-py2.6.exe). For example: easy_install rsa-3.1.1-py2.6.egg.

1.2 Obtain and inspect the login js file of Sina Weibo

View the source of the Sina Passport login page (http://login.sina.com.cn/signup/signin.php) to find the address of the js file, http://login.sina.com.cn/js/sso/ssologin.js. Its content is obfuscated, so use one of the online decoding sites to unpack it, then look at how it finally encrypts the user name and password.

1.3 Log in

The first step of the login is to insert your own username into prelogin_url and request that address:

prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=sso&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&client=ssologin.js(v1.4.4)' % username

A GET request to this url returns content similar to the following:

sinaSSOController.preloginCallBack({"retcode":0,"servertime":1362041092,"pcid":"gz-6664c3dea2bfdaa3c94e8734c9ec2c9e6a1f","nonce":"IRYP4N","pubkey":"EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443","rsakv":"1330428213","exectime":1})

From this response we extract the servertime, nonce, pubkey and rsakv values we need. Because pubkey and rsakv are fixed values, they can also be hard-coded.
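The extraction step could be sketched as follows. This is a minimal Python 3 sketch rather than the author's code: the function name parse_prelogin and the regular expression are assumptions made for illustration, and the sample response uses a shortened, made-up pubkey.

```python
import json
import re


def parse_prelogin(body):
    """Extract the login parameters from a prelogin.php JSONP response."""
    # The response is JSONP: callbackName({...}). Grab the JSON object inside.
    match = re.search(r"\((\{.*\})\)", body, re.S)
    if match is None:
        raise ValueError("unexpected prelogin response: %r" % body[:80])
    data = json.loads(match.group(1))
    return data["servertime"], data["nonce"], data["pubkey"], data["rsakv"]


# Sample response with a shortened, made-up pubkey (for illustration only).
sample = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1362041092,"nonce":"IRYP4N",'
          '"pubkey":"EB2A38","rsakv":"1330428213","exectime":1})')
servertime, nonce, pubkey, rsakv = parse_prelogin(sample)
print(servertime, nonce, pubkey, rsakv)
```

Parsing the payload as JSON instead of slicing strings keeps the code working if the server reorders the fields.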

 

2. Previously, the username was URL-quoted and then encoded with BASE64:


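import base64
import urllib
username_ = urllib.quote(username)  # URL-quote the raw username first (Python 2 API)
username = base64.encodestring(username_)[:-1]  # BASE64-encode, drop the trailing newline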

Previously, the password was hashed three times with SHA1, with the values of servertime and nonce mixed in as salt. That is, the password was hashed twice with SHA1, the values of servertime and nonce were appended to the result, and SHA1 was computed once more.
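For reference, the earlier scheme might look like this in Python 3. This is a hedged sketch: the function names are hypothetical, and it assumes each SHA1 round hashes the hexadecimal digest of the previous round, which the description above does not spell out.

```python
import base64
import hashlib
import urllib.parse


def encode_username(username):
    """URL-quote the username, then BASE64-encode it (the 'su' field)."""
    quoted = urllib.parse.quote(username)
    return base64.b64encode(quoted.encode("utf-8")).decode("ascii")


def legacy_password_hash(password, servertime, nonce):
    """Old scheme: SHA1 twice, append servertime and nonce, SHA1 again."""
    first = hashlib.sha1(password.encode("utf-8")).hexdigest()
    # Assumption: the second round hashes the hex digest, not the raw bytes.
    second = hashlib.sha1(first.encode("ascii")).hexdigest()
    salted = second + str(servertime) + str(nonce)
    return hashlib.sha1(salted.encode("utf-8")).hexdigest()


print(encode_username("user@example.com"))
print(legacy_password_hash("secret", 1362041092, "IRYP4N"))
```

Only encode_username still applies to the rsa-based login; the SHA1 function is shown purely to make the old procedure concrete.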

With the new rsa encryption method, the username is handled the same way as before.

The password is handled a little differently:

2.1 First create an RSA public key. Sina Weibo uses fixed values for both parameters of the public key, but gives them as hexadecimal strings: the first is the pubkey returned in the first step of the login, and the second is the value '10001' from the js encryption file.

Both values need to be converted from hexadecimal to decimal, but they can also be hard-coded. Here hexadecimal 10001 is simply written as 65537. The code is as follows:



import binascii
import rsa

rsaPublickey = int(pubkey, 16)  # modulus: hexadecimal string to integer
key = rsa.PublicKey(rsaPublickey, 65537)  # create the public key (exponent 0x10001)
# Plaintext layout taken from the js encryption file: servertime, tab, nonce, newline, password
message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)
passwd = rsa.encrypt(message, key)  # encrypt
passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
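The same step could be sketched for Python 3, where rsa.encrypt expects bytes. This is a sketch under assumptions, not the author's code: the function names are hypothetical, and the separators are assumed to be a tab and a newline character, as in the js file.

```python
import binascii


def build_plaintext(servertime, nonce, password):
    """Join the fields as: servertime TAB nonce NEWLINE password."""
    return "%s\t%s\n%s" % (servertime, nonce, password)


def encrypt_password(pubkey_hex, servertime, nonce, password):
    """Return the hex-encoded RSA ciphertext to send as the 'sp' field."""
    # Third-party package (pip install rsa); imported here so that
    # build_plaintext stays usable when rsa is not installed.
    import rsa

    key = rsa.PublicKey(int(pubkey_hex, 16), int("10001", 16))  # e = 65537
    plaintext = build_plaintext(servertime, nonce, password).encode("utf-8")
    ciphertext = rsa.encrypt(plaintext, key)  # PKCS#1 v1.5, random padding
    return binascii.b2a_hex(ciphertext).decode("ascii")


print(repr(build_plaintext(1362041092, "IRYP4N", "secret")))
```

Because the padding is random, encrypt_password returns a different string on every call for the same input, so results cannot be compared byte for byte against a captured browser request.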

2.2 Request the passport (login) url: login_url = 'http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.4)'

The data that needs to be sent with the POST request:


postPara = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': encodedUserName,
        'service': 'miniblog',
        'servertime': serverTime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': encodedPassWord,
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': rsakv,
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
}

Compared with the earlier version, rsakv is added to the request and the value of pwencode is changed to rsa2. The rest is the same as before.
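Turning the parameter dict into a POST request might look like this in Python 3. It is a minimal sketch: build_login_request is a hypothetical helper, the dict passed in the example is shortened for illustration, and nothing is sent over the network here.

```python
import urllib.parse
import urllib.request

LOGIN_URL = "http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.4)"


def build_login_request(post_para):
    """Form-encode the parameter dict and wrap it in a POST request object."""
    body = urllib.parse.urlencode(post_para).encode("ascii")
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(LOGIN_URL, data=body)


# Hypothetical, shortened parameter dict used only to show the call.
request = build_login_request({"pwencode": "rsa2", "rsakv": "1330428213"})
print(request.get_method())  # POST
```

The request object can then be passed to an opener's open() method to actually submit the login.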

Assemble the parameters and send the POST request. To check whether the login succeeded, look at this line in the content returned by the POST: location.replace("http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3");

If retcode=101, the login failed. The result after successful login is similar, but the retcode value is 0.
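The check could be sketched in Python 3 as follows. The function name parse_login_result and the regular expression are assumptions for illustration, and the sample body is shortened (the reason parameter is left out).

```python
import re
import urllib.parse


def parse_login_result(body):
    """Return (redirect_url, retcode) from the location.replace(...) line."""
    match = re.search(r"location\.replace\(['\"](.*?)['\"]\)", body)
    if match is None:
        raise ValueError("no location.replace(...) found in response")
    redirect_url = match.group(1)
    query = urllib.parse.urlsplit(redirect_url).query
    params = urllib.parse.parse_qs(query)
    retcode = int(params["retcode"][0])
    return redirect_url, retcode


# Shortened sample of a failed login response (illustration only).
sample = ('location.replace("http://weibo.com/ajaxlogin.php?framelogin=1'
          '&callback=parent.sinaSSOController.feedBackUrlCallBack'
          '&retcode=101");')
url, retcode = parse_login_result(sample)
print("login ok" if retcode == 0 else "login failed, retcode=%d" % retcode)
```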

3. After a successful login, the url inside location.replace(...) in the response body is the url to use next. Send a GET request to that url and save the cookies from the response; these are the login cookies we need.
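One way to keep those cookies in Python 3 is a cookie-aware opener from the standard library. This is a sketch only: make_cookie_opener is a hypothetical helper, and the actual request is left commented out because it needs the url from a real login.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores every cookie the server sets."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# redirect_url would be the url taken from location.replace(...):
# opener.open(redirect_url)  # GET request; cookies are stored in jar
# for cookie in jar:
#     print(cookie.name, cookie.value)
```

Using the same opener for the prelogin GET, the login POST and this final GET keeps the whole session in one cookie jar.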

