Python script for scraping Discuz! usernames

2020-04-02 13:20:44
OfStack

I have been learning Python recently, so I wrote a script that scrapes Discuz! usernames. There is very little code, but it is quite messy. The idea is simple: match the page title with a regular expression, extract the username, and write it to a text file. The program uses the Baidu webmaster community as an example (more than 400,000 users in total). I left it running on a VPS and did not keep an eye on it. Although the script sleeps between requests, I later found that it had been blocked after grabbing only a little over 50,000 usernames...
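The matching step on its own can be sketched as follows. This is only an illustration written for Python 3: the title layout ("name - Profile - Example Forum"), the sample page, and the helper name `extract_username` are all made up for the example and are not the forum's real markup.

```python
import re

# Hypothetical title layout: "<user name> - Profile - <site name>".
# The real forum's title text differs; adjust the pattern to match it.
TITLE_PATTERN = re.compile(r'<title>(.*?) - Profile - Example Forum</title>')


def extract_username(page):
    """Return the user name found in the page title, or None if absent."""
    match = TITLE_PATTERN.search(page)
    return match.group(1).strip() if match else None


# Made-up sample page used only to exercise the pattern.
sample = '<html><head><title>alice - Profile - Example Forum</title></head></html>'
print(extract_username(sample))
```

The non-greedy `(.*?)` keeps the capture from running past the first occurrence of the fixed title suffix, and returning `None` makes it easy to skip pages (deleted users, error pages) whose title does not fit.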
The full script is as follows:


# -*- coding: utf-8 -*-
# Author:  As soon as 
# Blog: http://www.90blog.org
# Version: 1.0
# Function: Python script that scrapes user names from the Baidu webmaster community

import urllib2  # Python 2 only; Python 3 provides urllib.request instead
import re
import time

def BiduSpider():
    # Capture the user name from the <title> of the profile page
    pattern = re.compile(r'<title>(.*) Personal data    Baidu webmaster community  </title>')
    uid = 1
    while uid < 400000:
        theUrl = "http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=" + str(uid)
        uid += 1
        theResponse = urllib2.urlopen(theUrl)
        thePage = theResponse.read()
        # Match the user name with the regular expression
        theFindall = re.findall(pattern, thePage)
        # Wait 0.5 seconds between requests to avoid being blocked for frequent access
        time.sleep(0.5)
        if theFindall:
            # Re-encode from UTF-8 to GBK to prevent garbled Chinese output
            thedatas = theFindall[0].decode('utf-8').encode('gbk')
            # Append the user name to the text file, one name per line
            f = open('theUid.txt', 'a')
            f.write(thedatas + '\n')
            f.close()

if __name__ == '__main__':
     BiduSpider()

The final result is as follows:

[Image: screenshot of the final result]
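Since the original run stopped after it was blocked, a more defensive loop might look like the sketch below. It is written for Python 3 (`urllib.request` instead of `urllib2`) and is not the script above: the names `fetch` and `crawl`, the `url_template` and `extract` parameters, and the 10-second timeout are assumptions made for illustration. A failed request is skipped rather than crashing the whole run, and the output file is written as UTF-8.

```python
import time
import urllib.request


def fetch(url, timeout=10):
    """Return the page body as text, or None on a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8', errors='replace')
    except OSError:
        # urllib.error.URLError, HTTPError and socket timeouts all derive
        # from OSError, so one failed page does not stop the crawl.
        return None


def crawl(uids, url_template, extract, out_path, fetch_page=fetch, delay=0.5):
    """Fetch each profile page, extract a name, and append it to out_path.

    Returns the number of names written.
    """
    saved = 0
    with open(out_path, 'a', encoding='utf-8') as out:
        for uid in uids:
            page = fetch_page(url_template.format(uid=uid))
            if page is not None:
                name = extract(page)
                if name:
                    out.write(name + '\n')
                    saved += 1
            # Pause between requests to keep the access rate low.
            time.sleep(delay)
    return saved
```

A call could look like `crawl(range(1, 400000), "http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid={uid}", extract, "theUid.txt")`, where `extract` is any function that takes the page text and returns the user name or `None`. Passing `fetch_page` as a parameter also lets the loop be tried against canned pages without touching the network.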