Python script for scraping Discuz! usernames
- 2020-04-02 13:20:44
- OfStack
I have been learning Python recently, so I wrote a Python script that scrapes Discuz! usernames. The code is short but fairly messy. The idea is simple: match the page title with a regular expression, extract the username, and append it to a text file. The script uses the Baidu webmaster community (more than 400,000 users in total) as its example. I left it running unattended on a VPS. Even with a delay between requests, I later found that it had been blocked after collecting only a little over 50,000 usernames...
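The core idea, pulling the username out of the page title, can be tried on its own before touching the network. The following is a minimal sketch in Python 3, not the script from this post: the helper name `extract_username` and the sample HTML are made up for illustration, and the title wording is assumed to match the pattern used in the script.

```python
import re

# Assumed title format: "<username> Personal data Baidu webmaster community".
# The non-greedy group stops at the first occurrence of the fixed suffix.
TITLE_PATTERN = re.compile(
    r"<title>(.*?) Personal data Baidu webmaster community\s*</title>"
)


def extract_username(html):
    """Return the username found in the page title, or None if there is none."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


# Hypothetical page content used only for this demonstration.
sample_page = (
    "<html><head>"
    "<title>alice Personal data Baidu webmaster community </title>"
    "</head></html>"
)

print(extract_username(sample_page))
print(extract_username("<title>Prompt message</title>"))
```

Pages whose title does not follow the expected format (for example a deleted or nonexistent uid) simply return `None`, so the caller can skip them.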
The full script (Python 2) is as follows:
# -*- coding: utf-8 -*-
# Author: As soon as
# Blog: http://www.90blog.org
# Version: 1.0
# Function: Python script that scrapes Baidu webmaster platform usernames
import urllib
import urllib2
import re
import time

def BiduSpider():
    pattern = re.compile(r'<title>(.*) Personal data Baidu webmaster community </title>')
    uid = 1
    thedatas = []
    while uid < 400000:
        theUrl = "http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=" + str(uid)
        uid += 1
        theResponse = urllib2.urlopen(theUrl)
        thePage = theResponse.read()
        # Match the username in the page title with the regular expression
        theFindall = re.findall(pattern, thePage)
        # Wait 0.5 seconds so that frequent requests do not get us blocked
        time.sleep(0.5)
        if theFindall:
            # Re-encode from UTF-8 to GBK to prevent garbled Chinese output
            thedatas = theFindall[0].decode('utf-8').encode('gbk')
            # Append the username to a text file
            f = open('theUid.txt', 'a')
            f.write(thedatas + '\n')
            f.close()

if __name__ == '__main__':
    BiduSpider()
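The script above only runs on Python 2, because `urllib2` does not exist in Python 3, and a single failed request stops the whole run. Below is a rough Python 3 sketch of the same loop, offered as one possible port rather than the author's code. The names `fetch_page` and `scrape_usernames` are hypothetical, the `fetch` parameter is added only so the loop can be tried without network access, and the title wording is assumed to match the pattern in the script.

```python
import re
import time
import urllib.error
import urllib.request

BASE_URL = "http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid="
# Assumed title format, taken from the pattern in the original script.
TITLE_PATTERN = re.compile(
    r"<title>(.*?) Personal data Baidu webmaster community\s*</title>"
)


def fetch_page(uid, timeout=10):
    """Download one profile page; return its text, or None on a network error."""
    try:
        with urllib.request.urlopen(BASE_URL + str(uid), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Skip this uid instead of aborting the whole run.
        return None


def scrape_usernames(first_uid, last_uid, out_path, fetch=fetch_page, delay=0.5):
    """Append the username of every uid in [first_uid, last_uid] to out_path.

    Returns the number of usernames written.
    """
    saved = 0
    with open(out_path, "a", encoding="utf-8") as out_file:
        for uid in range(first_uid, last_uid + 1):
            page = fetch(uid)
            match = TITLE_PATTERN.search(page) if page else None
            if match:
                out_file.write(match.group(1).strip() + "\n")
                saved += 1
            # Pause between requests, as the original script does.
            time.sleep(delay)
    return saved
```

A call such as `scrape_usernames(1, 100, "theUid.txt")` would process the first 100 uids. Writing the file as UTF-8 removes the need for the manual `decode`/`encode` step, and a fixed 0.5 second delay was not enough to avoid being blocked in the run described above, so a longer or randomized delay may be worth trying.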
The final result is as follows:
<img border="0" src="//files.jb51.net/file_images/article/201312/20131230171927104.png">