Building a web traffic tool in Python
- 2020-05-30 20:27:38
- OfStack
Preparation
Required environment:
Python 3
Getting started
First, a simple version. Here is the code:
import urllib.request
import urllib.error

# Send a GET request and return the HTTP status code.
# req may be a URL string or a urllib.request.Request.
def get(req):
    code = urllib.request.urlopen(req).code
    return code

if __name__ == '__main__':
    # Set up some basic properties
    url = "http://shua.ofstack.com"
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent':user_agent}
    req = urllib.request.Request(url, headers=headers)
    # Count the number of requests
    i = 1
    while 1:
        # Pass req rather than url so the User-Agent header is actually sent
        code = get(req)
        print('Access ' + str(i) + ': ' + str(code))
        i = i+1
Simple and crude: this only inflates page views (PV), and the IP address never changes, so it is easy for search engines to detect. Let's improve it.
Add proxy functionality
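Add the following code at the start of the get function. It needs import random at the top of the file and a second parameter named proxies on get:
    # Pick one proxy at random from the list
    random_proxy = random.choice(proxies)
    proxy_support = urllib.request.ProxyHandler({"http":random_proxy})
    opener = urllib.request.build_opener(proxy_support)
    # Make this opener the default for later urlopen calls
    urllib.request.install_opener(opener)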
Then modify the main block as follows:
if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    # Add a list of proxies; you can find some by searching Baidu
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent':user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        # Pass the proxy list as an extra argument
        code = get(req,proxies)
        print('Proxy access #'+str(i)+': '+str(code))
        i = i+1
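One thing to be aware of: install_opener replaces the process-wide default opener, so every later urlopen call goes through whichever proxy was picked last. Below is a minimal sketch of an alternative, assuming you would rather keep the proxy local to each request. The helper names make_opener and get_via_proxy are hypothetical and not part of the original code.

```python
import random
import urllib.request


def make_opener(proxy):
    """Build an opener that routes HTTP requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def get_via_proxy(req, proxies, timeout=10):
    """Pick a random proxy and return the HTTP status code for req."""
    opener = make_opener(random.choice(proxies))
    # opener.open() uses this opener only; the global default is untouched
    with opener.open(req, timeout=timeout) as response:
        return response.status
```

The timeout value of 10 seconds is an arbitrary choice for illustration; without one, a dead proxy can leave the loop hanging for a long time.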
That's pretty much it, but there is a bug: if the page fails to open or a proxy stops working, the program exits with an unhandled exception. Let's add exception handling.
Exception handling
Define a mail function for sending email alerts:
import smtplib
from email.mime.text import MIMEText

def mail(txt):
    _user = "Your account"
    _pwd = "Your password"
    _to = "Recipient account"
    # str() lets callers pass a status code or an error reason
    msg = MIMEText(str(txt), 'plain', 'utf-8')
    # Subject line
    msg["Subject"] = "Proxy is invalid!"
    msg["From"] = _user
    msg["To"] = _to
    try:
        # QQ Mail is used here
        s = smtplib.SMTP_SSL("smtp.qq.com", 465)
        s.login(_user, _pwd)
        s.sendmail(_user, _to, msg.as_string())
        s.quit()
        print("Success!")
    except smtplib.SMTPException as e:
        print("Failed, %s" % e)
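MIMEText expects its body as a string, but callers may hand the mail function something else, such as an integer status code. The sketch below separates building the message from sending it, so the message can be inspected without an SMTP server. The helper name build_alert and the example.com addresses are hypothetical.

```python
from email.mime.text import MIMEText


def build_alert(sender, recipient, reason):
    """Return an alert email whose body is the failure reason as text."""
    # str() matters: the reason may be an int or an exception object
    msg = MIMEText(str(reason), "plain", "utf-8")
    msg["Subject"] = "Proxy is invalid!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


msg = build_alert("me@example.com", "you@example.com", 503)
print(msg["Subject"])
```

The returned object can be passed to sendmail through msg.as_string(), exactly as the mail function does.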
Then we modify the main block:
if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent':user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        try:
            code = get(req,proxies)
            print('Proxy access #'+str(i)+': '+str(code))
            i = i+1
        except urllib.error.HTTPError as e:
            print(e.code)
            # Send an email alert; the status code is converted to text first
            mail(str(e.code))
        except urllib.error.URLError as err:
            print(err.reason)
            # Send an email alert
            mail(str(err.reason))
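The order of the two except clauses matters: HTTPError is a subclass of URLError, so if URLError came first it would also catch HTTP errors and the HTTPError branch would never run. A small sketch of the same pattern, wrapped in a hypothetical helper try_fetch that takes any zero-argument callable so it can be exercised without network access:

```python
import urllib.error


def try_fetch(fetch):
    """Call fetch() and return (status_code, error_text).

    error_text is None on success; status_code is None when no
    HTTP response was received at all.
    """
    try:
        return fetch(), None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 503
        return e.code, "HTTP error: " + str(e.code)
    except urllib.error.URLError as err:
        # No usable answer: dead proxy, DNS failure, refused connection
        return None, "URL error: " + str(err.reason)
```

In the main loop this could be called as try_fetch(lambda: get(req, proxies)), with the error text passed on to the mail function.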
Done!
Conclusion
The code is only about 50 lines, and there is room for improvement.
For example: fetching the proxy list automatically, adding a user interface, extending it with multithreading, and so on.
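As a first step toward loading the proxy list instead of hard-coding it, here is a minimal sketch that parses proxies from plain text. It assumes a hypothetical format of one host:port entry per line, with blank lines and lines starting with # ignored; the function name parse_proxies is also an assumption.

```python
def parse_proxies(text):
    """Return the "host:port" entries found in text, one per line."""
    proxies = []
    for line in text.splitlines():
        entry = line.strip()
        # Skip blank lines and comments
        if not entry or entry.startswith("#"):
            continue
        host, sep, port = entry.rpartition(":")
        # Keep only entries shaped like host:port with a numeric port
        if sep and host and port.isdigit():
            proxies.append(entry)
    return proxies
```

The text could come from a local file read with open(), and the result can be used wherever the proxies list appears in the main block.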
Finally, here is a version written by a friend. Note that it targets Python 2 (urllib2, the thread module, and the print statement):
import urllib2
import timeit
import thread
import time

i = 0
mylock = thread.allocate_lock()

def test(no,r):
    global i
    url = 'http://blog.csdn.net'
    for j in range(1,r):
        req=urllib2.Request(url)
        req.add_header("User-Agent","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
        file = urllib2.urlopen(req)
        print file.getcode()
        mylock.acquire()
        i+=1
        mylock.release()
        print i
    thread.exit_thread()

def fast():
    thread.start_new_thread(test,(1,50))
    thread.start_new_thread(test,(2,50))

fast()
time.sleep(15)
In testing, the server started returning 503 errors with more than two threads, so two threads is just right.
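The script above only runs on Python 2. Below is a rough Python 3 sketch of the same idea, assuming threading.Thread with join() in place of thread.start_new_thread and the fixed time.sleep(15). The names run and fetch_page are hypothetical, and the default of 49 repeats mirrors range(1, 50) in the original.

```python
import threading
import urllib.request


def fetch_page(url="http://blog.csdn.net"):
    """Request the page once and return the HTTP status code."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
    with urllib.request.urlopen(req) as response:
        return response.status


def run(fetch, num_threads=2, repeats=49):
    """Call fetch() repeats times in each thread; return the total count."""
    count = 0
    lock = threading.Lock()

    def worker():
        nonlocal count
        for _ in range(repeats):
            fetch()
            # The lock keeps the shared counter consistent across threads
            with lock:
                count += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # join() waits for the workers instead of sleeping a fixed 15 seconds
    for t in threads:
        t.join()
    return count


# run(fetch_page)  # performs real network requests
```

Passing the fetch function in as an argument keeps the threading logic separate from the network call, so the counter can be checked with a stand-in function.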