Building a web traffic tool in Python


Preparation

Required environment:

Python 3

Getting started

First, implement a simple version. Here is the code:

import urllib.request
import urllib.error

# Send a GET request and return the HTTP status code
def get(req):
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == '__main__':
    # Set up some basic properties
    url = "http://shua.ofstack.com"
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    # Count the number of visits
    i = 1
    while True:
        # Pass the Request object so that the User-Agent header is actually sent
        code = get(req)
        print('Visit ' + str(i) + ': ' + str(code))
        i = i + 1
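
The headers are attached to the Request object, so urlopen only sends them when it is given that object rather than the plain URL string. The following offline check is a minimal sketch of that point; the helper name build_request and the shortened User-Agent value are assumptions made for illustration, and no network request is made.

```python
import urllib.request

# Shortened, illustrative User-Agent value (an assumption, not the article's full string)
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying the User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://shua.ofstack.com")

# urllib stores header names in capitalized form, so the key is "User-agent"
print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would then send the request with that header.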

Simple and crude: this only increases page views (PV), while the IP address never changes, so it is easy for search engines to detect. Let’s improve it.

Add proxy functionality

Add the following code to the get method:

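# Requires "import random" at the top of the file, and get must
# accept the proxy list as a second parameter named proxies
# Pick one proxy at random from the list
random_proxy = random.choice(proxies)
proxy_support = urllib.request.ProxyHandler({"http":random_proxy})
opener = urllib.request.build_opener(proxy_support)
# Install the opener globally so that later urlopen calls go through the proxy
urllib.request.install_opener(opener)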
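
As an alternative to installing the opener globally, the opener can be returned and used directly. This is a minimal sketch, not the article's method: the helper name make_opener is hypothetical, and the block only builds the opener without contacting any proxy.

```python
import random
import urllib.request

def make_opener(proxies, rng=random):
    """Pick one proxy at random and return (opener, chosen_proxy)."""
    chosen = rng.choice(proxies)
    handler = urllib.request.ProxyHandler({"http": chosen})
    # build_opener returns an OpenerDirector; nothing is installed globally
    return urllib.request.build_opener(handler), chosen

proxies = ["124.88.67.22:80", "124.88.67.82:80"]
opener, chosen = make_opener(proxies)

# opener.open(req, timeout=10) would send a request through `chosen`;
# it is not called here so that the sketch stays offline
print("selected proxy:", chosen)
```

Keeping the opener local means each call can use a different proxy without changing the behavior of urlopen elsewhere in the program.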

Modify the main method as follows:

if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    # Add a list of proxies; you can find them by searching on Baidu
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while True:
        # Pass the proxy list as an extra argument
        code = get(req, proxies)
        print('Proxy visit ' + str(i) + ': ' + str(code))
        i = i + 1

That’s pretty much it, but there is a bug: if the page fails to open or a proxy does not work, the exception is not caught and the program exits. Let’s add exception handling.

Exception handling

Define a mail method that sends an email alert:

import smtplib
from email.mime.text import MIMEText

def mail(txt):
    _user = "Your account"
    _pwd = "Your password"
    _to = "Recipient account"
    # txt may be an int status code or an exception object, so convert it to str
    msg = MIMEText(str(txt), 'plain', 'utf-8')
    # The subject line
    msg["Subject"] = "Proxy is invalid!"
    msg["From"] = _user
    msg["To"] = _to

    try:
        # QQ Mail's SMTP server is used here
        s = smtplib.SMTP_SSL("smtp.qq.com", 465)
        s.login(_user, _pwd)
        s.sendmail(_user, _to, msg.as_string())
        s.quit()
        print("Success!")

    except smtplib.SMTPException as e:
        print("Failed, %s" % e)
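
The message can be built and inspected without sending anything, which is a convenient way to check the alert text before wiring in SMTP. This is a sketch under assumptions: the helper name build_alert and the example.com addresses are hypothetical.

```python
from email.mime.text import MIMEText

def build_alert(txt, sender, recipient):
    """Build the alert email without sending it (hypothetical helper)."""
    # str() lets callers pass an int status code or an exception reason
    msg = MIMEText(str(txt), "plain", "utf-8")
    msg["Subject"] = "Proxy is invalid!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg

msg = build_alert(503, "sender@example.com", "receiver@example.com")

# With a utf-8 charset the body is transfer-encoded, so decode it to read it back
body = msg.get_payload(decode=True).decode("utf-8")
print(msg["Subject"], "|", body)
```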

Then modify the main method:

if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while True:
        try:
            code = get(req, proxies)
            print('Proxy visit ' + str(i) + ': ' + str(code))
            i = i + 1
        # HTTPError is a subclass of URLError, so it must be caught first
        except urllib.error.HTTPError as e:
            print(e.code)
            # Send an email alert with the HTTP status code
            mail(e.code)
        except urllib.error.URLError as err:
            print(err.reason)
            # Send an email alert with the failure reason
            mail(err.reason)
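
The order of the except clauses matters because urllib.error.HTTPError is a subclass of urllib.error.URLError. The sketch below shows this offline by raising both exceptions from stand-in functions; the names attempt, fake_ok, fake_503, and fake_down are hypothetical and exist only for this illustration.

```python
import urllib.error

def attempt(fetch):
    """Call fetch() and return (kind, detail) instead of raising."""
    try:
        return "ok", fetch()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 503
        return "http_error", e.code
    except urllib.error.URLError as err:
        # No usable answer at all, e.g. a dead proxy or a refused connection
        return "url_error", str(err.reason)

def fake_ok():
    return 200

def fake_503():
    raise urllib.error.HTTPError(
        "http://example.invalid", 503, "Service Unavailable", None, None
    )

def fake_down():
    raise urllib.error.URLError("connection refused")

for fetch in (fake_ok, fake_503, fake_down):
    print(attempt(fetch))
```

If the URLError clause came first, it would also catch every HTTPError, and the status code branch would never run.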

Done!

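Conclusion

The code is only about 50 lines, and the program can still be improved.

For example: fetch the proxy list automatically, add a user interface, extend it with multithreading, and so on.

Finally, here is a version written by a friend. Note that it is written for Python 2 (it uses urllib2, the thread module, and the print statement), so it will not run on Python 3 as is: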

import urllib2
import timeit
import thread
import time
i = 0
mylock = thread.allocate_lock()
def test(no,r):
  global i
  url = 'http://blog.csdn.net'
  for j in range(1,r):
    req=urllib2.Request(url)
    req.add_header("User-Agent","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
    file = urllib2.urlopen(req)
    print file.getcode();
    mylock.acquire()
    i+=1
    mylock.release()
    print i;
  thread.exit_thread()

def fast():
    thread.start_new_thread(test,(1,50))
    thread.start_new_thread(test,(2,50))

fast()
time.sleep(15)

In testing, the server returned 503 errors when more than two threads were used, so two threads is about right.
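
For readers on Python 3, the same two-thread pattern with a lock-protected counter can be sketched with the threading module. This is an adaptation under assumptions, not the friend's code: the names worker, run, and fake_fetch are hypothetical, and fake_fetch stands in for the real request so the example runs offline.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def worker(fetch, rounds):
    """Call fetch() `rounds` times, counting each call under the lock."""
    global counter
    for _ in range(rounds):
        fetch()
        with counter_lock:
            counter += 1

def run(fetch, threads=2, rounds=49):
    """Start the workers and wait for them with join() instead of sleeping."""
    pool = [
        threading.Thread(target=worker, args=(fetch, rounds))
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter

def fake_fetch():
    # Stand-in for urllib.request.urlopen(req).status; no network access
    return 200

# range(1, 50) in the Python 2 version runs 49 times per thread
total = run(fake_fetch)
print(total)
```

Using join() waits exactly until both threads finish, which replaces the fixed time.sleep(15) in the Python 2 version.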