Building a web traffic tool in Python

  • 2020-05-30 20:27:38
  • OfStack

Preparation

Required environment:

Python 3

Getting started

First, let's implement a simple version. Here is the code:


import urllib.request
import urllib.error

# Fetch the page and return the HTTP status code
def get(req):
    code = urllib.request.urlopen(req).code
    return code

if __name__ == '__main__':
    # Set up some basic properties
    url = "http://shua.ofstack.com"
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    # Count the visits
    i = 1
    while 1:
        # Pass the Request object so the User-Agent header is actually sent
        code = get(req)
        print('Visit ' + str(i) + ': ' + str(code))
        i = i + 1

This is simple and crude: it only inflates page views (PV) while the IP address never changes, so it is easy for search engines to detect. Let's improve it.
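
One detail of the simple version is worth checking: the User-Agent is only sent when the Request object itself, rather than the bare URL string, is handed to urlopen. The following is a minimal sketch for inspecting what a Request carries without touching the network. The shortened user-agent string is a placeholder, not the one used in the article.

```python
import urllib.request

url = "http://shua.ofstack.com"
user_agent = "Mozilla/5.0 (placeholder)"  # assumption: shortened for the example

req = urllib.request.Request(url, headers={"User-Agent": user_agent})

# Request stores header names in capitalized form, so the key is
# 'User-agent', not 'User-Agent'.
stored = req.get_header("User-agent")
missing = req.get_header("User-Agent")

print(req.full_url)
print(stored)
print(missing)
```

Because urlopen accepts either a URL string or a Request, passing the wrong one fails silently: the request still succeeds, just with the default Python user agent.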

Add proxy functionality

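Add the following code to the get function, which now takes the proxy list as a second argument, get(url, proxies). This also requires import random at the top of the file: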


random_proxy = random.choice(proxies)
proxy_support = urllib.request.ProxyHandler({"http":random_proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
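
For reference, the proxy step can also be written so that it does not change the process-wide default opener. This is only a sketch under a few assumptions: the helper name build_proxy_opener and the timeout parameter are not part of the original code, and it calls opener.open directly instead of install_opener.

```python
import random
import urllib.request


def build_proxy_opener(proxies):
    """Pick one proxy at random and return (opener, chosen_proxy)."""
    proxy = random.choice(proxies)
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    return opener, proxy


def get(url, proxies, timeout=10):
    """Open url (a string or a Request) through a random proxy.

    timeout is an assumed addition so that a dead proxy cannot
    block the loop forever.
    """
    opener, _ = build_proxy_opener(proxies)
    with opener.open(url, timeout=timeout) as resp:
        return resp.status


# Building the opener needs no network access.
opener, chosen = build_proxy_opener(["127.0.0.1:8080"])
print(chosen)
```

Using a local opener keeps every call independent, whereas install_opener replaces the opener used by all later urlopen calls in the program.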


Then modify the main block as follows:


if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    # The proxy list; you can search Baidu for free proxies
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        # Pass the Request object (so the User-Agent is sent) and the proxy list
        code = get(req, proxies)
        print('Proxy visit ' + str(i) + ': ' + str(code))
        i = i + 1


That's pretty much it, but there is a bug: if the page fails to open or the proxy doesn't work, the program exits with an unhandled exception. So let's add exception handling.

Exception handling

Define a mail function that sends an email alert:


import smtplib
from email.mime.text import MIMEText

def mail(txt):
    _user = " Your account "
    _pwd = " Your password "
    _to = " Recipient account "
    # Convert to str first: the caller may pass an int or an exception object
    msg = MIMEText(str(txt), 'plain', 'utf-8')
    # The subject line
    msg["Subject"] = " Proxy is invalid! "
    msg["From"] = _user
    msg["To"] = _to

    try:
        # QQ Mail is used here
        s = smtplib.SMTP_SSL("smtp.qq.com", 465)
        s.login(_user, _pwd)
        s.sendmail(_user, _to, msg.as_string())
        s.quit()
        print("Success!")

    except smtplib.SMTPException as e:
        print("Failed, %s" % e)
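
The message-building part of mail can be checked on its own, without an SMTP server. The sketch below is one way to split it out. The function name build_alert and the example.com addresses are assumptions for illustration and are not part of the original code.

```python
from email.mime.text import MIMEText


def build_alert(txt, sender, recipient):
    """Build the alert message; txt may be an int, str, or exception."""
    msg = MIMEText(str(txt), "plain", "utf-8")
    msg["Subject"] = "Proxy is invalid!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


# An HTTP status code arrives as an int, hence the str() conversion above.
alert = build_alert(503, "me@example.com", "you@example.com")
print(alert["Subject"])
print(alert.get_payload(decode=True))
```

With the message built separately, the SMTP code only has to log in and call sendmail, which makes connection problems easier to tell apart from formatting problems.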


Then we modify the main block:


if __name__ == '__main__':
    url = "http://shua.ofstack.com"
    proxies = ["124.88.67.22:80","124.88.67.82:80","124.88.67.81:80","124.88.67.31:80","124.88.67.19:80","58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        try:
            code = get(req, proxies)
            print('Proxy visit ' + str(i) + ': ' + str(code))
            i = i + 1
        # HTTPError is a subclass of URLError, so it must be caught first
        except urllib.error.HTTPError as e:
            print(e.code)
            # Send an email alert with the status code
            mail(str(e.code))
        except urllib.error.URLError as err:
            print(err.reason)
            # Send an email alert with the failure reason
            mail(str(err.reason))
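
The order of the two except clauses matters, because urllib.error.HTTPError is a subclass of urllib.error.URLError. If URLError were listed first, it would also catch every HTTPError. A small sketch that shows the relationship offline follows. The helper name describe_failure is hypothetical, and the exception objects are constructed by hand rather than raised by a real request.

```python
import urllib.error
from email.message import Message


def describe_failure(exc):
    """Turn a urllib exception into a short text suitable for an alert."""
    # Check the more specific class first.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP error %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "URL error: %s" % exc.reason
    return "unexpected error: %r" % (exc,)


http_exc = urllib.error.HTTPError(
    "http://shua.ofstack.com", 503, "Service Unavailable", Message(), None
)
url_exc = urllib.error.URLError("timed out")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(describe_failure(http_exc))
print(describe_failure(url_exc))
```

HTTPError means the server answered with an error status, while a plain URLError usually means the connection itself failed, which is the typical symptom of a dead proxy.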


Done!

Conclusion

The code is only about 50 lines, and the program can still be improved.

For example: fetch the proxy list automatically, add a user interface, extend it with multithreading, and so on.

Finally, I would like to share another developer's version with you:


import threading
import time
import urllib.request

i = 0
mylock = threading.Lock()

def test(no, r):
    global i
    url = 'http://blog.csdn.net'
    for j in range(1, r):
        req = urllib.request.Request(url)
        req.add_header("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
        resp = urllib.request.urlopen(req)
        print(resp.getcode())
        # Update and print the shared counter while holding the lock
        with mylock:
            i += 1
            print(i)

def fast():
    # Daemon threads stop when the main thread exits
    threading.Thread(target=test, args=(1, 50), daemon=True).start()
    threading.Thread(target=test, args=(2, 50), daemon=True).start()

fast()
time.sleep(15)

In testing, the server returned 503 errors when more than two threads were used, so two threads is just right.
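
A cap of two threads can also be expressed with a thread pool instead of starting threads by hand. This is only a sketch: run_capped and fake_visit are hypothetical names, and fake_visit stands in for a real request so the example runs without any network access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_capped(task, jobs, max_workers=2):
    """Run task(job) for every job using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the jobs
        return list(pool.map(task, jobs))


seen_threads = set()
seen_lock = threading.Lock()


def fake_visit(n):
    # Stand-in for a real request: record which worker thread ran the job.
    with seen_lock:
        seen_threads.add(threading.current_thread().name)
    return n * 2


results = run_capped(fake_visit, range(10))
print(results)
print(len(seen_threads))
```

The pool never creates more than max_workers threads, so the limit lives in one place rather than in the number of start calls.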

