Concurrently validating proxy pool addresses in Python 3

  • 2020-05-12 02:49:16
  • OfStack

This article shows, by example, how to validate the addresses in a proxy pool concurrently in Python 3. The code is shared below for reference:


#encoding=utf-8
#author: walker
#date: 2016-04-14
#summary: Validate proxies concurrently using coroutines / a thread pool
import os, sys, time
import traceback
import requests
from concurrent import futures
cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))
Headers = {
      'Accept': '*/*',
      'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
    }
# Check whether a single proxy works
# Return the proxy if it is valid; otherwise return an empty string
def Check(desturl, proxy, feature):
  # Only the 'http' scheme is mapped, so desturl must be an http:// URL for the proxy to be used
  proxies = {'http': 'http://' + proxy}
  r = None # Declared up front so the finally block can always refer to it
  exMsg = None
  try:
    r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
  except Exception:
    exMsg = '* ' + traceback.format_exc()
    #print(exMsg)
  finally:
    # Compare with None: a Response is falsy for 4xx/5xx status codes
    if r is not None:
      r.close()
  if exMsg:
    return ''
  if r.status_code != 200:
    return ''
  # The body was already downloaded by requests.get, so r.text is still usable after close()
  if r.text.find(feature) < 0:
    return ''
  return proxy
# Takes a collection of proxies (set/list) and returns a list of the valid ones
def GetValidProxyPool(rawProxyPool, desturl, feature):
  validProxyList = list()  # List of valid proxies
  futureList = list()
  # The with block shuts the pool down once all checks have finished
  with futures.ThreadPoolExecutor(8) as pool:
    for proxy in rawProxyPool:
      futureList.append(pool.submit(Check, desturl, proxy, feature))
    print('\n submit done, waiting for responses\n')
    for future in futures.as_completed(futureList):
      proxy = future.result()
      print('proxy:' + proxy)
      if proxy: # Valid proxy
        validProxyList.append(proxy)
  print('validProxyList size:' + str(len(validProxyList)))
  return validProxyList
# Get the raw (not yet validated) proxy pool
def GetRawProxyPool():
  rawProxyPool = set()
  # Obtain the raw proxy pool somehow ......
  return rawProxyPool
if __name__ == "__main__":
  rawProxyPool = GetRawProxyPool()
  desturl = 'http://...'    # The target URL to request through each proxy
  feature = 'xxx'    # A string that must appear in the target page's body
  validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
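
The submit / as_completed pattern used above can be tried without any network access. The following is only a minimal sketch under stated assumptions: validate_pool, fake_check and the sample addresses are hypothetical names invented for illustration, and fake_check stands in for the real Check function so that the example runs offline. It also maps each future back to its proxy, so a check that raises is counted as invalid instead of aborting the loop.

```python
from concurrent import futures

def validate_pool(raw_pool, check, max_workers=8):
    """Run check(proxy) for every proxy in a thread pool; return those that pass."""
    valid = []
    # The with block waits for all workers and shuts the pool down on exit
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which proxy each future belongs to
        future_to_proxy = {pool.submit(check, p): p for p in raw_pool}
        for future in futures.as_completed(future_to_proxy):
            proxy = future_to_proxy[future]
            try:
                ok = future.result()
            except Exception:
                ok = False  # a check that raises counts as an invalid proxy
            if ok:
                valid.append(proxy)
    return valid

def fake_check(proxy):
    # Hypothetical stand-in for a real network check: accept only port 8080
    return proxy.endswith(':8080')

if __name__ == '__main__':
    sample = ['192.0.2.1:8080', '192.0.2.2:3128', '192.0.2.3:8080']
    # as_completed yields in completion order, so sort for a stable printout
    print(sorted(validate_pool(sample, fake_check)))
    # prints ['192.0.2.1:8080', '192.0.2.3:8080']
```

To use it with the listing above, a small wrapper such as lambda p: Check(desturl, p, feature) could be passed as the check argument, since Check returns an empty (falsy) string for a bad proxy.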

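The summary comment in the script mentions coroutines as well as a thread pool, but the listing only shows the thread pool. One possible coroutine-style variant is sketched here with the standard asyncio module (Python 3.9+ for asyncio.to_thread). This is an assumption about how such a version might look, not the author's code: validate_pool_async and fake_check are hypothetical names, and the blocking check still runs in worker threads. A fully asynchronous version would need an asynchronous HTTP client, which is not shown.

```python
import asyncio

async def validate_pool_async(raw_pool, check, limit=8):
    """Run check(proxy) for every proxy, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(proxy):
        async with semaphore:
            # check is a blocking callable, so hand it to a worker thread
            return proxy, await asyncio.to_thread(check, proxy)

    # gather returns results in the same order as the input
    results = await asyncio.gather(*(run_one(p) for p in raw_pool))
    return [proxy for proxy, ok in results if ok]

def fake_check(proxy):
    # Hypothetical stand-in for a real network check: accept only port 8080
    return proxy.endswith(':8080')

if __name__ == '__main__':
    sample = ['192.0.2.1:8080', '192.0.2.2:3128', '192.0.2.3:8080']
    print(asyncio.run(validate_pool_async(sample, fake_check)))
    # prints ['192.0.2.1:8080', '192.0.2.3:8080']
```

The semaphore plays the role of the pool size of 8 in the thread-pool version: it caps how many checks are in flight at once.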

I hope this article helps you with your Python programming.

