Concurrently verifying proxy pool addresses in Python 3
- 2020-05-12 02:49:16
- OfStack
This article walks through an example of validating the addresses in a proxy pool concurrently in Python 3, using a thread pool from concurrent.futures. The full listing is shared below for reference:
# encoding=utf-8
# author: walker
# date: 2016-04-14
# summary: Validate proxies concurrently using a thread pool
import os, sys, time
import requests
from concurrent import futures

cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))

# Request headers sent with every validation request
Headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
}

# Check whether a single proxy works.
# Returns the proxy if it is valid; otherwise returns an empty string.
def Check(desturl, proxy, feature):
    proxies = {'http': 'http://' + proxy}
    try:
        r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
    except Exception:
        # Any failure (connection error, timeout, ...) means the proxy is unusable
        return ''
    try:
        if r.status_code != 200:
            return ''
        # The page fetched through the proxy must contain the expected feature string
        if feature not in r.text:
            return ''
        return proxy
    finally:
        r.close()  # always release the connection

# Takes a collection of proxies (set or list) and returns a list of the valid ones
def GetValidProxyPool(rawProxyPool, desturl, feature):
    validProxyList = list()  # valid proxies
    futureList = list()
    # The with-block shuts the pool down once all checks have finished
    with futures.ThreadPoolExecutor(max_workers=8) as pool:
        for proxy in rawProxyPool:
            futureList.append(pool.submit(Check, desturl, proxy, feature))
        print('\n submit done, waiting for responses\n')
        # Handle results in completion order rather than submission order
        for future in futures.as_completed(futureList):
            proxy = future.result()  # '' if the proxy failed the check
            print('proxy:' + proxy)
            if proxy:  # valid proxy
                validProxyList.append(proxy)
    print('validProxyList size:' + str(len(validProxyList)))
    return validProxyList

# Get the raw (unverified) proxy pool
def GetRawProxyPool():
    rawProxyPool = set()
    # Get the raw proxy pool somehow ......
    return rawProxyPool

if __name__ == "__main__":
    rawProxyPool = GetRawProxyPool()
    desturl = 'http://...'  # The target address to be accessed through the proxy
    feature = 'xxx'  # A string expected to appear in the target page
    validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
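
The submit / as_completed pattern used above can be tried without any network access. The following is only a minimal sketch, not part of the original listing: validate_pool and fake_check are hypothetical names introduced for illustration, and fake_check is a stand-in that simply accepts proxies on port 8080 instead of making a real request. It also shows one way to map each future back to the proxy it was submitted for, so a check that raises is treated as an invalid proxy rather than aborting the whole run.

```python
from concurrent import futures


def validate_pool(raw_pool, check, max_workers=8):
    """Run check(proxy) for every proxy in a thread pool.

    Returns the proxies for which check returned a truthy value.
    (Hypothetical helper, assumed here for illustration only.)
    """
    valid = []
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which proxy each future belongs to
        future_to_proxy = {pool.submit(check, p): p for p in raw_pool}
        # Results arrive in completion order, not submission order
        for future in futures.as_completed(future_to_proxy):
            proxy = future_to_proxy[future]
            try:
                ok = future.result()
            except Exception:
                ok = False  # a check that raises counts as an invalid proxy
            if ok:
                valid.append(proxy)
    return valid


def fake_check(proxy):
    # Hypothetical stand-in for a real network check:
    # accepts only "host:port" entries whose port is 8080.
    host, port = proxy.split(':')  # raises ValueError for malformed entries
    return port == '8080'


if __name__ == '__main__':
    sample = ['10.0.0.1:8080', '10.0.0.2:3128', 'bad-entry', '10.0.0.3:8080']
    # Sort because completion order is not deterministic
    print(sorted(validate_pool(sample, fake_check)))
    # prints ['10.0.0.1:8080', '10.0.0.3:8080']
```

Swapping fake_check for a function that performs a real request through the proxy gives the same overall flow as GetValidProxyPool, with max_workers controlling how many checks run at once.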
I hope this article helps with your Python programming.