Summary of Four Benefits of Python Thread Pools

  • 2021-11-02 01:31:56
  • OfStack

1. Benefits of use

Improved performance: A pool reuses its threads, which avoids the cost of repeatedly creating and destroying a large number of threads.

Applicable scenarios: Thread pools suit workloads that see sudden bursts of requests, or that need many threads to complete their tasks, where each individual task takes only a short time to process.

Protection: Because the number of threads is capped, a pool keeps the system from being overloaded and slowed down by too many threads.

Simpler code: Using a thread pool takes less code than creating and managing threads yourself.
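
The reuse and capping described above can be observed directly. The following is a minimal sketch rather than a benchmark: the work function, the 20 tasks, and max_workers=3 are all values assumed for illustration. It records which thread ran each task and shows that many tasks share a small, fixed set of threads.

```python
import concurrent.futures
import threading
import time


def work(n):
    # Simulate a short task and report which thread ran it.
    time.sleep(0.01)
    return threading.current_thread().name


# max_workers=3 is an arbitrary cap chosen for this sketch.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    thread_names = list(pool.map(work, range(20)))

# 20 tasks were executed, but never by more than 3 distinct threads.
distinct_threads = set(thread_names)
print(len(thread_names), "tasks ran on", len(distinct_threads), "threads")
```

Creating one threading.Thread per task would instead start and tear down 20 threads for the same work.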

2. Examples


"""
@file   : 004- Use of thread pool .py
@author : xiaolu
@email  : luxiaonlp@163.com
@time   : 2021-02-01
"""
import concurrent.futures
import requests
from bs4 import BeautifulSoup
 
 
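def craw(url):
    # Crawl the content of a web page.
    # The 10-second timeout is an assumed value; without one, a stalled
    # connection would occupy a pool thread indefinitely.
    r = requests.get(url, timeout=10)
    return r.text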
 
 
def parse(html):
    #  Parse the contents 
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", class_="post-item-title")
    return [(link["href"], link.get_text()) for link in links]   #  Take out the link and title 
 
 
if __name__ == '__main__':
    #  Links to Web pages to be crawled 
    urls = [
        "https://www.cnblogs.com/sitehome/p/{}".format(page) for page in range(1, 50 + 1)
    ]
        
    # Crawl: fetch all pages concurrently. map() yields results in the same order as urls.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        htmls = pool.map(craw, urls)
        htmls = list(zip(urls, htmls))
        for url, html in htmls:
            print(url, len(html))
    print("craw over")
    
    # Parse: submit one task per page and remember which URL each future belongs to.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {}
        for url, html in htmls:
            future = pool.submit(parse, html)
            futures[future] = url
    
        # for future, url in futures.items():
        #     print(url, future.result())
    
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            print(url, future.result())
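
The example uses map for crawling and as_completed for parsing, and the two differ in the order in which results arrive. The sketch below illustrates this without any network access; slow_square and its sleep times are made up for the demonstration so that later inputs finish sooner.

```python
import concurrent.futures
import time


def slow_square(n):
    # Larger inputs sleep less, so they tend to finish first.
    time.sleep((3 - n) * 0.1)
    return n * n


numbers = [0, 1, 2]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in input order, regardless of which task finishes first.
    mapped = list(pool.map(slow_square, numbers))

    # as_completed() yields each future as soon as it finishes.
    futures = {pool.submit(slow_square, n): n for n in numbers}
    completed = [futures[f] for f in concurrent.futures.as_completed(futures)]

print("map order:", mapped)
print("completion order of inputs:", completed)
```

The commented-out loop over futures.items() in the example behaves like map in this respect: it waits for each future in submission order.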

Supplementary notes:

Using thread pools

The base class for thread pools is Executor in the concurrent.futures module. Executor has two subclasses: ThreadPoolExecutor, which creates thread pools, and ProcessPoolExecutor, which creates process pools.

When you use a thread pool or process pool to manage concurrency, you only need to submit the task function to the pool; the pool takes care of the rest.

Executor provides the following common methods:

submit(fn, *args, **kwargs): Submits the function fn to the pool. *args holds the positional arguments passed to fn, and **kwargs holds the keyword arguments passed to fn.

map(func, *iterables, timeout=None, chunksize=1): Similar to the built-in map(func, *iterables), except that the calls to func run concurrently on the pool's threads. The iterables are collected immediately rather than lazily, and results are returned in input order. chunksize has no effect with ThreadPoolExecutor.

shutdown(wait=True): Shuts down the pool. No new tasks can be submitted afterward; with wait=True, the call blocks until all pending tasks have finished.
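
The three methods can be tried together in a small sketch. The power function and its arguments are invented for illustration; only submit, map, and shutdown come from the list above.

```python
from concurrent.futures import ThreadPoolExecutor


def power(base, exponent=2):
    return base ** exponent


pool = ThreadPoolExecutor(max_workers=2)

# submit() forwards positional and keyword arguments to the function.
future = pool.submit(power, 3, exponent=3)
cube = future.result()  # blocks until the task has finished

# map() takes one iterable per positional parameter of the function.
powers = list(pool.map(power, [1, 2, 3], [2, 2, 2]))

# After shutdown(), the pool rejects new tasks with RuntimeError.
pool.shutdown(wait=True)
try:
    pool.submit(power, 2)
    rejected = False
except RuntimeError:
    rejected = True

print(cube, powers, rejected)
```

Using the pool in a with statement, as the earlier example does, calls shutdown(wait=True) automatically when the block exits.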

After the program submits a task function to the pool with submit, the method returns a Future object. The Future class is mainly used to obtain the return value of the task function. Python calls it a Future because the task runs asynchronously in another thread, so the function is effectively a task that will be completed in the future.
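
As a rough illustration of working with Future objects, the sketch below uses a made-up divide function to show the calls most often needed: result() to fetch the return value, exception() to inspect a failure without raising it, done() to check completion, and add_done_callback() to run code when the task finishes.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    return a / b


callback_results = []

with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(divide, 10, 2)
    bad = pool.submit(divide, 1, 0)

    # The callback receives the finished future as its only argument.
    ok.add_done_callback(lambda f: callback_results.append(f.result()))

    value = ok.result(timeout=5)      # return value of divide(10, 2)
    error = bad.exception(timeout=5)  # the ZeroDivisionError raised in the task

# Leaving the with block waits for every task, so both futures are done here.
all_done = ok.done() and bad.done()
print(value, type(error).__name__, all_done, callback_results)
```

Calling bad.result() instead of bad.exception() would re-raise the ZeroDivisionError in the calling thread, which is why the example code's future.result() calls are the place where task errors surface.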

