Python Multiprocessing and Multithreading: A Detailed Guide

  • 2021-10-16 01:59:39
  • OfStack

Contents

Processes and Threads
Multiprocessing in Python
Process Pool
Data Communication and Sharing among Multiple Processes
Multithreading in Python
Data Sharing among Threads
Communication Using Queue: the Classic Producer and Consumer Model

Processes and Threads

A process is the smallest unit of resource allocation, and a thread is the smallest unit of scheduling and execution.

An application contains at least one process, and a process contains at least one thread.

Each process has its own independent memory space while it runs, and the threads within a process share that process's memory space.
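
The difference between shared and independent memory can be seen in a small sketch. The names items, append_item, run_in_thread and run_in_process are hypothetical and chosen only for illustration. A thread appends to the parent's list directly, while a child process appends to its own copy and leaves the parent's list unchanged.

```python
import threading
from multiprocessing import Process

items = []  # module-level list living in this process's memory

def append_item():
    items.append("x")

def run_in_thread():
    # The thread shares the memory of the current process.
    t = threading.Thread(target=append_item)
    t.start()
    t.join()

def run_in_process():
    # The child process works on its own copy of "items".
    p = Process(target=append_item)
    p.start()
    p.join()

if __name__ == "__main__":
    run_in_thread()
    print("After thread:", len(items))    # the thread's append is visible here
    run_in_process()
    print("After process:", len(items))   # unchanged: the child modified its own copy
```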

The core of a computer is the CPU, which carries out all computing tasks. It is like a factory that runs all the time. Suppose the factory's power supply is limited and can feed only one workshop at a time: when one workshop is running, all the others must stop. The implication is that a single CPU can run only one task at a time. (Editor's note: a multi-core CPU is like having several power plants, which makes it possible to run several workshops, that is, several processes, at once.)

A process is like one workshop in the factory: it represents a single task that the CPU can handle. At any given moment the CPU is running one process, and the other processes are not running.

A workshop can have many workers who cooperate on one task. Threads are like the workers in a workshop: a process can contain multiple threads. The workshop's space is shared by its workers; for example, many rooms are open to every worker. This corresponds to the memory space of a process being shared, so that every thread can use it.

Rooms differ in size, however, and some can hold only one person at a time, such as the toilet. While someone is inside, nobody else can enter. This corresponds to one thread using a piece of shared memory while the other threads must wait for it to finish before they can use that block of memory. A simple way to keep others out is to put a lock on the door. Whoever arrives first locks the door; those who arrive later see the lock and queue at the door until it opens. This is called a "mutual exclusion lock" (mutex), which prevents multiple threads from reading and writing the same memory area at the same time.

Other rooms can hold up to n people at once, such as the kitchen. If more than n people want to enter, the extra ones have to wait outside. This is like a memory area that can be used by only a fixed number of threads at a time. The solution is to hang n keys at the door. Each person who enters takes a key and hangs it back on the way out. Those who arrive and find no keys left know they must wait in line at the door. This practice is called a "semaphore", and it ensures that multiple threads do not conflict with one another.

It is easy to see that a mutex is a special case of a semaphore (n = 1), so the latter can fully replace the former. However, because a mutex is simpler and more efficient, it is still the design of choice when exclusive access to a resource must be guaranteed.
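
The kitchen analogy can be sketched with Python's threading.Semaphore. This is a minimal illustration, not part of the original article: the names kitchen, cook and run_workers and the capacity of 2 are assumptions chosen for the example. A separate Lock plays the role of the mutex and protects the counters that record how many threads were inside at once.

```python
import threading
import time

CAPACITY = 2                              # n keys hanging at the kitchen door (assumed value)
kitchen = threading.Semaphore(CAPACITY)   # at most CAPACITY threads inside at once
counter_lock = threading.Lock()           # a mutex protecting the counters below
inside = 0
max_inside = 0

def cook(worker_id):
    global inside, max_inside
    with kitchen:                         # take a key; blocks if none are left
        with counter_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.05)                  # pretend to work in the kitchen
        with counter_lock:
            inside -= 1
    # Leaving the outer "with" block hangs the key back on the door.

def run_workers(count=6):
    workers = [threading.Thread(target=cook, args=(i,)) for i in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return max_inside

if __name__ == "__main__":
    print("Most workers in the kitchen at once:", run_workers())
```

Replacing Semaphore(CAPACITY) with Semaphore(1) would make the kitchen behave like the single-person room, which is the n = 1 case described above.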

Multiprocessing in Python

Multiprocessing in Python relies on the multiprocessing module. Using multiple processes lets a program use multiple CPU cores for parallel computation.

Example:


from multiprocessing import Process
import os
import time
 
def long_time_task(i):
    print('Child process: {} - Task {}'.format(os.getpid(), i))
    time.sleep(2)
    print("Result: {}".format(8 ** 20))
 
if __name__=='__main__':
    print('Current parent process: {}'.format(os.getpid()))
    start = time.time()
    p1 = Process(target=long_time_task, args=(1,))
    p2 = Process(target=long_time_task, args=(2,))
    print('Waiting for all child processes to complete.')
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()
    print("Total time spent: {} seconds".format((end - start)))

Creating new processes and switching between them consumes resources, so the number of processes should be kept under control.

The number of processes that can actually run at the same time is limited by the number of CPU cores.

Process pool

Creating processes with a process pool (Pool):

A process pool saves the trouble of creating processes by hand; by default, the number of worker processes equals the number of CPU cores.

The Pool class provides a specified number of worker processes. When a task is submitted to the Pool and a worker is idle, that worker executes the task; if all workers are busy, the task waits until a worker becomes free.

Commonly used methods:

1. apply_async

Submits a function and its arguments to the process pool. The call is non-blocking: it returns a result object immediately, and the task runs asynchronously in a worker process.

2. map

Applies a function to every item of an iterable and blocks until all results are returned.

3. map_async

The non-blocking variant of map; it returns a result object immediately.

4. close

Closes the process pool so that it accepts no more tasks.

5. terminate

Stops the worker processes immediately, without finishing outstanding tasks.

6. join

Blocks the main process until the worker processes have exited.

Example:


from multiprocessing import Pool, cpu_count
import os
import time
 
def long_time_task(i):
    print('Child process: {} - Task {}'.format(os.getpid(), i))
    time.sleep(2)
    print("Result: {}".format(8 ** 20))
 
if __name__=='__main__':
    print("Number of CPU cores: {}".format(cpu_count()))
    print('Current parent process: {}'.format(os.getpid()))
    start = time.time()
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all child processes to complete.')
    p.close()
    p.join()
    end = time.time()
    print("Total time spent: {} seconds".format((end - start)))

close or terminate must be called before join, so that the process pool no longer accepts new tasks.
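
The pool example discards the return values of its tasks. A minimal sketch of collecting results follows; the function square, the pool size of 4 and the 30-second timeout are assumptions made for illustration. apply_async returns an AsyncResult whose get() blocks until the value is ready, while map blocks and returns a list in input order.

```python
from multiprocessing import Pool

def square(x):
    # Hypothetical CPU-bound task used only for this illustration.
    return x * x

def main():
    # The "with" block terminates the pool on exit, so results are fetched inside it.
    with Pool(4) as pool:
        # apply_async is non-blocking and returns an AsyncResult.
        pending = [pool.apply_async(square, args=(i,)) for i in range(5)]
        async_results = [r.get(timeout=30) for r in pending]

        # map blocks until every result is available.
        map_results = pool.map(square, range(5))

        # map_async is the non-blocking counterpart of map.
        map_async_results = pool.map_async(square, range(5)).get(timeout=30)
    return async_results, map_results, map_async_results

if __name__ == "__main__":
    print(main())
```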

Data Communication and Sharing among Multiple Processes

Processes are normally independent of one another, and each has its own memory. Through shared memory (for example, the mmap module), objects can be shared between processes so that several processes access the same variable (the same address, although the variable name may differ). Sharing resources among processes inevitably leads to contention between them, so shared state should be avoided as much as possible. Another approach is to use a Queue for communication or data sharing between processes, much as in multithreaded programming.
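
As one possible illustration of shared memory, the sketch below uses multiprocessing.Value rather than mmap; that choice, and the names deposit and counter, are assumptions for this example. The Value object carries its own lock, which is held around each update so that concurrent increments are not lost.

```python
from multiprocessing import Process, Value

def deposit(counter, times):
    for _ in range(times):
        # "+=" is a read-modify-write, so hold the Value's lock around it.
        with counter.get_lock():
            counter.value += 1

def main():
    counter = Value('i', 0)   # a C int placed in shared memory, initially 0
    workers = [Process(target=deposit, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

if __name__ == "__main__":
    print("Final counter:", main())
```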

The following example creates two independent processes, one that writes (pw) and one that reads (pr), which share a single Queue.


from multiprocessing import Process, Queue
import os, time, random
 
# Code executed by the writer process:
def write(q):
    print('Process to write: {}'.format(os.getpid()))
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())
 
# Code executed by the reader process:
def read(q):
    print('Process to read: {}'.format(os.getpid()))
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)
 
if __name__=='__main__':
    # The parent process creates the Queue and passes it to each child process:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw, which writes:
    pw.start()
    # Start the child process pr, which reads:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop, so we cannot wait for it to finish; it has to be terminated forcibly:
    pr.terminate()
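
Forcibly terminating the reader works, but a common alternative is to send a sentinel value that tells the reader to stop on its own. The sketch below is a variant of the reader and writer example under that assumption: the None sentinel and the function names write_all and read_until_done are hypothetical.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # assumed marker meaning "no more data"

def write_all(q, values):
    for value in values:
        q.put(value)
    q.put(SENTINEL)          # tell the reader that writing is finished

def read_until_done(q):
    received = []
    while True:
        value = q.get()      # blocks until an item is available
        if value is SENTINEL:
            break
        print('Get %s from queue.' % value)
        received.append(value)
    return received

if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write_all, args=(q, ['A', 'B', 'C']))
    pr = Process(target=read_until_done, args=(q,))
    pw.start()
    pr.start()
    pw.join()
    pr.join()                # the reader exits by itself, so join() returns
```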

Multithreading in Python

Multithreaded programming in Python 3 relies mainly on the threading module. Creating a new thread is very similar to creating a new process. The threading.Thread constructor takes two commonly used arguments: target, which is generally the function to run, and args, the arguments to pass to that function. Call start() on a newly created thread to start it. The name of the current thread can be printed with threading.current_thread().name.


import threading
import time
 
def long_time_task(i):
    print('Current child thread: {} - Task {}'.format(threading.current_thread().name, i))
    time.sleep(2)
    print("Result: {}".format(8 ** 20))
 
if __name__=='__main__':
    start = time.time()
    print('This is the main thread: {}'.format(threading.current_thread().name))
    thread_list = []
    for i in range(1, 3):
        t = threading.Thread(target=long_time_task, args=(i, ))
        thread_list.append(t)
    for t in thread_list:
        t.start()
    for t in thread_list:
        t.join()
    end = time.time()
    print("Total time spent: {} seconds".format((end - start)))

Data Sharing among Threads

Memory is shared among the threads of a process, which means that any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that several threads change the same variable at the same time and corrupt its contents. When threads share a variable, one solution is to acquire a lock before modifying it, ensuring that only one thread can modify it at a time. threading.Lock() creates such a lock; once the modification is done, the lock is released so that other threads can acquire it.


import threading
 
class Account:
    def __init__(self):
        self.balance = 0
 
    def add(self, lock):
        # Acquire the lock
        lock.acquire()
        for i in range(0, 100000):
            self.balance += 1
        # Release the lock
        lock.release()
 
    def delete(self, lock):
        # Acquire the lock
        lock.acquire()
        for i in range(0, 100000):
            self.balance -= 1
        # Release the lock
        lock.release()
 
if __name__ == "__main__":
    account = Account()
    lock = threading.Lock()
    # Create the threads
    thread_add = threading.Thread(target=account.add, args=(lock,), name='Add')
    thread_delete = threading.Thread(target=account.delete, args=(lock,), name='Delete')
 
    # Start the threads
    thread_add.start()
    thread_delete.start()
 
    # Wait for the threads to finish
    thread_add.join()
    thread_delete.join()
 
    print('The final balance is: {}'.format(account.balance))
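
A Lock can also be used as a context manager, which releases it even if the guarded code raises an exception. The sketch below is a variant of the account example written that way; the class name SafeAccount and the methods change and run_demo are hypothetical. The lock is taken once per update instead of once per loop, so the two threads can interleave.

```python
import threading

class SafeAccount:
    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def change(self, amount, times):
        for _ in range(times):
            # "with" acquires the lock and always releases it on exit.
            with self._lock:
                self.balance += amount

def run_demo(times=100000):
    account = SafeAccount()
    adder = threading.Thread(target=account.change, args=(1, times), name='Add')
    remover = threading.Thread(target=account.change, args=(-1, times), name='Delete')
    adder.start()
    remover.start()
    adder.join()
    remover.join()
    return account.balance

if __name__ == "__main__":
    print('The final balance is: {}'.format(run_demo()))
```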

Communication Using Queue: the Classic Producer and Consumer Model


from queue import Queue
import random, threading, time
 
# Producer class
class Producer(threading.Thread):
    def __init__(self, name, queue):
        threading.Thread.__init__(self, name=name)
        self.queue = queue
 
    def run(self):
        for i in range(1, 5):
            print("{} is producing {} to the queue!".format(self.name, i))
            self.queue.put(i)
            time.sleep(random.randrange(10) / 5)
        print("%s finished!" % self.name)
 
# Consumer class
class Consumer(threading.Thread):
    def __init__(self, name, queue):
        threading.Thread.__init__(self, name=name)
        self.queue = queue
 
    def run(self):
        for i in range(1, 5):
            val = self.queue.get()
            print("{} is consuming {} in the queue.".format(self.name, val))
            time.sleep(random.randrange(10))
        print("%s finished!" % self.name)
 
def main():
    queue = Queue()
    producer = Producer('Producer', queue)
    consumer = Consumer('Consumer', queue)
 
    producer.start()
    consumer.start()
 
    producer.join()
    consumer.join()
    print('All threads finished!')
 
if __name__ == '__main__':
    main()
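
In the producer and consumer example, the consumer must know in advance how many items to expect. One alternative, sketched here under stated assumptions, uses the task_done() and join() methods of queue.Queue together with a sentinel; the names worker and process_all, the None sentinel and the doubling step are hypothetical.

```python
from queue import Queue
import threading

def worker(queue, results):
    while True:
        item = queue.get()
        if item is None:          # assumed sentinel: stop consuming
            queue.task_done()
            break
        results.append(item * 2)  # stand-in for real work on the item
        queue.task_done()         # mark this item as fully processed

def process_all(items):
    queue = Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(queue, results), name='Consumer')
    consumer.start()
    for item in items:
        queue.put(item)
    queue.put(None)               # ask the consumer to exit
    queue.join()                  # blocks until every put() has a matching task_done()
    consumer.join()
    return results

if __name__ == '__main__':
    print(process_all([1, 2, 3, 4]))
```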

For CPU-intensive code (such as computation in loops), multiple processes are more efficient. For I/O-intensive code (such as file operations or web crawlers), multithreading is more efficient.

For I/O-intensive operations, most of the elapsed time is spent waiting. The CPU has nothing to do while waiting, so additional CPUs bring little benefit during that time. For CPU-intensive code, by contrast, two CPUs work much faster than one. So why is multithreading useful for I/O-intensive code? In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time; when a thread starts waiting on I/O it releases the GIL so that another thread can run, which is how switching between threads happens.
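
A rough way to see this effect is to simulate I/O waits with time.sleep, which releases the GIL while it waits. The sketch below is illustrative only; fake_io, the 0.2-second delay and the thread count of 4 are assumptions, and real timings will vary from machine to machine.

```python
import threading
import time

def fake_io(delay=0.2):
    time.sleep(delay)   # stands in for a blocking read or network call

def timed_sequential(count=4):
    start = time.time()
    for _ in range(count):
        fake_io()
    return time.time() - start

def timed_threaded(count=4):
    start = time.time()
    threads = [threading.Thread(target=fake_io) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == "__main__":
    print("Sequential: {:.2f} s".format(timed_sequential()))
    print("Threaded:   {:.2f} s".format(timed_threaded()))
```

Because the waits overlap, the threaded run should take roughly as long as a single wait, while the sequential run takes about the sum of all of them.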
