Detailed Explanation of Python Multiprocessing and Multithreading Use Scenarios


Preface

Scenarios for Python multiprocessing: computationally intensive (CPU-bound) tasks

Scenarios for Python multithreading: I/O-intensive (I/O-bound) tasks

Computationally intensive tasks generally involve a large amount of pure computation, such as hundreds of millions of additions, subtractions, multiplications and divisions. Running such tasks in parallel on a multi-core CPU can improve computing performance.

I/O-intensive tasks generally involve input and output, such as reading files or making network requests. These scenarios usually block on I/O, so using more CPU cores does not improve performance much.
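As a concrete illustration (the examples below simulate I/O with time.sleep instead), a minimal sketch of an I/O-bound worker might look like the following; the file name data.txt and the URL are placeholders, not from the original article:

import urllib.request


def io_work():
    # disk read: the thread mostly waits for the operating system, not the CPU
    with open("data.txt", "rb") as f:
        data = f.read()
    # network request: again the time is spent waiting on I/O, not computing
    resp = urllib.request.urlopen("http://example.com")
    return len(data) + len(resp.read())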

Let's run the tasks on a 64-core virtual machine and compare the two approaches with sample code.

Example 1: A computationally intensive task performing 100 million operations

Use multiple processes


from multiprocessing import Process
import os, time


# Computationally intensive task
def work():
    res = 0
    for i in range(100 * 100 * 100 * 100):  # 100 million operations
        res *= i


if __name__ == "__main__":  # guard required so child processes do not re-run this block
    l = []
    print("This machine is %s-core CPU" % os.cpu_count())  # this machine has 64 cores
    start = time.time()
    for i in range(4):
        p = Process(target=work)  # multi-process
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print("Computationally intensive task, multiprocessing took %s" % (stop - start))

Use multithreading


from threading import Thread
import os, time


# Computationally intensive task
def work():
    res = 0
    for i in range(100 * 100 * 100 * 100):  # 100 million operations
        res *= i


if __name__ == "__main__":
    l = []
    print("This machine is %s-core CPU" % os.cpu_count())  # this machine has 64 cores
    start = time.time()
    for i in range(4):
        p = Thread(target=work)  # multi-thread
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print("Computationally intensive task, multithreading took %s" % (stop - start))

Output of the two programs:

This machine is 64-core CPU
Computationally intensive task, multiprocessing took 6.864224672317505

This machine is 64-core CPU
Computationally intensive task, multithreading took 37.91042113304138

Note: In the code above, the 100-million-operation task is run with 4 processes and with 4 threads. Multiprocessing takes 6.86 s, while multithreading takes 37.91 s. Clearly, for computationally intensive tasks, using multiple processes greatly improves efficiency.

In addition, when the same task is run with 8 processes and with 8 threads, the gap is even larger. The output is as follows:

This machine is 64-core CPU
Computationally intensive task, multiprocessing took 6.811635971069336

This machine is 64-core CPU
Computationally intensive task, multithreading took 113.53767895698547

It can be seen that on this 64-core machine, running 8 processes is about as efficient as running 4 processes, while multithreading only gets slower as more threads are added. To make the most efficient use of the CPU, the number of computationally intensive tasks running at the same time should equal the number of CPU cores.
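One common way to apply this rule, not shown in the original code above, is to hand the tasks to a process pool whose size matches the core count. A minimal sketch using the standard-library multiprocessing.Pool (the choice of 8 tasks mirrors the second run above):

from multiprocessing import Pool
import os, time


def work(_):
    res = 0
    for i in range(100 * 100 * 100 * 100):  # 100 million operations
        res *= i


if __name__ == "__main__":
    start = time.time()
    # one worker process per CPU core, so CPU-bound tasks do not compete for cores
    with Pool(processes=os.cpu_count()) as pool:
        pool.map(work, range(8))  # run the task 8 times across the pool
    print("Pool-based multiprocessing took %s" % (time.time() - start))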

Example 2: An I/O-intensive task, run 1000 times, each blocking for 5 seconds to simulate reading a file

Use multiple processes (64-core CPU)


from multiprocessing import Process
import os, time


# I/O-intensive task
def work():
    time.sleep(5)  # block for 5 seconds, simulating a file read or network request


if __name__ == "__main__":
    l = []
    print("This machine is %s-core CPU" % os.cpu_count())
    start = time.time()
    for i in range(1000):
        p = Process(target=work)  # multi-process
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print("I/O-intensive task, multiprocessing took %s" % (stop - start))

Use multithreading (64-core CPU)


from threading import Thread
import os, time


# I/O-intensive task
def work():
    time.sleep(5)  # block for 5 seconds, simulating a file read or network request


if __name__ == "__main__":
    l = []
    print("This machine is %s-core CPU" % os.cpu_count())
    start = time.time()
    for i in range(1000):
        p = Thread(target=work)  # multi-thread
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print("I/O-intensive task, multithreading took %s" % (stop - start))

Output:

This machine is 64-core CPU
I/O-intensive task, multiprocessing took 12.28218412399292

This machine is 64-core CPU
I/O-intensive task, multithreading took 5.399136066436768

Because of the GIL (Global Interpreter Lock), Python multithreading can only use a single core no matter how many cores the machine has. Even so, the output shows that multithreading has the advantage for I/O-intensive tasks: while one thread is blocked waiting on I/O, other threads can keep running.
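In practice, instead of starting 1000 threads by hand as in the example above, I/O-bound work is often handed to a thread pool, which caps the number of live threads. A minimal sketch using the standard-library concurrent.futures.ThreadPoolExecutor; the pool size of 100 is an arbitrary illustrative choice:

from concurrent.futures import ThreadPoolExecutor
import time


def work(_):
    time.sleep(5)  # simulate a blocking I/O operation


if __name__ == "__main__":
    start = time.time()
    # at most 100 threads alive at once; the 1000 tasks are queued onto them
    with ThreadPoolExecutor(max_workers=100) as executor:
        list(executor.map(work, range(1000)))
    print("I/O-intensive task, thread pool took %s" % (time.time() - start))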

FAQ: an error sometimes reported while running the multi-process I/O-intensive example:

OSError: [Errno 24] Too many open files

Cause: the Linux limit on the number of open file descriptors, which can be checked with:


ulimit -n
# output: 1024

Solution (this raises the limit temporarily; it resets after a reboot):


ulimit -n 10240
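Alternatively, the limit can be checked and raised (up to the hard limit) from inside the Python program itself via the standard-library resource module, which is available on Unix-like systems. A minimal sketch:

import resource

# query the current soft and hard limits on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# raise the soft limit as far as the hard limit allows
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))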

Summary

For computationally intensive (CPU-bound) tasks, use multiprocessing and keep the number of concurrent tasks equal to the number of CPU cores; for I/O-intensive tasks, use multithreading.

