A thorough understanding of multithreading in Python is a must for beginners

2020-05-17 05:50:42
OfStack

Example 1
We will request five different URLs:
Single thread


import time
import urllib.request

def get_responses():
    urls = [
        'http://www.baidu.com',
        'http://www.amazon.com',
        'http://www.ebay.com',
        'http://www.alibaba.com',
        'http://www.ofstack.com',
    ]
    start = time.time()
    for url in urls:
        print(url, end=' ')
        # urlopen() blocks until the server responds
        with urllib.request.urlopen(url) as resp:
            print(resp.getcode())
    print("Elapsed time: %s" % (time.time() - start))

get_responses()

The output is:
http://www.baidu.com 200
http://www.amazon.com 200
http://www.ebay.com 200
http://www.alibaba.com 200
http://www.ofstack.com 200
Elapsed time: 3.0814409256

Explanation:
The URLs are requested one after another.
The program does not request the next URL until it has received a response from the current one.
Network requests take a long time, so the CPU sits idle while waiting for each request to return.
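The cost of blocking can be reproduced without touching the network. The following is a minimal sketch, not the article's original code: `fake_request` and `get_responses_sequential` are hypothetical names, and `time.sleep()` is assumed to stand in for network latency. Because each call blocks, the total time is roughly the sum of the individual delays.

```python
import time

def fake_request(url, delay=0.2):
    """Hypothetical stand-in for urlopen(): blocks for `delay` seconds, then returns 200."""
    time.sleep(delay)
    return 200

def get_responses_sequential(urls, delay=0.2):
    """Request each URL in turn; the loop cannot continue until the current call returns."""
    start = time.perf_counter()
    results = []
    for url in urls:
        results.append((url, fake_request(url, delay)))
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    urls = ["http://www.baidu.com", "http://www.amazon.com", "http://www.ebay.com"]
    results, elapsed = get_responses_sequential(urls)
    for url, code in results:
        print(url, code)
    print("Elapsed time: %.2f" % elapsed)
```

With three URLs and a 0.2-second delay, the elapsed time should be a little over 0.6 seconds.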
Multithreading


import time
import urllib.request
from threading import Thread

class GetUrlThread(Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url

    def run(self):
        # Executed in the new thread after start() is called
        with urllib.request.urlopen(self.url) as resp:
            print(self.url, resp.getcode())

def get_responses():
    urls = [
        'http://www.baidu.com',
        'http://www.amazon.com',
        'http://www.ebay.com',
        'http://www.alibaba.com',
        'http://www.ofstack.com',
    ]
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        t.start()
    # Wait for every thread to finish before measuring the elapsed time
    for t in threads:
        t.join()
    print("Elapsed time: %s" % (time.time() - start))

get_responses()

Output:
http://www.ofstack.com 200
http://www.baidu.com 200
http://www.amazon.com 200
http://www.alibaba.com 200
http://www.ebay.com 200
Elapsed time: 0.689890861511

Explanation:

Notice how much the execution time improved.
We wrote a multithreaded program to reduce the time the CPU spends waiting. While one thread is waiting for its network request to return, the CPU can switch to another thread and make that thread's network request. (In CPython this works because a thread blocked on network I/O releases the Global Interpreter Lock, so other threads can run.)
We want each thread to handle one URL, so we pass one URL when we instantiate the thread class.
Running a thread means executing the run() method of the class.
Whatever work a thread should do must therefore go in run().
We create one thread per URL and call its start() method, which tells the interpreter that the thread's run() method may now be executed in a new thread.
We want to measure how long it takes for all the threads to finish, so we call the join() method.
join() makes the main thread wait until that thread ends before executing the next statement.
Because we call join() on every thread, the elapsed time is computed only after all the threads have finished.
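The same simulation can be run with one thread per URL to show why the threaded version finishes sooner. Again this is a sketch under assumptions: `fake_request`, `FakeUrlThread` and `get_responses_threaded` are hypothetical, and `time.sleep()` stands in for a network request (like blocking I/O, it lets other threads run while it waits). Results are stored on the thread object instead of printed, so the main thread can read them after join().

```python
import time
from threading import Thread

def fake_request(url, delay=0.2):
    """Hypothetical stand-in for urlopen(): blocks for `delay` seconds, then returns 200."""
    time.sleep(delay)
    return 200

class FakeUrlThread(Thread):
    def __init__(self, url, delay=0.2):
        super().__init__()
        self.url = url
        self.delay = delay
        self.code = None  # filled in by run()

    def run(self):
        # Executed in the new thread once start() has been called
        self.code = fake_request(self.url, self.delay)

def get_responses_threaded(urls, delay=0.2):
    start = time.perf_counter()
    threads = [FakeUrlThread(url, delay) for url in urls]
    for t in threads:
        t.start()  # schedules run() in a new thread and returns immediately
    for t in threads:
        t.join()   # blocks the main thread until t has finished
    elapsed = time.perf_counter() - start
    return [(t.url, t.code) for t in threads], elapsed

if __name__ == "__main__":
    urls = ["http://www.baidu.com", "http://www.amazon.com", "http://www.ebay.com"]
    results, elapsed = get_responses_threaded(urls)
    for url, code in results:
        print(url, code)
    print("Elapsed time: %.2f" % elapsed)
```

With five URLs and a 0.2-second delay, the elapsed time should be close to 0.2 seconds rather than the roughly 1 second a sequential loop would need.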

About threads:

run() may not start executing immediately after start() is called.
You cannot determine the order in which different threads execute their run() methods.
Within a single thread, the statements in run() are guaranteed to execute in order.
For example, within a thread the URL is always requested first, and only then is the returned result printed.
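The two ordering rules above can be checked with a small sketch. It is an illustration, not part of the original program: `worker`, `run_workers` and the shared `events` list are hypothetical, and a Lock guards the list so the example does not depend on `list.append` being thread-safe. The interleaving of entries from different threads may change from run to run, but each thread's own steps always appear in order.

```python
import threading

events = []
events_lock = threading.Lock()

def worker(name, steps=3):
    # Within one thread, these appends happen in program order
    for step in range(steps):
        with events_lock:
            events.append((name, step))

def run_workers(n=4, steps=3):
    events.clear()
    threads = [threading.Thread(target=worker, args=("t%d" % i, steps)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)

if __name__ == "__main__":
    # The order across threads is not fixed; the order within each thread is
    print(run_workers())
```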

Example 2

We will use a program to demonstrate resource contention between multiple threads and fix the problem.


from threading import Thread

# define a global variable
some_var = 0

class IncrementThread(Thread):
    def run(self):
        # we want to read a global variable
        # and then increment it
        global some_var
        read_value = some_var
        print("some_var in %s is %d" % (self.name, read_value))
        some_var = read_value + 1
        print("some_var in %s after increment is %d" % (self.name, some_var))

def use_increment_thread():
    threads = []
    for _ in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print("After 50 modifications, some_var should have become 50")
    print("After 50 modifications, some_var is %d" % (some_var,))

use_increment_thread()

Run the program several times and you may see a different final value from run to run.
Explanation:
There is one global variable that all the threads modify.
Each thread should add 1 to this global variable.
There are 50 threads, so the final value should be 50, but it is not always 50.
Why doesn't it reach 50?
Suppose some_var is 15 when thread t1 reads it, and at that moment the CPU switches to another thread, t2.
t2 also reads some_var as 15.
Both t1 and t2 then write 15 + 1 = 16 back to some_var.
We expected the two threads together to increase some_var by 2, to 17.
This is a race condition: the threads compete for the same resource.
The same can happen between any pair of threads, so the final result can be less than 50.
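The t1/t2 scenario can be forced to happen every time with a `threading.Barrier`. This is a contrived sketch, not the article's program: the barrier is an assumption added so that both threads read `some_var` before either writes it back, which reproduces the lost update deterministically. `unsafe_increment` and `demo_lost_update` are hypothetical names.

```python
import threading

some_var = 15
# Both threads must reach wait() before either may continue
barrier = threading.Barrier(2)

def unsafe_increment():
    global some_var
    read_value = some_var       # step 1: read
    barrier.wait()              # both threads have now read the same value
    some_var = read_value + 1   # step 2: write back, overwriting the other update

def demo_lost_update():
    global some_var
    some_var = 15
    t1 = threading.Thread(target=unsafe_increment, name="t1")
    t2 = threading.Thread(target=unsafe_increment, name="t2")
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return some_var

if __name__ == "__main__":
    print(demo_lost_update())  # 16, not the expected 17
```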
Addressing the race condition


from threading import Lock, Thread

lock = Lock()
some_var = 0

class IncrementThread(Thread):
    def run(self):
        # we want to read a global variable
        # and then increment it
        global some_var
        lock.acquire()
        try:
            read_value = some_var
            print("some_var in %s is %d" % (self.name, read_value))
            some_var = read_value + 1
            print("some_var in %s after increment is %d" % (self.name, some_var))
        finally:
            # always release the lock, even if the body raises
            lock.release()

def use_increment_thread():
    threads = []
    for _ in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print("After 50 modifications, some_var should have become 50")
    print("After 50 modifications, some_var is %d" % (some_var,))

use_increment_thread()

Run the program again and you get the expected result, 50.
Explanation:
A Lock is used to prevent race conditions.
If thread t1 acquires the lock before performing an operation, no other thread can perform that operation until t1 releases the lock.
We want to make sure that once thread t1 has read some_var, no other thread can read some_var until t1 has finished modifying it.
This makes reading and modifying some_var a single logical atomic operation.
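A lock can also be held with a `with` statement, which calls acquire() on entry and release() on exit even if the body raises. The following sketch uses a hypothetical `counter` global and `run` helper, and adds a `time.sleep(0.001)` between the read and the write purely to invite a thread switch; because the lock is held across both steps, the final value is still exact.

```python
import threading
import time

lock = threading.Lock()
counter = 0

def locked_increment():
    global counter
    # `with lock:` acquires the lock here and releases it when the block ends
    with lock:
        read_value = counter
        time.sleep(0.001)  # invite a thread switch between the read and the write
        counter = read_value + 1

def run(n=50):
    global counter
    counter = 0
    threads = [threading.Thread(target=locked_increment) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())  # 50
```

Without the lock, the same sleep between read and write would make lost updates very likely.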
Example 3
Let's use an example to show that a thread does not affect another thread's variables (non-global variables).
time.sleep() suspends the calling thread, which forces a switch to another thread.


from threading import Thread
import time

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(1)
            self.entries.append(i)
        print(self.entries)

def use_create_list_thread():
    for _ in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()

Run it a few times and the output may not be printed correctly: while one thread is printing, the CPU can switch to another thread, so the output of different threads gets interleaved. We need to make print(self.entries) a logical atomic operation so that the printing cannot be interrupted by another thread.
We use a Lock() again, as in the following example.


from threading import Thread, Lock
import time

lock = Lock()

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(1)
            self.entries.append(i)
        lock.acquire()
        try:
            # only one thread at a time can print
            print(self.entries)
        finally:
            lock.release()

def use_create_list_thread():
    for _ in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()

This time the output is correct: each thread prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. This shows that the threads do not modify each other's non-global variables: each thread appends only to its own self.entries list.
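A variant of the example above makes the point testable by keeping the threads around and inspecting their lists after join(). It is a sketch with assumed parameters: `count` and `delay` are hypothetical constructor arguments, the delay is shortened from 1 second to 0.01 seconds, and `self.entries` is created in `__init__` instead of `run()` so it exists before the thread starts.

```python
import threading
import time

print_lock = threading.Lock()

class CreateListThread(threading.Thread):
    def __init__(self, count=10, delay=0.01):
        super().__init__()
        self.count = count
        self.delay = delay
        self.entries = []  # instance attribute: each thread object gets its own list

    def run(self):
        for i in range(self.count):
            time.sleep(self.delay)  # forces frequent switches between threads
            self.entries.append(i)
        with print_lock:
            print(self.name, self.entries)

def use_create_list_thread(n=3, count=10, delay=0.01):
    threads = [CreateListThread(count, delay) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return threads

if __name__ == "__main__":
    use_create_list_thread()
```

Even though the threads interleave constantly, every list ends up as [0, 1, ..., 9], because no thread ever touches another thread's `entries`.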