Python multithreading and global variables in detail

  • 2021-01-25 07:44:46
  • OfStack

This morning I sat down to write a crawler. With the basic framework in place, I added multithreaded crawling and ran into a bug:

For example, I put 200 URLs into the list of files to download and started 50 threads. The first 50 URLs the crawler fetched were all saved as 0.html, so the final result was a single 0.html (overwritten again and again) plus 1.html through 150.html. Here is my code:


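# Body of the download function (original, buggy version); url is the address to fetch
# x is the name of the downloaded file
x = str(theguardian_globle.g)
filePath = "E://wgetWeiBao//" + x + ".html"
try:
    wget.download(url, filePath)
    # g is only incremented after the download has finished
    theguardian_globle.g += 1
    print(x + " is downloading...")
except Exception:
    print("error!")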

# Definition of the global variable g, in the module theguardian_globle
g = 0

The problem is that multithreading and global variables are a dangerous combination, because several threads run at the same time. Here each thread reads g, spends a long time in the download, and only then increments g. All 50 threads start before any download finishes, so they all read g = 0 and all write to 0.html. Whenever several threads read and modify a global variable, the whole read-and-update step should be protected by a lock.
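The effect can be reproduced without any network access. The following is a minimal sketch, assuming Python 3; the names `N_THREADS`, `run_workers`, `unsafe_numbers` and `locked_numbers` are made up for this illustration, and a `threading.Barrier` stands in for the slow download so that the outcome is the same on every run.

```python
import threading

N_THREADS = 5  # hypothetical thread count, kept small for the illustration


def run_workers(worker):
    """Start N_THREADS threads running worker() and wait for all of them."""
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def unsafe_numbers():
    """Read the counter, do the slow work, increment afterwards (the buggy order)."""
    state = {"g": 0}                        # stands in for theguardian_globle.g
    seen = []
    # The barrier simulates the slow download: no thread gets past it
    # until every thread has already read g.
    barrier = threading.Barrier(N_THREADS)

    def worker():
        num = state["g"]    # every thread reads 0 here
        barrier.wait()      # the "download" is still in progress
        state["g"] += 1     # too late: the number was already chosen
        seen.append(num)

    run_workers(worker)
    return sorted(seen)


def locked_numbers():
    """Read and increment the counter as one step while holding a lock."""
    state = {"g": 0}
    seen = []
    lock = threading.Lock()

    def worker():
        with lock:          # only one thread at a time runs these two lines
            num = state["g"]
            state["g"] += 1
        seen.append(num)    # the slow work would go here, outside the lock

    run_workers(worker)
    return sorted(seen)


if __name__ == "__main__":
    print(unsafe_numbers())  # prints [0, 0, 0, 0, 0]
    print(locked_numbers())  # prints [0, 1, 2, 3, 4]
```

In the unsafe version every thread picks the number 0, which matches the repeated 0.html described above. In the locked version each thread gets its own number.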

The following is the modified code:


The download function:

import wget

def downLoad(url, num):
    # num is the file number, chosen by the caller
    x = str(num)
    filePath = "E://wgetWeiBao//" + x + ".html"
    try:
        wget.download(url, filePath)
        print(x + " is downloading...")
    except Exception:
        print("error!")

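The multithreaded consumer, which locks the statements that operate on the global variable:

import threading
import theguardian_globle

# q (the shared URL queue) and gCondition (a threading.Condition) are
# assumed to be defined at module level elsewhere in the crawler.
class Cosumer(threading.Thread):
    def run(self):
        print('%s:started' % threading.current_thread())
        while True:
            with gCondition:
                # Wait until a URL is available in the queue
                while q.empty():
                    gCondition.wait()
                url = q.get()
                # Read and increment g while still holding the lock
                num = theguardian_globle.g
                theguardian_globle.g += 1
            # Download outside the lock so the threads can work in parallel
            downLoad(url, num)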
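The article does not show how q and gCondition are created or how the threads stop. The following is a self-contained sketch of the same pattern, assuming Python 3. The `Shared` class, the `closed` flag, `fake_download` (a stand-in for `wget.download` that only records which number was paired with which URL) and `crawl` are all hypothetical names introduced for this illustration, not part of the original crawler.

```python
import threading
from collections import deque


class Shared:
    """State shared by all consumer threads, guarded by one Condition."""

    def __init__(self):
        self.cond = threading.Condition()
        self.urls = deque()   # pending URLs
        self.next_num = 0     # plays the role of theguardian_globle.g
        self.closed = False   # True once no more URLs will arrive
        self.saved = {}       # file number -> URL, filled by fake_download


def fake_download(shared, url, num):
    """Hypothetical stand-in for wget.download: just records the pairing."""
    with shared.cond:         # reuse the same lock to protect the dict
        shared.saved[num] = url


class Consumer(threading.Thread):
    def __init__(self, shared):
        super().__init__()
        self.shared = shared

    def run(self):
        s = self.shared
        while True:
            with s.cond:
                while not s.urls and not s.closed:
                    s.cond.wait()       # sleep until URLs arrive or input ends
                if not s.urls:
                    return              # nothing left and no more input coming
                url = s.urls.popleft()
                num = s.next_num        # take a number and advance the counter
                s.next_num += 1         # inside the same critical section
            fake_download(s, url, num)  # slow work runs outside the lock


def crawl(url_list, n_threads=4):
    """Run n_threads consumers over url_list and return {number: url}."""
    shared = Shared()
    workers = [Consumer(shared) for _ in range(n_threads)]
    for w in workers:
        w.start()
    with shared.cond:
        shared.urls.extend(url_list)
        shared.closed = True
        shared.cond.notify_all()        # wake every waiting consumer
    for w in workers:
        w.join()
    return shared.saved


if __name__ == "__main__":
    result = crawl(["http://example.com/%d" % i for i in range(20)], n_threads=5)
    print(sorted(result))               # 20 distinct file numbers, 0 through 19
```

An alternative design would be to number the URLs before they go into the queue, for example by putting (num, url) pairs into a `queue.Queue`, which is already thread-safe. The consumers would then need no shared counter at all.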

And you're done!

