Tutorial for performing asynchronous Solr queries under Python's gevent framework

  • 2020-05-09 18:50:47
  • OfStack

  I often need to make many requests from Python against Solr. The snippet below blocks on each Solr HTTP request: the second search does not start until the first has completed. The code is as follows:
 


import requests
 
# Search 1
solrResp = requests.get('http://mysolr.com/solr/statedecoded/search?q=law')
 
for doc in solrResp.json()['response']['docs']:
  print(doc['catch_line'])
 
# Search 2
solrResp = requests.get('http://mysolr.com/solr/statedecoded/search?q=shoplifting')
 
for doc in solrResp.json()['response']['docs']:
  print(doc['catch_line'])

(We use the Requests library for the HTTP calls.)

This pattern comes up a lot when scripting documents into Solr, and it is exactly the kind of work that benefits from running in parallel. I want to scale the job so that the bottleneck is Solr's indexing, not my network requests.


Unfortunately, Python is not as convenient for asynchronous programming as JavaScript or Go. However, the gevent library can help. Under the hood, gevent uses the libevent library, which is built on native asynchronous calls (select, poll, and the like); libevent is a good coordinator of many of these low-level asynchronous primitives.

gevent is easy to use; the one part that takes getting used to is gevent.monkey.patch_all(), which patches a number of standard library modules so they cooperate asynchronously with gevent. It sounds scary, but I haven't had any problems with this monkey-patched implementation.
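The order matters here: patch_all() should run before anything that imports the blocking standard-library APIs. A minimal sketch of the idiom, with a quick (assumed, not from the original article) way to check that the patch took effect:

```python
# Patch the standard library first, before any socket-using code is imported.
from gevent import monkey
monkey.patch_all()

import socket

import gevent.socket

# After patching, the stdlib socket class is gevent's cooperative version.
patched = socket.socket is gevent.socket.socket
```

Modules imported before the patch may keep references to the original blocking functions, which is why patch_all() belongs at the very top of the script.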


Without further delay, here's how you can parallelize Solr requests with gevent:
 


from gevent import monkey
monkey.patch_all()  # patch the standard library before importing requests

import gevent
import requests
 
 
class Searcher(object):
  """ Simple wrapper for doing a search and collecting the
    results """
  def __init__(self, searchUrl):
    self.searchUrl = searchUrl
 
  def search(self):
    solrResp = requests.get(self.searchUrl)
    self.docs = solrResp.json()['response']['docs']
 
 
def searchMultiple(urls):
  """ Use gevent to execute the passed in urls;
    dump the results"""
  searchers = [Searcher(url) for url in urls]
 
  # Gather a handle for each task
  handles = []
  for searcher in searchers:
    handles.append(gevent.spawn(searcher.search))
 
  # Block until all work is done
  gevent.joinall(handles)
 
  # Dump the results
  for searcher in searchers:
    print("Search Results for %s" % searcher.searchUrl)
    for doc in searcher.docs:
      print(doc['catch_line'])
 
 
searchUrls = ['http://mysolr.com/solr/statedecoded/search?q=law',
              'http://mysolr.com/solr/statedecoded/search?q=shoplifting']
 
searchMultiple(searchUrls)
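For comparison, the same fan-out-and-join shape is available without monkey-patching via the standard library's concurrent.futures. This network-free sketch substitutes a hypothetical fake_search for the real Solr call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Searcher.search; any blocking callable works.
def fake_search(term):
  return ['%s-doc-1' % term, '%s-doc-2' % term]

def search_multiple(terms):
  # Fan out one worker per term, then block until every future has a result.
  with ThreadPoolExecutor(max_workers=len(terms)) as pool:
    futures = [pool.submit(fake_search, t) for t in terms]
    return [f.result() for f in futures]

results = search_multiple(['law', 'shoplifting'])
```

Threads carry more overhead than gevent's greenlets, but for a handful of I/O-bound requests the difference is rarely noticeable.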
That's all the code. It's not as clean as the equivalent JavaScript, but it does the same job. The essence is these lines:
 


# Gather a handle for each task
handles = []
for searcher in searchers:
  handles.append(gevent.spawn(searcher.search))
 
# Block until all work is done
gevent.joinall(handles)

We have gevent spawn searcher.search as a task for each searcher, keep a handle to every spawned task, wait for all of them to complete with joinall, and finally dump the results.
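The handles that spawn returns are Greenlet objects, and after joinall each one exposes its function's return value on .value. A minimal network-free sketch (work is a hypothetical stand-in for a search):

```python
import gevent

def work(n):
  gevent.sleep(0)  # yield to the event loop, as a network wait would
  return n * n

# spawn returns a Greenlet handle; joinall blocks until all have finished.
handles = [gevent.spawn(work, n) for n in (2, 3, 4)]
gevent.joinall(handles)
squares = [h.value for h in handles]  # Greenlet.value holds the return value
```

This is why the Searcher wrapper above stores its docs on self instead of returning them: either style works for collecting results.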

That's about it. If you have any ideas, please leave a comment and let us know how we can help with your Solr search application.

