Tornado Implementation of a Multi-Process, Multi-Threaded HTTP Service: A Detailed Explanation

  • 2021-07-26 08:18:37
  • OfStack

Basic flow of a tornado web service

1. Implement a Handler for processing requests. It inherits from tornado.web.RequestHandler and implements the corresponding request methods, such as get and post. The response content is written out with the self.write method.

2. Instantiate an Application. The constructor takes a list of handlers that maps requests to Handlers through regular expressions. Other objects a Handler needs are passed to its initialize method through a dict.

3. Initialize a tornado.httpserver.HTTPServer object, passing the Application object from the previous step to its constructor.

4. Bind a port to the HTTPServer object.

5. Start the IOLoop. (A minimal sketch of these five steps is shown below.)
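
The following is a minimal sketch of these five steps. The handler name MainHandler and the port 8888 are illustrative choices, not taken from the service discussed later; listen(8888) could replace the bind/start pair.

import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):  # step 1: the request Handler
  def get(self):
    self.write("Hello, world")  # output goes through self.write

if __name__ == "__main__":
  # step 2: the Application maps URL patterns to Handlers
  app = tornado.web.Application(handlers=[(r"/", MainHandler)])
  # step 3: wrap the Application in an HTTPServer
  http_server = tornado.httpserver.HTTPServer(app)
  # step 4: bind a port and start a single process
  http_server.bind(8888)
  http_server.start(1)
  # step 5: start the IOLoop
  tornado.ioloop.IOLoop.instance().start()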

Features needed

Since the highlight of tornado is asynchronous request handling, the first idea is to convert all requests to asynchronous ones. The problem is that asynchronous functions must not contain blocking calls, otherwise the whole IOLoop gets stuck. That would mean a radical transformation of the service, rewriting every IO operation or long-running request as an asynchronous function. The amount of work is very large and the existing code would have to be modified, so we use a thread pool instead: when one thread blocks on a request or on IO, the other threads and the IOLoop keep running.

The other bottleneck is that the GIL limits CPU concurrency, so we also raise the upper limit of the service's capacity by running multiple child processes.

Based on the above analysis, the scheme is roughly:

1. Multiple processes: fork child processes, so that read-only pages in the children point to the same physical pages.

2. A thread pool: this avoids the workload of an asynchronous rewrite and increases IO concurrency.

Test code

First, test the thread pool. The test case is to issue two requests to the sleep page at the same time and check that:

1. Functions running in the thread pool (here self.block_task) execute simultaneously, which shows up as numbers being printed alternately on the console.

2. The two get requests return almost at the same time, and the response content is displayed in the browser.

The test code for the thread pool is as follows:


import os
import sys
import time
 
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import tornado.gen
from tornado.concurrent import run_on_executor
from concurrent.futures import ThreadPoolExecutor
from tornado.options import define, options
 
class HasBlockTaskHandler(tornado.web.RequestHandler):
  executor = ThreadPoolExecutor(20)  # create the thread pool; it is held by the HasBlockTaskHandler class
   
  @tornado.gen.coroutine
  def get(self):
    strTime = time.strftime("%Y-%m-%d %H:%M:%S")
    print "in get before block_task %s" % strTime
    result = yield self.block_task(strTime)
    print "in get after block_task"
    self.write("%s" % (result))
 
  @run_on_executor
  def block_task(self, strTime):
    print "in block_task %s" % strTime
    for i in range(1, 16):
      time.sleep(1)
      print "step %d : %s" % (i, strTime)
    return "Finish %s" % strTime
 
if __name__ == "__main__":
  tornado.options.parse_command_line()
  app = tornado.web.Application(handlers=[(r"/sleep", HasBlockTaskHandler)], autoreload=False, debug=False)
  http_server = tornado.httpserver.HTTPServer(app)
  http_server.bind(8888)
  http_server.start(1)  # needed after bind(); bind() alone does not start accepting connections
  tornado.ioloop.IOLoop.instance().start()

There are several things to pay attention to in this code:

1. executor = ThreadPoolExecutor(20). This creates a thread pool as an attribute of the Handler class. Note that concurrent.futures is not part of tornado; it is an independent Python module that is built into Python 3, while on Python 2.7 it has to be installed separately (the futures backport package).

2. The decorator @run_on_executor. It turns the synchronous function into an asynchronous function that runs on executor (here, the thread pool). Internally, it submits the decorated function to the executor and returns a Future object.

3. The decorator @tornado.gen.coroutine. A function decorated with it is an asynchronous function written in synchronous style: asynchronous code that would otherwise be written with callbacks can simply yield a Future. The decorated function is suspended after yielding a Future object and resumes once the Future's result is available. (An equivalent handler written without @run_on_executor is sketched below.)
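
To make the division of labor between the two decorators concrete, here is a sketch of an equivalent handler that submits the blocking work to the executor by hand and yields the Future it gets back. The class name ManualSubmitHandler and the 5-second sleep are illustrative choices, and the pattern assumes the same Tornado and concurrent.futures versions as the listing above:

import time
import tornado.gen
import tornado.web
from concurrent.futures import ThreadPoolExecutor

class ManualSubmitHandler(tornado.web.RequestHandler):
  executor = ThreadPoolExecutor(20)

  @tornado.gen.coroutine
  def get(self):
    strTime = time.strftime("%Y-%m-%d %H:%M:%S")
    # submit() hands the blocking function to the thread pool and
    # immediately returns a concurrent.futures.Future ...
    future = self.executor.submit(self.block_task, strTime)
    # ... which the coroutine yields; it is suspended here and resumed
    # by the IOLoop once the thread pool has produced the result
    result = yield future
    self.write(result)

  def block_task(self, strTime):
    time.sleep(5)  # stands in for blocking IO or a slow computation
    return "Finish %s" % strTime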

After running the code and visiting the sleep page from two different browsers, I got the desired effect. There was one episode: testing with two tabs in the same browser does not show it. The second get request is blocked until the first one returns, and the server does not even start processing it, which made me think the multithreading was not taking effect. It took half a day of digging through material to find out that it is the browser itself that holds back the second identical request.
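
One way to reproduce the test without a browser is to fire the two requests from a small script, which sidesteps that browser behavior entirely. This is a sketch that assumes the server above is running locally on port 8888; urllib2 and threading are from the Python 2 standard library:

import threading
import urllib2

def fetch(name):
  # each thread issues its own GET to the sleep page
  body = urllib2.urlopen("http://127.0.0.1:8888/sleep").read()
  print "%s got: %s" % (name, body)

threads = [threading.Thread(target=fetch, args=("client-%d" % i,)) for i in range(2)]
for t in threads:
  t.start()
for t in threads:
  t.join()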

Because tornado has convenient built-in support for the multi-process model, using multiple processes is much simpler: only the startup part of the example above needs to change. The code is as follows:


if __name__ == "__main__":
  tornado.options.parse_command_line()
  app = tornado.web.Application(handlers=[(r"/sleep", HasBlockTaskHandler)], autoreload=False, debug=False)
  http_server = tornado.httpserver.HTTPServer(app)
  http_server.bind(8888)
  print tornado.ioloop.IOLoop.initialized()  # should print False: the IOLoop must not exist before the fork
  http_server.start(5)
  tornado.ioloop.IOLoop.instance().start()

There are two points to note:

1. app = tornado.web.Application(handlers=[(r"/sleep", HasBlockTaskHandler)], autoreload=False, debug=False). When creating the Application object, both autoreload and debug must be set to False. The point is to make sure the IOLoop is not initialized before the child processes are forked; this can be checked with the tornado.ioloop.IOLoop.initialized() function.

2. http_server.start(5). Set the number of processes with the start function before starting the IOLoop; setting it to 0 starts one process per CPU.
The result is that you can see n+1 processes running and sharing the same port: the parent plus the n forked children. (A small handler for checking which child serves each request is sketched below.)
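
As a quick way to verify the fork, one can report the process identity from inside a handler. This is an illustrative sketch, not part of the original service; the route /whoami is made up, os.getpid comes from the standard library, and tornado.process.task_id() returns the child's index after the fork (or None in an unforked process):

import os
import tornado.process
import tornado.web

class WhoAmIHandler(tornado.web.RequestHandler):
  def get(self):
    # each forked child has its own pid; task_id() is 0..n-1 in the children
    self.write("pid=%d task_id=%s" % (os.getpid(), tornado.process.task_id()))

Registering it next to /sleep and refreshing the page a few times shows different pids answering on the same port.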

