Tornado (a Python web server) usage summary


The first thing I want to talk about is security, which really shows how much thought went into the framework. This can be divided into two main points:

1. Prevent cross-site request forgery (CSRF or XSRF)

CSRF simply means that an attacker forges a real user to send a request.

For example, suppose a bank website has a URL like this:
http://bank.example.com/withdraw?amount=1000000&for=Eve
When a logged-in user of the bank visits that URL, Eve is given a million dollars. Of course, users won't casually click such a URL, but an attacker can embed a fake image on another site and set the image's address to this URL:
<img src="http://bank.example.com/withdraw?amount=1000000&for=Eve">
So when the user visits the malicious site, the browser makes a GET request to that URL, and a million dollars is transferred without the user's knowledge.

It is easy to guard against the attack above by not allowing state-changing operations (such as money transfers) through GET requests. But other types of requests are not safe either if an attacker constructs a form like this:

<form action="http://bank.example.com/withdraw" method="post">
    <p>Share this to win an iPad!</p>
    <input type="hidden" name="amount" value="1000000">
    <input type="hidden" name="for" value="Eve">
    <input type="submit" value="Share">
</form>

The unsuspecting user clicks the "Share" button and the money is transferred.

To prevent this, for non-GET requests you need to add a field that an attacker cannot forge, and verify when processing the request that the field has not been tampered with.
Tornado's approach is very simple: a randomly generated _xsrf field is added to the request, and the same field is also set in a cookie. When the request is received, the two values are compared.
Since third-party pages cannot read or modify the website's cookies, this ensures that _xsrf cannot be forged by a third-party site (barring HTTP sniffing).
Of course, the user is free to read and modify their own cookies, but that is no longer in the domain of CSRF: a user forging their own request does so at their own expense.

To use this feature, add the xsrf_cookies=True setting when creating the tornado.web.Application object; this generates a cookie named _xsrf for the user.
In addition, you need to include xsrf_form_html() in the form served by the GET request; if you are not using Tornado templates, you can call self.xsrf_form_html() inside the tornado.web.RequestHandler to generate the hidden field.
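A minimal sketch of wiring this up (the handler and URL names here are placeholders of my own):

import tornado.web

class TransferHandler(tornado.web.RequestHandler):
    def get(self):
        # xsrf_form_html() returns the hidden <input> carrying the _xsrf token
        self.write('<form action="/transfer" method="post">'
                   + self.xsrf_form_html()
                   + '<input type="submit" value="Transfer"></form>')

    def post(self):
        # with xsrf_cookies=True, Tornado validates _xsrf before post() runs
        self.write('transfer accepted')

application = tornado.web.Application(
    [('/transfer', TransferHandler)],
    xsrf_cookies=True,  # enables XSRF protection for non-GET requests
)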

For AJAX requests there is basically no need to worry about cross-site forgery, so versions of Tornado prior to 1.1.1 did not validate requests carrying X-Requested-With: XMLHttpRequest.
Later, Google engineers pointed out that malicious browser plug-ins could forge cross-domain AJAX requests, so these should be verified as well. I can't say whether that's right or wrong, since browser plug-ins can have very broad permissions anyway, whether forging cookies or submitting forms directly.
The solution, in any case, is simply to read the _xsrf field from the cookie and add it to the AJAX request, either as an _xsrf parameter or in the X-Xsrftoken or X-Csrftoken request header. If that is too much trouble, you can use jQuery's $.ajaxSetup() to handle it:


$.ajaxSetup({
    beforeSend: function(jqXHR, settings) {
        var type = settings.type;
        if (type != 'GET' && type != 'HEAD' && type != 'OPTIONS') {
            // extract the _xsrf cookie value and send it as a header
            var pattern = /(.+; *)?_xsrf *= *([^;" ]+)/;
            var xsrf = pattern.exec(document.cookie);
            if (xsrf) {
                jqXHR.setRequestHeader('X-Xsrftoken', xsrf[2]);
            }
        }
    }
});

By the way, a word about cross-site scripting (XSS). In contrast to CSRF, XSS exploits a vulnerability of the site itself: the attacker injects script code into the site and has it executed by users browsing the site.
However, XSS can be prevented as long as users are not allowed to enter arbitrary HTML (for example, escape < and >), the attributes of HTML elements are validated (for example, escape quotes inside attributes, check src and event-handler attributes), and expressions in CSS (including the style attribute) are checked.
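As a small illustration of the escaping part only (this does not cover the attribute and CSS checks mentioned above), Tornado ships an escape helper, and in recent versions its templates escape {{ ... }} output by default:

import tornado.escape

user_input = '<script>alert("xss")</script>'
safe_text = tornado.escape.xhtml_escape(user_input)
# -> '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'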

2. Prevent cookie forgery.

Both CSRF and XSS above have the attacker operating in the user's name without the user's knowledge. Cookie forgery, by contrast, is the attacker actively impersonating another user to perform operations.
For example, suppose a website's login check just reads the username from a cookie and, if it matches, considers the user logged in. An attacker could then impersonate the administrator simply by setting something like username=admin in a cookie.

To prevent cookies from being stolen or forged, two parameters for setting cookies are worth mentioning first: secure and httponly. These two are not in the parameter list of tornado.web.RequestHandler.set_cookie(), but are passed as keyword arguments; they are defined in Cookie.Morsel._reserved.
The former means the cookie can only be transmitted over a secure connection (i.e. HTTPS), so sniffers cannot intercept it. The latter means the cookie can only be accessed via the HTTP protocol (i.e. it cannot be read from document.cookie through JavaScript), which makes it impossible for an attacker to steal or forge the cookie simply with a JavaScript script.
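A minimal sketch of passing these flags through set_cookie() (the cookie name and value are just examples):

import tornado.web

class ExampleHandler(tornado.web.RequestHandler):
    def get(self):
        # secure/httponly are forwarded as keyword arguments to the underlying Morsel
        self.set_cookie('session_id', 'abc123', secure=True, httponly=True)
        self.write('cookie set')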

However, against a determined attacker these two parameters alone do not prevent a cookie from being forged. For that, you need to sign the cookie so that the server can tell once it has been modified.
Tornado provides set_secure_cookie() to sign cookies. When signing, you need to provide a secret key (the cookie_secret parameter when creating the tornado.web.Application object), which can be generated with code like this:
base64.b64encode(uuid.uuid4().bytes + uuid.uuid4().bytes)
This value can be generated randomly, but if multiple Tornado processes are running at the same time, or the server is restarted now and then, it is better to share a constant, and be careful not to leak it.
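Putting it together, a minimal sketch (the cookie name user_id is just an example):

import base64
import uuid

import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # returns None if the cookie is missing or its signature is invalid
        user_id = self.get_secure_cookie('user_id')
        if user_id is None:
            self.set_secure_cookie('user_id', '42')
        self.write('ok')

application = tornado.web.Application(
    [('/', MainHandler)],
    # in production, share one fixed secret across processes and restarts
    cookie_secret=base64.b64encode(uuid.uuid4().bytes + uuid.uuid4().bytes),
)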

The signature uses the HMAC algorithm, with SHA1 as the hash. Simply put, the hash of the cookie name, value, and timestamp is used as the signature, and "value|timestamp|signature" becomes the new cookie value. The server only has to recompute the signature with the secret key to determine whether the value has been tampered with.
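Conceptually it looks roughly like this (a simplified sketch of the idea, not Tornado's exact implementation):

import hashlib
import hmac
import time

def sign_cookie(secret, name, value):
    # HMAC-SHA1 over cookie name, value and timestamp (all bytes)
    timestamp = str(int(time.time())).encode()
    h = hmac.new(secret, digestmod=hashlib.sha1)
    for part in (name, value, timestamp):
        h.update(part)
    signature = h.hexdigest().encode()
    # the stored cookie value becomes "value|timestamp|signature"
    return b'|'.join([value, timestamp, signature])

# e.g. sign_cookie(b'my-secret', b'user_id', b'42')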
It is also worth mentioning a function I found while reading the source code:
def _time_independent_equals(a, b):
    if len(a) != len(b):
        return False
    result = 0
    if type(a[0]) is int:  # python3 byte strings
        for x, y in zip(a, b):
            result |= x ^ y
    else:  # python2
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0
After reading it for a while, I couldn't see any advantage over a normal string comparison, until I found the answer on StackOverflow: to prevent an attacker from measuring the comparison time to work out how many leading characters are correct, this function keeps the comparison time constant, ruling out that timing attack. (That answer filled me with admiration; real security experts are clearly not as shallow as I am...)

3. Next, subclassing tornado.web.RequestHandler.

In terms of execution flow, tornado.web.Application finds a matching RequestHandler class based on the URL and instantiates it. Its __init__() method calls the initialize() method, so just override the latter; there is no need to call the parent class's initialize().
It then looks up the handler method matching the HTTP method (get(), post(), etc.) and runs prepare() before executing it. None of these methods call the parent class automatically, so call them yourself when you need to.
Finally, the handler's finish() method is called; it is best not to override this one. It in turn calls the on_finish() method, which can be overridden to do cleanup (such as closing a database connection), but at that point you can no longer send data to the browser (the HTTP response has already been sent, and the connection may already be closed).
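A small sketch of these hooks in the order they run (the db attribute here is just an example of my own):

class ArticleHandler(tornado.web.RequestHandler):
    def initialize(self, db=None):
        # called from __init__() with the extra arguments given in the URL spec;
        # no super() call needed
        self.db = db

    def prepare(self):
        # runs before get()/post()/... for every request
        self.set_header('X-Frame-Options', 'DENY')

    def get(self, article_id):
        self.write('article %s' % article_id)

    def on_finish(self):
        # cleanup only: the response has already been sent
        if self.db is not None:
            self.db.close()

The db keyword would be supplied through the third element of the URL spec, e.g. (r'/article/(\d+)', ArticleHandler, dict(db=db)), where db is whatever connection object you use.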

By the way, a word on handling error pages.
Simply put, while RequestHandler's _execute() method runs (it internally executes prepare(), get(), finish() and so on in turn), any uncaught exception ends up in its write_error() method, so just override that method:

class RequestHandler(tornado.web.RequestHandler):
    def write_error(self, status_code, **kwargs):
        if status_code == 404:
            self.render('404.html')
        elif status_code == 500:
            self.render('500.html')
        else:
            super(RequestHandler, self).write_error(status_code, **kwargs)

For historical reasons you can also override the get_error_html() method, but it is not recommended.
In addition, a request may fail before it ever reaches the _execute() method.
For example, if the initialize() method throws an uncaught exception, it is caught by IOStream, which then closes the connection without sending any error page to the user.
If no handler is found to match the request, tornado.web.ErrorHandler is used to handle the 404 error. In this case, you can replace that class to implement a custom error page:
class PageNotFoundHandler(RequestHandler):
    def get(self):
        raise tornado.web.HTTPError(404)
tornado.web.ErrorHandler = PageNotFoundHandler

Another approach is to add a handler that matches any URL at the end of the Application's handlers list:
application = tornado.web.Application([
    # ...
    ('.*', PageNotFoundHandler)
])


4. Handling login.

Tornado provides the @tornado.web.authenticated decorator, which you simply apply to the handler's get() and similar methods.
It relies on three pieces of code:
First, the handler's get_current_user() method needs to be defined, for example:

def get_current_user(self):
    return self.get_secure_cookie('user_id', 0)

When its return value is falsy, the user is redirected to the login page.
Second, set the login_url parameter when creating the Application:
application = tornado.web.Application(
    [
        # ...
    ],
    login_url = '/login'
)

Third, the handler's get_login_url() method.
If the global login_url parameter does not fit (for example, regular users and administrators need different login pages), get_login_url() can be overridden:
class AdminHandler(RequestHandler):
    def get_login_url(self):
        return '/admin/login'

By the way, the redirect to the login page carries a next parameter that points to the URL visited before logging in. For a better user experience, redirect to that URL after a successful login:
class LoginHandler(RequestHandler):
    def get(self):
        if self.get_current_user():
            self.redirect('/')
            return
        self.render('login.html')
    def post(self):
        if self.get_current_user():
            raise tornado.web.HTTPError(403)
        # check username and password
        if success:
            self.redirect(self.get_argument('next', '/'))
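Note that the post() above never actually sets the login cookie; a fuller sketch of a successful login might look like this (check_password() is a hypothetical validation helper of my own):

class LoginHandler(RequestHandler):
    def post(self):
        if self.get_current_user():
            raise tornado.web.HTTPError(403)
        username = self.get_argument('username')
        password = self.get_argument('password')
        if check_password(username, password):  # hypothetical helper
            # set the signed cookie that get_current_user() reads
            self.set_secure_cookie('user_id', username)
            self.redirect(self.get_argument('next', '/'))
        else:
            self.render('login.html', error='wrong username or password')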

Also, I use AJAX in a lot of places, and the front end is too lazy to handle 403 errors, so I had to modify authenticated():
import functools
import json
import urllib
import urlparse

import tornado.web

def authenticated(method):
    """Decorate methods with this to require that the user be logged in."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self.current_user:
            if self.request.headers.get('X-Requested-With') == 'XMLHttpRequest':  # jQuery sends this header with AJAX requests
                self.set_header('Content-Type', 'application/json; charset=UTF-8')
                self.write(json.dumps({'success': False, 'msg': u'Your session has expired, please log in again!'}))
                return
            if self.request.method in ("GET", "HEAD"):
                url = self.get_login_url()
                if "?" not in url:
                    if urlparse.urlsplit(url).scheme:
                        # if login url is absolute, make next absolute too
                        next_url = self.request.full_url()
                    else:
                        next_url = self.request.uri
                    url += "?" + urllib.urlencode(dict(next=next_url))
                self.redirect(url)
                return
            raise tornado.web.HTTPError(403)
        return method(self, *args, **kwargs)
    return wrapper
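Usage is the same as the stock decorator; a hypothetical handler might look like this:

class ProfileHandler(tornado.web.RequestHandler):
    @authenticated
    def get(self):
        # AJAX callers now get a JSON "session expired" message instead of a bare 403
        self.write('Hello, %s' % self.current_user)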

5. Getting the user's IP address.

In short, you can get it in a handler method with self.request.remote_ip.
However, behind a reverse proxy you will get the proxy's IP instead, so you need to pass xheaders=True when creating the HTTPServer so that it honors the X-Real-Ip / X-Forwarded-For headers:

if __name__ == '__main__':
    import tornado.ioloop
    from tornado.httpserver import HTTPServer
    from tornado.netutil import bind_sockets

    sockets = bind_sockets(80)
    server = HTTPServer(application, xheaders=True)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.instance().start()

In addition, I only need to handle IPv4, but local testing yields the IPv6 address ::1, so I restrict the socket family:
if settings.IPV4_ONLY:
    import socket
    sockets = bind_sockets(80, family=socket.AF_INET)
else:
    sockets = bind_sockets(80)

6. Finally, how to improve performance in a production environment.

Before HTTPServer's add_sockets() is called, Tornado can fork multiple child processes to take advantage of multiple CPUs when handling concurrent requests.

In short, the code is as follows:

if __name__ == '__main__':
    import tornado.ioloop
    from tornado.httpserver import HTTPServer
    from tornado.netutil import bind_sockets

    if settings.IPV4_ONLY:
        import socket
        sockets = bind_sockets(80, family=socket.AF_INET)
    else:
        sockets = bind_sockets(80)

    if not settings.DEBUG_MODE:
        import tornado.process
        tornado.process.fork_processes(0)  # 0 means one child process per CPU

    server = HTTPServer(application, xheaders=True)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.instance().start()

Note that autoreload cannot be enabled in this mode (the debug parameter must not be True when creating the Application).

