A detailed look at how the Nginx core module receives requests and request bodies

  • 2020-05-10 23:28:28
  • OfStack

This article focuses on the request receiving process in nginx, including the parsing of the request header and the reading process of the request body.


First, here is the basic HTTP request format defined in RFC 2616:


Request = Request-Line  
   *(( general-header      
      | request-header      
      | entity-header ) CRLF)  
     CRLF 
     [ message-body ]

 

The first line is the request line, which gives the request method, the resource to access, and the HTTP version in use:
Request-Line     = Method SP Request-URI SP HTTP-Version CRLF

The request method (Method) is defined as follows, among which the most commonly used are GET and POST methods:


Method = "OPTIONS"  
| "GET"  
| "HEAD"  
| "POST"  
| "PUT"  
| "DELETE"  
| "TRACE"  
| "CONNECT"  
| extension-method  
extension-method = token

The resource to be accessed is identified by a Uniform Resource Identifier (URI). One of its more common forms (RFC 2396) is as follows:


<scheme>://<authority><path>?<query> 

In general, the format of the request URI varies with the request method, and usually only the path and query parts need to be written.

The http version (version) is defined as follows, and the ones in use today are generally versions 1.0 and 1.1:


HTTP/<major>.<minor> 
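The version format above can be checked with a few lines of C. This is only a minimal sketch and not nginx's parser: the names read_number and parse_http_version are hypothetical, and the cap of 999 on each number is an arbitrary assumption made to keep the arithmetic safe.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: read one or more decimal digits at *p.
 * Returns 0 and advances *p on success, -1 otherwise. */
static int read_number(const char **p, int *out)
{
    const char *s = *p;
    int value = 0;

    if (!isdigit((unsigned char) *s)) {
        return -1;                      /* at least one digit is required */
    }

    while (isdigit((unsigned char) *s)) {
        value = value * 10 + (*s - '0');
        if (value > 999) {
            return -1;                  /* arbitrary cap, avoids overflow */
        }
        s++;
    }

    *p = s;
    *out = value;
    return 0;
}

/* Hypothetical helper: parse "HTTP/<major>.<minor>" with nothing after it.
 * Returns 0 on success, -1 if the text does not match the format. */
int parse_http_version(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);

    if (strncmp(s, "HTTP/", 5) != 0) {
        return -1;
    }
    s += 5;

    if (read_number(&s, major) != 0 || *s != '.') {
        return -1;
    }
    s++;                                /* skip the dot */

    if (read_number(&s, minor) != 0) {
        return -1;
    }

    return *s == '\0' ? 0 : -1;
}
```

Calling parse_http_version("HTTP/1.1", &major, &minor) fills in both numbers, while a lowercase prefix or trailing characters are rejected.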


The lines that follow the request line are the request headers. RFC 2616 defines three types of headers: general-header, request-header, and entity-header. The RFC defines some common headers for each type, and entity-header may also contain custom headers.


Now for request header parsing in nginx. Request handling in nginx involves two important data structures, ngx_connection_t and ngx_http_request_t, which represent a connection and a request respectively. Both were introduced in detail in the first part of the book, and readers who do not remember them may want to go back and review it. From start to finish, request processing amounts to allocating, initializing, using, reusing, and destroying these two structures.


During initialization, in the init process phase, the ngx_event_process_init function allocates one connection structure (ngx_connection_t) for each listening socket and sets the handler of that connection's read event member (read) to ngx_event_accept. If the accept mutex is not used, the read event is registered with the nginx event model (poll, epoll, and so on) inside this function. Otherwise registration is deferred until after the init process phase: in the worker's event loop, whichever process acquires the accept lock registers the read event.


static ngx_int_t 
ngx_event_process_init(ngx_cycle_t *cycle) 
{ 
  ... 
 
 
  /*  Initializes the red-black tree used to manage all timers  */ 
  if (ngx_event_timer_init(cycle->log) == NGX_ERROR) { 
    return NGX_ERROR; 
  } 
  /*  Initialize the event model  */ 
  for (m = 0; ngx_modules[m]; m++) { 
    if (ngx_modules[m]->type != NGX_EVENT_MODULE) { 
      continue; 
    } 
 
 
    if (ngx_modules[m]->ctx_index != ecf->use) { 
      continue; 
    } 
 
 
    module = ngx_modules[m]->ctx; 
 
 
    if (module->actions.init(cycle, ngx_timer_resolution) != NGX_OK) { 
      /* fatal */ 
      exit(2); 
    } 
 
 
    break; 
  } 
 
 
  ... 
 
 
  /* for each listening socket */ 
  /*  Allocate one connection structure for each listening socket  */ 
  ls = cycle->listening.elts; 
  for (i = 0; i < cycle->listening.nelts; i++) { 
 
 
    c = ngx_get_connection(ls[i].fd, cycle->log); 
 
 
    if (c == NULL) { 
      return NGX_ERROR; 
    } 
 
 
    c->log = &ls[i].log; 
 
 
    c->listening = &ls[i]; 
    ls[i].connection = c; 
 
 
    rev = c->read; 
 
 
    rev->log = c->log; 
    /*  Identifies this read event as a new request connection event  */ 
    rev->accept = 1; 
 
 
    ... 
 
 
#if (NGX_WIN32) 
 
 
    /* The Windows branch is not analyzed here; the principle is similar  */ 
 
 
#else 
    /*  Sets the handler of the read event structure to ngx_event_accept */ 
    rev->handler = ngx_event_accept; 
    /*  If the accept mutex is in use, the listening socket is added to the event model later, once the lock has been acquired  */ 
    if (ngx_use_accept_mutex) { 
      continue; 
    } 
    /*  Otherwise, mount the listener handle directly onto the event handling model  */ 
    if (ngx_event_flags & NGX_USE_RTSIG_EVENT) { 
      if (ngx_add_conn(c) == NGX_ERROR) { 
        return NGX_ERROR; 
      } 
 
 
    } else { 
      if (ngx_add_event(rev, NGX_READ_EVENT, 0) == NGX_ERROR) { 
        return NGX_ERROR; 
      } 
    } 
 
 
#endif 
 
 
  } 
 
 
  return NGX_OK; 
}

Once a worker process has registered the listening event with the event model, nginx is ready to accept and process client requests. If a user now enters a domain name in the browser's address bar and DNS resolves it to a server that nginx is listening on, the event model receives the read event and dispatches it to the registered handler, ngx_event_accept.


In ngx_event_accept, nginx calls accept to take a connection and its socket from the queue of established connections, allocates a connection structure (ngx_connection_t), and saves the new socket in it. Some basic connection initialization is also done here:
First, allocate a memory pool for the connection. Its initial size is 256 bytes by default and can be set with the connection_pool_size directive;
Allocate the log structure and save it for later use by the logging system;
Initialize the connection's I/O send and receive functions; which functions are used depends on the event model and the operating system;
Allocate a socket address (sockaddr), copy the peer address returned by accept into it, and save it in the sockaddr field;
Save the local socket address in the local_sockaddr field. This value comes from the listening structure ngx_listening_t, which holds only the listening address set in the configuration file. Because the configured address may be the wildcard *, meaning listen on all addresses, the value saved in the connection may later be changed to the address that actually received the connection;
Set the connection's write event to ready, that is, set ready to 1; nginx assumes that a connection is writable the first time;
If the listening socket has the TCP_DEFER_ACCEPT option set, data has already arrived on the connection, so the read event is set to ready as well;
Format the address saved in the sockaddr field into a readable string and save it in the addr_text field;
Finally, call ngx_http_init_connection to initialize the rest of the connection structure.
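A few of the steps above (copying the peer address, marking the write event ready, marking the read event ready under deferred accept, and formatting the address text) can be sketched in plain C. The type toy_conn_t and the function toy_conn_init are hypothetical names used for illustration, not nginx's own; the sketch handles IPv4 only and leaves out the memory pool and logging.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, heavily reduced stand-in for a connection structure. */
typedef struct {
    struct sockaddr_storage sockaddr;    /* copy of the address from accept() */
    socklen_t               socklen;
    char                    addr_text[INET_ADDRSTRLEN];
    unsigned                read_ready:1;
    unsigned                write_ready:1;
} toy_conn_t;

/* Initialize a connection from an accepted IPv4 peer address.
 * Returns 0 on success, -1 on an unsupported or malformed address. */
int toy_conn_init(toy_conn_t *c, const struct sockaddr *sa,
                  socklen_t socklen, int deferred_accept)
{
    const struct sockaddr_in *sin;

    assert(c != NULL && sa != NULL);

    if (sa->sa_family != AF_INET || socklen < sizeof(struct sockaddr_in)) {
        return -1;
    }

    memset(c, 0, sizeof(*c));

    /* keep a private copy of the peer address */
    memcpy(&c->sockaddr, sa, sizeof(struct sockaddr_in));
    c->socklen = sizeof(struct sockaddr_in);

    /* a fresh connection is assumed writable */
    c->write_ready = 1;

    /* with deferred accept, data is already waiting to be read */
    c->read_ready = deferred_accept ? 1 : 0;

    /* readable form of the address, like the addr_text field */
    sin = (const struct sockaddr_in *) &c->sockaddr;
    if (inet_ntop(AF_INET, &sin->sin_addr, c->addr_text,
                  sizeof(c->addr_text)) == NULL)
    {
        return -1;
    }

    return 0;
}
```

The deferred_accept argument stands in for the check of the listening socket's TCP_DEFER_ACCEPT setting.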


The most important job of ngx_http_init_connection is to set up the handlers for the read and write events. The write event handler of the connection is set to ngx_http_empty_handler, which does nothing. By default nginx treats a new connection as writable and does not register a write event; when there is data to send, it writes to the connection directly. Only when a write cannot be completed in one go does nginx register the write event with the event model and set the real write event handler, as later chapters describe in detail. The read event handler is set to ngx_http_init_request. If the connection already has data (deferred accept is set), ngx_http_init_request is called right away to handle the request. Otherwise a timer is set and the read event is registered with the event model to wait for data to arrive or for the timeout. Whether data is already there, has to be waited for, or the wait times out, execution ends up in the read event handler, ngx_http_init_request.
 

The main job of ngx_http_init_request is to initialize the request. As an event handler, it takes a single parameter of type ngx_event_t *; the ngx_event_t structure represents an event in nginx. nginx normally stores a reference to the connection structure in the data field of the event structure, and a reference to the request structure in the data field of the connection structure, so an event handler can easily obtain the corresponding connection and request. Inside the function, the first step is to check whether the event is a timeout; if so, the connection is closed and the function returns. ngx_http_init_request then allocates an ngx_http_request_t structure from the connection's memory pool. This structure holds all the information about the request. Once allocated, a reference to it is stored in the request field of the connection's hc member, so that the request structure can be reused for keep-alive connections and pipelined requests. In this function nginx also uses the port and address on which the request arrived to find a default virtual server configuration. (The default_server parameter of the listen directive marks a default virtual server; if several virtual servers listen on the same port and address, the first one defined is the default.) This lookup is needed because an nginx configuration file can define multiple virtual servers listening on different ports and addresses (each server block corresponds to one virtual server). Virtual servers that listen on the same port and address are told apart by domain name (the server_name directive sets the domain names of a virtual server). Each virtual server can have its own configuration, and that configuration determines how nginx handles a request once it has been received.
Once found, the configuration is saved in the ngx_http_request_t structure of the request. Note that the default configuration found by port and address is only used temporarily; nginx eventually finds the real virtual server configuration by domain name. The remaining initialization includes:

Set the connection's read event handler to ngx_http_process_request_line, which parses the request line, and set the request's read_event_handler to ngx_http_block_reading, which does almost nothing (when the event model is level-triggered, the only thing it does is remove the event from the event model's watch list so that it does not keep firing). Why read_event_handler is set to this function is explained later;
Allocate a buffer for the request to hold its headers, with its address saved in the header_in field. The default size is 1024 bytes and can be changed with the client_header_buffer_size directive. Note that the buffer for the request headers is allocated from the connection's memory pool and its address is also stored in the connection's buffer field, so that the next request on the same connection can reuse it. If the client sends request headers larger than 1024 bytes, nginx allocates larger buffers; by default a large header buffer is at most 8 KB and there can be at most 4 of them. Both values can be set with the large_client_header_buffers directive. As discussed later, the request line and any single header line cannot exceed the size of one large buffer;
nginx also allocates a memory pool for the request, which all subsequent memory allocations related to the request generally use. The default size is 4096 bytes and can be changed with the request_pool_size directive;
Allocate a list of response headers for the request, with an initial size of 20;
Create the array of context (ctx) pointers for all modules and the variable data;
Set the main field of the request to the request itself, which marks it as a main request. nginx also has the concept of a subrequest, which later chapters introduce in detail;
Set the count field of the request to 1; count is the reference count of the request;
Save the current time in the start_sec and start_msec fields. This is the starting moment of the request and is used to compute the request's processing time (request time). nginx's starting point differs slightly from Apache's: nginx starts the clock when it receives the first packet from the client, whereas Apache starts after receiving the client's entire request line;
Initialize the other fields of the request. For example, uri_changes is set to 11, meaning the request URI can be rewritten at most 10 times, and subrequests is set to 201, meaning a request can issue at most 200 subrequests.
After all this initialization, ngx_http_init_request calls the read event handler to actually parse the data sent by the client, which means control passes to ngx_http_process_request_line.
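The chain of data pointers (event to connection, connection to request) and a few of the initial field values listed above can be illustrated with a small model. Everything prefixed with toy_ is hypothetical and greatly simplified; in particular, calloc stands in for allocation from the connection's memory pool, and the real structures have many more fields.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_event_s      toy_event_t;
typedef struct toy_connection_s toy_connection_t;
typedef struct toy_request_s    toy_request_t;

struct toy_event_s {
    void  *data;                          /* points at the owning connection */
    void (*handler)(toy_event_t *ev);     /* called when the event fires */
};

struct toy_connection_s {
    void        *data;                    /* points at the current request */
    toy_event_t  read;                    /* the connection's read event */
};

struct toy_request_s {
    toy_connection_t *connection;
    toy_request_t    *main;               /* equals the request itself for a main request */
    unsigned          count;              /* reference count */
    unsigned          uri_changes;        /* remaining URI rewrites + 1 */
    unsigned          subrequests;        /* remaining subrequests + 1 */
};

/* Placeholder for the next stage; request line parsing would start here. */
void toy_process_request_line(toy_event_t *rev)
{
    (void) rev;
}

/* Event handler: recover the connection from the event, create the
 * request, link the two, then hand over to the next handler. */
void toy_init_request(toy_event_t *rev)
{
    toy_connection_t *c = rev->data;
    toy_request_t    *r;

    assert(c != NULL);

    r = calloc(1, sizeof(*r));            /* stand-in for pool allocation */
    if (r == NULL) {
        return;
    }

    c->data = r;                          /* connection -> request */
    r->connection = c;
    r->main = r;
    r->count = 1;
    r->uri_changes = 11;                  /* at most 10 rewrites */
    r->subrequests = 201;                 /* at most 200 subrequests */

    rev->handler = toy_process_request_line;
    rev->handler(rev);
}
```

Because the handler receives only the event, the two data pointers are what make the connection and the request reachable from it.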


The main job of ngx_http_process_request_line is to parse the request line. Because network I/O is involved, even a short request line may not be read in one go, which is why ngx_http_init_request set ngx_http_process_request_line as the read event handler. It too takes a single ngx_event_t * parameter, and at the start it likewise checks whether the event is a timeout; if so, it closes the request and the connection. Otherwise normal parsing begins, starting with a call to ngx_http_read_request_header to read data.


Because ngx_http_process_request_line may be entered many times, ngx_http_read_request_header first checks whether the buffer pointed to by the request's header_in field already holds data; if it does, the function returns right away. Otherwise it reads from the connection into that buffer, reading as much as the free space allows in one call, and returns the number of bytes read. If the client has not sent any data yet, the function returns NGX_AGAIN, but does two things first: 1. it sets a timer, 60 seconds by default and configurable with client_header_timeout; if no read event arrives before the timer fires, nginx closes the request; 2. it calls ngx_handle_read_event to register the read event with the event model if the connection has not registered it yet. If the client closes the connection early or another error occurs while reading, a 400 error is returned to the client (with no guarantee that the client receives it, since it may already have closed the connection), and the function returns NGX_ERROR.
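The read logic just described (return buffered data first, otherwise read as much as fits, and report "try again" when nothing has arrived) can be sketched with POSIX calls. The names toy_buf_t, toy_set_nonblocking, and toy_read_header and the TOY_ codes are hypothetical. The sketch assumes a non-blocking descriptor and only notes in a comment where a timer and event registration would go.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_ERROR  (-1)
#define TOY_AGAIN  (-2)

typedef struct {
    char *pos;     /* first byte not yet parsed */
    char *last;    /* end of the data received so far */
    char *end;     /* end of the buffer */
} toy_buf_t;

/* Put a descriptor into non-blocking mode. Returns 0 or -1. */
int toy_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns the number of bytes available in the buffer, TOY_AGAIN if the
 * peer has sent nothing yet, or TOY_ERROR on close, failure, or a full
 * buffer (a real server would switch to a larger buffer instead). */
ssize_t toy_read_header(int fd, toy_buf_t *b)
{
    ssize_t n;

    assert(b != NULL && b->pos <= b->last && b->last <= b->end);

    n = b->last - b->pos;
    if (n > 0) {
        return n;                  /* unparsed data is already buffered */
    }

    if (b->last == b->end) {
        return TOY_ERROR;          /* no free space left in this buffer */
    }

    /* read as much as the free space allows in one call */
    n = read(fd, b->last, (size_t) (b->end - b->last));

    if (n > 0) {
        b->last += n;
        return n;
    }

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
        /* a real server would arm a timeout timer and make sure the
         * read event is registered before returning */
        return TOY_AGAIN;
    }

    return TOY_ERROR;              /* n == 0: peer closed; n < 0: read error */
}
```

A caller that gets TOY_AGAIN simply returns to its event loop and calls the function again on the next read event.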


If ngx_http_read_request_header reads data successfully, ngx_http_process_request_line calls ngx_http_parse_request_line to parse it. This function implements a finite state machine that follows the definition of the request line in the HTTP specification. Through the state machine, nginx records where the request method, the request URI, and the HTTP version start in the buffer, along with other information gathered during parsing that later processing needs. The function returns NGX_OK if the request line parses without problems. If the request line violates the protocol, it stops parsing immediately and returns the corresponding error code. If the buffer does not yet hold enough data, it returns NGX_AGAIN. Throughout the state machine that parses HTTP requests, nginx follows two principles: minimize memory copying and minimize backtracking. Memory copying is relatively expensive, and a large number of copies lowers runtime efficiency. Where a copy would otherwise be needed, nginx tries to copy only the start and end addresses of the memory rather than the memory itself, which takes just two assignments and greatly reduces the cost. The consequence is that later operations must not modify that memory, since a change would affect every reference to the range; this calls for careful management, and a real copy must be made when modification is needed.
This idea is best reflected in one nginx data structure, ngx_buf_t, which represents a buffer in nginx. In many cases it is enough to store the start and end addresses of a block of memory in its pos and last members and set its memory flag to 1, which marks a memory range that must not be modified. When a modifiable buffer is needed instead, a block of the required size must be allocated and its start address kept, and the temporary flag of ngx_buf_t is set to 1 to indicate that the memory can be modified.
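To make the two principles concrete, here is a much reduced, resumable request line parser. It is a sketch under stated assumptions and not nginx's ngx_http_parse_request_line: the names toy_parser_t, toy_parser_init, and toy_parse_request_line are hypothetical, methods are limited to uppercase letters, and the line must end in CRLF. It copies no bytes, records only start and end pointers into the caller's buffer, and keeps its state between calls so that no byte is examined twice.

```c
#include <assert.h>
#include <string.h>

#define TOY_OK      0
#define TOY_ERROR  (-1)
#define TOY_AGAIN  (-2)

enum { S_START = 0, S_METHOD, S_URI_START, S_URI, S_VER_START, S_VER, S_LF };

typedef struct {
    const char *pos;                          /* next byte to examine */
    int         state;                        /* survives between calls */
    const char *method_start, *method_end;    /* half-open ranges into */
    const char *uri_start, *uri_end;          /* the caller's buffer   */
    const char *version_start, *version_end;
} toy_parser_t;

void toy_parser_init(toy_parser_t *p, const char *start)
{
    memset(p, 0, sizeof(*p));
    p->pos = start;
}

/* Parse bytes in [p->pos, last). Returns TOY_OK when the full line has
 * been seen, TOY_AGAIN when more data is needed, TOY_ERROR otherwise. */
int toy_parse_request_line(toy_parser_t *p, const char *last)
{
    const char *s;

    assert(p != NULL && p->pos != NULL && p->pos <= last);

    for (s = p->pos; s < last; s++) {
        char ch = *s;
        int  blank = (ch == ' ' || ch == '\r' || ch == '\n');

        switch (p->state) {
        case S_START:
            if (ch < 'A' || ch > 'Z') return TOY_ERROR;
            p->method_start = s;
            p->state = S_METHOD;
            break;
        case S_METHOD:
            if (ch == ' ') { p->method_end = s; p->state = S_URI_START; break; }
            if (ch < 'A' || ch > 'Z') return TOY_ERROR;
            break;
        case S_URI_START:
            if (blank) return TOY_ERROR;
            p->uri_start = s;
            p->state = S_URI;
            break;
        case S_URI:
            if (ch == ' ') { p->uri_end = s; p->state = S_VER_START; break; }
            if (blank) return TOY_ERROR;
            break;
        case S_VER_START:
            if (blank) return TOY_ERROR;
            p->version_start = s;
            p->state = S_VER;
            break;
        case S_VER:
            if (ch == '\r') { p->version_end = s; p->state = S_LF; break; }
            if (blank) return TOY_ERROR;
            break;
        case S_LF:
            if (ch != '\n') return TOY_ERROR;
            p->pos = s + 1;                   /* first byte after the line */
            return TOY_OK;
        default:
            return TOY_ERROR;
        }
    }

    p->pos = s;                               /* resume here on the next call */
    return TOY_AGAIN;
}
```

The recorded pointers stay valid only while the buffer itself is neither moved nor modified, which is the trade-off the text describes.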


Back in ngx_http_process_request_line: if ngx_http_parse_request_line returns an error, nginx returns a 400 error to the client.
If it returns NGX_AGAIN, nginx has to determine whether the cause is insufficient buffer space or insufficient data read so far. If the buffer is too small, nginx calls ngx_http_alloc_large_header_buffer to allocate a larger buffer; if even the large buffer cannot hold the whole request line, nginx returns a 414 error to the client. Otherwise, after allocating the larger buffer and copying the existing data into it, nginx keeps calling ngx_http_read_request_header to read data and feed the request line state machine until the request line has been parsed.
If it returns NGX_OK, the request line has been parsed correctly. nginx first records the start address and length of the request line, then saves the path and argument parts of the request URI in the uri field of the request structure, the start and length of the request method in the method_name field, and the start and length of the HTTP version in the http_protocol field. The arguments and the extension of the requested resource are also parsed out of the URI and stored in the args and exten fields, respectively.

Discard request body

A module that wants to discard the request body sent by the client can call ngx_http_discard_request_body(), an interface provided by the nginx core. There can be many reasons for discarding the body: the module's business logic may not need it, or the client may have sent an overly large body. In addition, to stay compatible with HTTP/1.1 pipelined requests, modules are responsible for actively discarding request bodies they do not need. In short, to maintain good client compatibility, nginx must actively discard unwanted request bodies. Let's analyze ngx_http_discard_request_body():


ngx_int_t 
ngx_http_discard_request_body(ngx_http_request_t *r) 
{ 
  ssize_t    size; 
  ngx_event_t *rev; 
 
  if (r != r->main || r->discard_body) { 
    return NGX_OK; 
  } 
 
  if (ngx_http_test_expect(r) != NGX_OK) { 
    return NGX_HTTP_INTERNAL_SERVER_ERROR; 
  } 
 
  rev = r->connection->read; 
 
  ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0, "http set discard body"); 
 
  if (rev->timer_set) { 
    ngx_del_timer(rev); 
  } 
 
  if (r->headers_in.content_length_n <= 0 || r->request_body) { 
    return NGX_OK; 
  } 
 
  size = r->header_in->last - r->header_in->pos; 
 
  if (size) { 
    if (r->headers_in.content_length_n > size) { 
      r->header_in->pos += size; 
      r->headers_in.content_length_n -= size; 
 
    } else { 
      r->header_in->pos += (size_t) r->headers_in.content_length_n; 
      r->headers_in.content_length_n = 0; 
      return NGX_OK; 
    } 
  } 
 
  r->read_event_handler = ngx_http_discarded_request_body_handler; 
 
  if (ngx_handle_read_event(rev, 0) != NGX_OK) { 
    return NGX_HTTP_INTERNAL_SERVER_ERROR; 
  } 
 
  if (ngx_http_read_discarded_request_body(r) == NGX_OK) { 
    r->lingering_close = 0; 
 
  } else { 
    r->count++; 
    r->discard_body = 1; 
  } 
 
  return NGX_OK; 
} 

The function is short, so it is listed in full. It begins by filtering out the cases that need no handling: subrequests need not be processed, and neither does a request for which this function has already been called. It then calls ngx_http_test_expect() to handle the HTTP/1.1 Expect mechanism. Under that mechanism, if the client sends an Expect header and the server does not want to receive the request body, the server must return a 417 (Expectation Failed) error. nginx does not do this; it simply lets the client send the request body and then discards it. Next, the function removes the timer on the read event: the request body itself is not needed, so it does not matter how quickly or slowly the client sends it. As discussed later, if nginx finishes processing the request before the client has finished sending the unwanted body, nginx puts a timer on the read event again.
The function then checks the Content-Length header of the request: in this code, a client that wants to send a request body is expected to send a Content-Length header. It also checks whether the request body has already been read elsewhere. If there really is a body to discard, the function examines the data preread into the request header buffer and throws it away; if the entire request body has already been preread, the function returns immediately.
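The preread bookkeeping is simple pointer arithmetic, shown here in isolation. The names toy_header_buf_t and toy_discard_preread are hypothetical; the logic mirrors the two branches of the listing above, with content_length standing in for headers_in.content_length_n.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *pos;     /* first unconsumed byte in the header buffer */
    const char *last;    /* end of the data received so far */
} toy_header_buf_t;

/* Throw away body bytes that were read together with the headers.
 * Returns 1 if nothing is left to read from the socket, 0 otherwise.
 * *content_length is reduced by the number of bytes consumed. */
int toy_discard_preread(toy_header_buf_t *b, long long *content_length)
{
    long long size;

    assert(b != NULL && content_length != NULL && b->pos <= b->last);

    if (*content_length <= 0) {
        return 1;                         /* no body was announced */
    }

    size = b->last - b->pos;              /* bytes already in the buffer */

    if (*content_length > size) {
        /* the buffer holds only part of the body: consume all of it */
        b->pos += size;
        *content_length -= size;
        return 0;
    }

    /* the whole body was preread; bytes after it stay in the buffer
     * because they belong to the next pipelined request */
    b->pos += *content_length;
    *content_length = 0;
    return 1;
}
```

With a 10-byte buffer and an announced length of 25, the call consumes all 10 bytes and leaves 15 to be read from the connection; with an announced length of 4, it consumes 4 bytes and leaves the remaining 6 untouched.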

Next, if part of the request body still remains, the function calls ngx_handle_read_event() to register the read event with the event mechanism and sets the read event handler to ngx_http_discarded_request_body_handler. With that in place, it calls ngx_http_read_discarded_request_body() to read the request body from the client and discard it. If the client has not sent the whole body yet, the function returns, and the remaining data is handled by ngx_http_discarded_request_body_handler() when the next read event arrives. In that case the request's discard_body flag is set to 1 to mark the situation, and the request's reference count (count) is incremented. The reason is that the client may still be sending the body after nginx has finished processing the request; the extra reference prevents the nginx core from releasing the request's resources as soon as processing ends.

The ngx_http_read_discarded_request_body() function is very simple: it reads data from the connection in a loop and throws it away until everything in the receive buffer has been read. Once the whole request body has been read, it sets the read event handler to ngx_http_block_reading.
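A read-and-discard loop of this kind might look as follows. This is a sketch and not the nginx function: toy_read_discarded_body and the TOY_ codes are hypothetical, the 4096-byte scratch buffer is an arbitrary choice, the descriptor is assumed to be non-blocking, and reporting an early close as an error is a simplification of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_OK      0
#define TOY_ERROR  (-1)
#define TOY_AGAIN  (-2)

/* Read up to *remaining body bytes from fd and throw them away.
 * Returns TOY_OK when the whole body has been consumed, TOY_AGAIN when
 * the socket has no more data for now, TOY_ERROR on close or failure. */
int toy_read_discarded_body(int fd, long long *remaining)
{
    char    scratch[4096];              /* contents are never looked at */
    size_t  want;
    ssize_t n;

    assert(remaining != NULL && *remaining >= 0);

    for (;;) {
        if (*remaining == 0) {
            return TOY_OK;              /* nothing left to discard */
        }

        /* never read past the end of this request's body */
        want = *remaining < (long long) sizeof(scratch)
                   ? (size_t) *remaining : sizeof(scratch);

        n = read(fd, scratch, want);

        if (n > 0) {
            *remaining -= n;
            continue;
        }

        if (n < 0 && errno == EINTR) {
            continue;                   /* interrupted, just retry */
        }

        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return TOY_AGAIN;           /* wait for the next read event */
        }

        return TOY_ERROR;               /* peer closed early or read failed */
    }
}
```

Capping each read at the remaining length matters for pipelining: bytes beyond the body belong to the next request and must stay in the socket.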
Now for the handler, ngx_http_discarded_request_body_handler, which is called each time a read event fires. Here is its source code:


void 
ngx_http_discarded_request_body_handler(ngx_http_request_t *r) 
{ 
  ... 
 
  c = r->connection; 
  rev = c->read; 
 
  if (rev->timedout) { 
    c->timedout = 1; 
    c->error = 1; 
    ngx_http_finalize_request(r, NGX_ERROR); 
    return; 
  } 
 
  if (r->lingering_time) { 
    timer = (ngx_msec_t) (r->lingering_time - ngx_time()); 
 
    if (timer <= 0) { 
      r->discard_body = 0; 
      r->lingering_close = 0; 
      ngx_http_finalize_request(r, NGX_ERROR); 
      return; 
    } 
 
  } else { 
    timer = 0; 
  } 
 
  rc = ngx_http_read_discarded_request_body(r); 
 
  if (rc == NGX_OK) { 
    r->discard_body = 0; 
    r->lingering_close = 0; 
    ngx_http_finalize_request(r, NGX_DONE); 
    return; 
  } 
 
  /* rc == NGX_AGAIN */ 
 
  if (ngx_handle_read_event(rev, 0) != NGX_OK) { 
    c->error = 1; 
    ngx_http_finalize_request(r, NGX_ERROR); 
    return; 
  } 
 
  if (timer) { 
 
    clcf = ngx_http_get_module_loc_conf(r, ngx_http_core_module); 
 
    timer *= 1000; 
 
    if (timer > clcf->lingering_timeout) { 
      timer = clcf->lingering_timeout; 
    } 
 
    ngx_add_timer(rev, timer); 
  } 
} 

The function begins by handling a read event timeout. As mentioned earlier, ngx_http_discard_request_body() removes the read event timer, so when is the timer set again? The answer: when nginx has finished processing the request but has not yet discarded the whole request body (the client may not have sent it all). In ngx_http_finalize_connection(), if nginx finds that the request body has not been fully discarded, it adds a timer to the read event. The timer's length is given by the lingering_timeout directive and defaults to 5 seconds, but this is only the timeout between two read events; the total time spent waiting for the request body is limited by the lingering_time directive, which defaults to 30 seconds. When the handler detects a timeout event, it simply finalizes the request and closes the connection. It also makes sure that discarding the body takes no longer than lingering_time in total; once that limit is exceeded, it likewise finalizes the request and closes the connection.
If the read event fires before the request has been processed, there is no timeout to handle and no timer to set; the function simply calls ngx_http_read_discarded_request_body() to read and discard the data.
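The interplay of the two limits comes down to a small calculation: the time left until the overall deadline, converted to milliseconds and capped by the per-read timeout. The function toy_discard_timer_ms below is a hypothetical illustration of that arithmetic, using signed values throughout; it is not taken from nginx.

```c
#include <assert.h>

/* lingering_deadline and now are absolute times in seconds;
 * lingering_timeout_ms is the per-read timeout in milliseconds.
 * Returns the number of milliseconds to arm the read timer with,
 * or -1 when the overall budget is already used up. */
long toy_discard_timer_ms(long lingering_deadline, long now,
                          long lingering_timeout_ms)
{
    long left_ms;

    assert(lingering_timeout_ms > 0);

    if (lingering_deadline - now <= 0) {
        return -1;                      /* total wait time exceeded */
    }

    left_ms = (lingering_deadline - now) * 1000;

    /* wait for the shorter of the two limits */
    return left_ms > lingering_timeout_ms ? lingering_timeout_ms : left_ms;
}
```

With the default values from the text, a request that still has 30 seconds of budget waits at most 5000 ms per read event, while one with 3 seconds left waits at most 3000 ms.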

