In-depth understanding of how NGINX achieves high performance and scalability

  • 2020-05-15 03:44:29
  • OfStack

The overall architecture of NGINX is characterized by a set of processes that work together:

Master process: responsible for privileged operations such as reading configuration files, binding sockets, and creating and signalling child processes.

Worker processes: responsible for accepting and processing connection requests, reading from and writing to disk, and communicating with upstream servers. When NGINX is active, only the worker processes are busy.

Cache loader process: responsible for loading the disk cache into memory. The process runs at startup and then exits.

Cache manager process: responsible for pruning entries from the disk cache so that it stays within the configured size. The process runs periodically.

NGINX's ability to achieve high performance and scalability rests on two basic design choices:

Limit the number of worker processes as much as possible to reduce the overhead of context switching. The default and recommended configuration is one worker process per CPU core, to make efficient use of hardware resources.

The worker process is single-threaded and handles multiple concurrent connections in a non-blocking manner.

Each NGINX worker processes multiple concurrent connections through a state machine that is implemented in a non-blocking fashion:

Each worker process handles several sockets at once, both listening sockets and connection sockets.

When the listening socket receives a new request, a new connection socket is opened to handle the communication with the client.

When an event arrives on a connection socket, the worker process responds quickly and moves on to events newly arrived on other sockets, as the sketch below illustrates.
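
To make this concrete, here is a minimal sketch of the kind of single-threaded, non-blocking event loop just described. It is written in C against the Linux epoll API; the port number and the trivial canned response are illustrative assumptions, not anything from NGINX itself, and error handling is omitted for brevity:

    #define _GNU_SOURCE              /* for accept4() and SOCK_NONBLOCK */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    int main(void) {
        /* Listening socket: bind to port 8080 and mark it non-blocking. */
        int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, SOMAXCONN);

        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev);

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            /* The only place the process waits: for events on ANY socket. */
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == lfd) {
                    /* Event on the listening socket: a new connection.
                       Open a connection socket and register it. */
                    int cfd = accept4(lfd, NULL, NULL, SOCK_NONBLOCK);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &cev);
                } else {
                    /* Event on a connection socket: respond quickly, then
                       move on to whichever socket is ready next. */
                    char buf[4096];
                    if (read(fd, buf, sizeof buf) <= 0) { close(fd); continue; }
                    write(fd, "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok", 40);
                    close(fd);
                }
            }
        }
    }

The essential property is that the process blocks in exactly one place, epoll_wait, and every individual socket operation returns immediately, which is what lets one thread juggle thousands of connections.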

Garrett says that this design choice makes NGINX fundamentally different from other web servers. A typical web server assigns each connection to a dedicated thread, which makes it easy to process multiple connections, since each connection can be treated as a linear sequence of steps, but it incurs context-switching overhead. In fact, each worker thread spends most of its time blocked, waiting for the client or an upstream server. The cost of context switching becomes apparent when the number of concurrent connections/threads trying to perform operations such as I/O exceeds a certain threshold, or when memory is exhausted.

NGINX, by contrast, is designed so that worker processes never block on network I/O unless there is no work to be done. In addition, each new connection consumes very few resources: just one file descriptor and a small amount of worker-process memory.

In general, after system tuning, this mode of operation allows NGINX to handle hundreds or even thousands of concurrent HTTP connections per worker process.

Inside NGINX: How We Designed for Performance and Scalability

NGINX's superior performance is the result of the design behind it. While many web servers and application servers use simple threaded (threaded) or process-based (process-based) architectures, NGINX stands out with a sophisticated event-driven (event-driven) architecture that can support thousands of concurrent connections on modern hardware.

The Inside NGINX infographic covers everything from the high-level process architecture down to diagrams of how a single NGINX worker process handles multiple connections. This article explains those details.

Setting the Scene -- the NGINX Process Model

To better understand this design, you need to understand how NGINX runs. NGINX has one master process (which performs privileged operations, such as reading configuration and binding ports) and a number of worker processes (worker process) and helper processes (helper process).

On a four-core server, the NGINX master process creates four worker processes and two cache helper processes (cache helper processes) to manage the on-disk content cache (on-disk content cache).

Why Is Architecture Important?

The fundamental building block of any Unix application is a thread or process. (From the point of view of the Linux operating system, threads and processes are essentially the same; the main difference is the degree to which they share memory.) A thread or process is a self-contained set of instructions that the operating system can schedule to run on a CPU core. Most complex applications run multiple threads or processes in parallel, for two reasons:

● They can use more CPU cores at the same time.

● Threads and processes make parallel operations easy to implement (for example, handling multiple connections at the same time).

Both processes and threads consume resources. Each uses memory and other OS resources, and causes the kernel to switch between them frequently (an operation known as a context switch (context switch)). Most modern servers can handle hundreds of small, active (active) threads or processes at the same time, but performance degrades severely once memory is exhausted, or when high I/O load causes a large number of context switches.

For network applications, it is common to assign one thread or process per connection (connection). This architecture is easy to implement, but it does not scale when the application needs to handle thousands of concurrent connections.

How Does NGINX Work?

NGINX uses a predictable (predictable) process model that is tuned to the available hardware resources:

1. The master process performs privileged operations, such as reading the configuration and binding ports, and is responsible for creating the child processes (the three types below).

2. The cache loader process (cache loader process) runs at startup to load the disk-based cache (disk-based cache) into memory, and then exits. It is scheduled conservatively, so its resource demands are low.

3. The cache manager process (cache manager process) runs periodically and prunes entries from the disk caches (prunes entries from the disk caches) to keep them within the configured size.

4. The worker processes (worker processes) do all the real work: handling network connections, reading from and writing to disk, communicating with upstream servers, and so on.

In most cases, NGINX recommends running one worker process per CPU core to make the most efficient use of hardware resources. You can set this with the following directive in the configuration:

worker_processes auto;   # one worker per CPU core
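
For illustration only (this is not NGINX source code), a minimal C sketch of how a program can discover the number of online CPU cores, which is the value that worker_processes auto is meant to track:

    /* Sketch: detect the number of online CPU cores, i.e. the worker
       count that "worker_processes auto;" aims for. Uses POSIX sysconf;
       NGINX's actual detection logic may differ. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);
        if (cores < 1) cores = 1;     /* fall back to a single worker */
        printf("would start %ld worker processes\n", cores);
        return 0;
    }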

When the NGINX server is running, only the worker processes are busy. Each worker handles multiple connections in a non-blocking fashion, reducing the number of context switches.

Each worker process is single-threaded and runs independently, grabbing new connections and processing them. The workers communicate through shared memory for shared cache data, session persistence data (session persistence data), and other shared resources.
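
As a minimal sketch of that mechanism (not NGINX's actual shared-memory zones, which layer slab allocation and mutexes on top of it), a region created with mmap before forking is visible to every worker process:

    /* Sketch: a shared-memory counter visible to all forked workers. */
    #define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS */
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Map one shared region before forking; every child sees it. */
        long *hits = mmap(NULL, sizeof *hits, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *hits = 0;
        for (int i = 0; i < 4; i++) {            /* four "workers" */
            if (fork() == 0) {
                __sync_fetch_and_add(hits, 1);   /* atomic update */
                _exit(0);
            }
        }
        while (wait(NULL) > 0) ;                 /* reap the children */
        printf("hits recorded by workers: %ld\n", *hits);
        return 0;
    }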

Inside the NGINX Worker Process

Each NGINX worker process is initialized with the NGINX configuration (NGINX configuration) and is provided with a set of listening sockets (listen sockets) by the master process.

The NGINX worker processes wait for events on the listening sockets, using accept_mutex and kernel socket sharding to decide when to start work. Events are initiated by new incoming connections. These connections are assigned to a state machine (state machine) -- the HTTP state machine is the most commonly used, but NGINX also implements state machines for stream (native TCP) traffic and for a number of mail protocols (SMTP, IMAP, and POP3).
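
The "kernel socket sharding" mentioned above corresponds to the SO_REUSEPORT socket option on Linux 3.9+ kernels (exposed in NGINX as the reuseport parameter of the listen directive). A minimal sketch, where the helper name make_sharded_listener is my own invention, of how each worker would opt in:

    /* Sketch: SO_REUSEPORT lets each worker bind its own listening
       socket on the same port; the kernel then shards incoming
       connections across them, removing the need for an accept mutex. */
    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <netinet/in.h>

    int make_sharded_listener(unsigned short port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&addr, sizeof addr);
        listen(fd, SOMAXCONN);
        return fd;   /* each worker calls this for the same port */
    }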

A state machine is essentially the set of instructions that tells NGINX how to process a request. Most web servers with the same functionality as NGINX use a similar state machine -- the difference lies in the implementation.
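
A toy illustration in C of what "a set of instructions for processing a request" looks like in code. This is not NGINX's actual state machine (which is far richer); the states and the on_event callback are illustrative assumptions:

    /* Toy request state machine: each event on a connection advances
       the state one step and then returns to the event loop, so the
       worker never blocks waiting for the next step. */
    enum conn_state { READING_HEADERS, SENDING_RESPONSE, DONE };

    struct conn {
        int fd;
        enum conn_state state;
    };

    /* Called whenever the event loop reports the connection is ready. */
    void on_event(struct conn *c) {
        switch (c->state) {
        case READING_HEADERS:
            /* ...read the bytes available now; once the request line and
               headers are complete, decide on a response... */
            c->state = SENDING_RESPONSE;
            break;
        case SENDING_RESPONSE:
            /* ...write as much of the response as the socket accepts;
               when it has all been sent, the transaction is over... */
            c->state = DONE;
            break;
        case DONE:
            /* a keep-alive connection would reset to READING_HEADERS */
            break;
        }
    }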

Scheduling the State Machine

Think of the state machine as the rules of chess. Each HTTP transaction (HTTP transaction) is a chess game. On one side of the board is the web server -- a grandmaster who can make decisions very quickly. On the other side is the remote client -- a web browser accessing the site or application over a relatively slow network.

However, the rules of the game can be complicated. For example, the web server may need to communicate with other parties (proxying to an upstream application) or talk to an authentication server. Third-party modules in the web server can even extend the rules of the game.

A Blocking State Machine

Recall our earlier description of a process or thread: a self-contained set of instructions that the operating system can schedule to run on a CPU core. Most web servers and web applications play this chess game using a one-process-per-connection or one-thread-per-connection model. Each process or thread contains the instructions to play one game through to the end. During this time, the process, run by the server, spends most of its time "blocked" (blocked), waiting for the client to complete its next move.

1. The web server process (web server process) listens on its listening sockets for new connections (new games initiated by clients).

2. Once a new game is started, the process plays it, blocking after each move to wait for the client's response.

3. Once the game is over, the web server process checks whether the client wants to start a new game (this corresponds to a keep-alive connection). If the connection is closed (the client leaves or a timeout occurs), the web server process returns to listening and waits for brand-new games.

Remember one important point: every active HTTP connection (every chess game) requires a dedicated process or thread (a grandmaster). This architecture is simple and easy to extend with third-party modules ("new rules"). There is a huge imbalance, however: a lightweight HTTP connection, represented by a file descriptor (file descriptor) and a small amount of memory, is mapped onto a separate process or thread, a very heavyweight operating-system object. It is convenient to program, but hugely wasteful.
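
For contrast with the event loop shown earlier, here is a minimal C sketch of that blocking, one-thread-per-connection model. It is the general pattern, not any particular server's code; the port and the canned response are illustrative assumptions:

    /* Sketch of the blocking, one-thread-per-connection model. Each
       accepted connection gets a dedicated thread that blocks on every
       read -- convenient, but one heavyweight OS object per lightweight
       HTTP connection. */
    #include <pthread.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>
    #include <stdint.h>

    static void *handle_session(void *arg) {
        int fd = (int)(intptr_t)arg;
        char buf[4096];
        /* Blocks here waiting for the client's next "move". */
        while (read(fd, buf, sizeof buf) > 0)
            write(fd, "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok", 40);
        close(fd);
        return NULL;
    }

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, SOMAXCONN);
        for (;;) {
            int cfd = accept(lfd, NULL, NULL);   /* blocks for a new game */
            pthread_t t;
            pthread_create(&t, NULL, handle_session, (void *)(intptr_t)cfd);
            pthread_detach(t);                   /* one thread per connection */
        }
    }

Compare this with the epoll sketch above: the work per request is the same, but here every idle connection pins down a full thread stack and adds to the kernel's scheduling load.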

NGINX is a True Grandmaster

Perhaps you have heard of simultaneous exhibition games, in which one grandmaster plays against ten opponents at the same time.

Kiril Georgiev played against 360 players at the same time in Sofia, Bulgaria, winning 284 games, drawing 70, and losing 6.

This is how an NGINX worker process plays "chess". Each worker process (remember: usually one worker per CPU core) is a grandmaster that can play hundreds (in fact, thousands) of games at the same time.

1. The worker process waits for events on listening sockets and connection sockets.

2. Events occur on the sockets, and the worker process handles them:

● An event on a listening socket means that a client has started a new game. The worker process creates a new connection socket.

● An event on a connection socket means that the client has made a move. The worker process responds promptly.

A worker process never blocks on network traffic waiting for its "opponent" (the client) to respond. When it has made its move in one game, it immediately moves on to the next game where a move is waiting, or welcomes a new opponent.

Why Is This Faster than a Blocking, Multi-Process Architecture?

NGINX scales well to tens of thousands of connections per worker process. Each new connection creates another file descriptor and consumes a small amount of additional memory in the worker process; the per-connection overhead is minimal. NGINX processes can remain pinned to CPUs, and context switches are relatively infrequent, occurring only when there is no work to be done.

In the blocking, one-process-per-connection approach, each connection requires significant additional resources and overhead, and context switches (from one process to another) are very frequent.

To learn more, check out the article on NGINX architecture by Andrew Alexeev, vice president of development and co-founder of NGINX.

With proper system tuning, NGINX can scale to handle 100,000 concurrent HTTP connections per worker process, without losing a beat during traffic spikes (floods of new games starting).

Updating Configuration and Upgrading NGINX

NGINX's process architecture, with its small number of worker processes, makes updating the configuration, and even the NGINX binary itself, very efficient.

Updating the NGINX configuration is a very simple, lightweight, and reliable operation. Running the nginx -s reload command checks the configuration on disk and sends a SIGHUP signal to the master process.

When the master process receives the SIGHUP signal, it does two things:

1. It reloads the configuration and forks a new set of worker processes. These new workers immediately start accepting connections and processing traffic (traffic) using the new configuration.

2. It signals the old worker processes to exit gracefully. The old workers stop accepting new connections. As soon as each HTTP request they are handling completes, they cleanly close the connection. Once all of its connections are closed, a worker process exits.

This process causes a small spike in CPU and memory usage, but it is negligible compared with the resource load of active connections. You can reload the configuration many times per second. Very rarely, problems arise when many generations of worker processes are waiting for connections to close, but even these are resolved quickly.
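
A heavily simplified C sketch of the general signal-and-respawn pattern behind this reload sequence. This is not NGINX's implementation; the worker count, worker_main body, and the use of SIGQUIT as the "drain and exit" signal are illustrative assumptions, and error handling and configuration parsing are omitted:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <unistd.h>

    #define NWORKERS 4

    static volatile sig_atomic_t reload_requested = 0;
    static void on_sighup(int sig)  { (void)sig; reload_requested = 1; }
    static void on_sigquit(int sig) { (void)sig; _exit(0); }

    /* Hypothetical worker body: a real worker would run its event loop
       and finish in-flight requests after SIGQUIT before exiting. */
    static void worker_main(void) {
        signal(SIGQUIT, on_sigquit);
        for (;;) pause();
    }

    static void spawn_workers(pid_t pids[NWORKERS]) {
        for (int i = 0; i < NWORKERS; i++) {
            pid_t p = fork();
            if (p == 0) worker_main();
            pids[i] = p;
        }
    }

    int main(void) {
        pid_t workers[NWORKERS];
        signal(SIGHUP, on_sighup);
        /* ...read the configuration here... */
        spawn_workers(workers);              /* first generation */
        for (;;) {
            pause();                         /* wait for a signal */
            if (!reload_requested) continue;
            reload_requested = 0;
            /* 1. Re-read the configuration and fork a fresh generation
               that starts accepting connections immediately. */
            pid_t fresh[NWORKERS];
            spawn_workers(fresh);
            /* 2. Tell the old generation to drain its connections and
               exit gracefully. (Reaping with waitpid() is omitted.) */
            for (int i = 0; i < NWORKERS; i++) {
                kill(workers[i], SIGQUIT);
                workers[i] = fresh[i];
            }
        }
    }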

NGINX's binary upgrade process is even more remarkable: you can upgrade NGINX itself on the fly, without any lost connections, downtime, or interruption in service.

The binary upgrade process is similar to a configuration update. The new NGINX master process runs in parallel with the original master process, and they share the listening sockets. Both processes are active (active), and their respective worker processes handle their own traffic (traffic). You can then signal the old master process and its workers to exit gracefully.

The whole process is described in more detail in Controlling NGINX.

Conclusion

The Inside NGINX infographic gives a high-level overview of how NGINX works, but behind this simple explanation lies more than a decade of innovation and optimization. These innovations and optimizations enable NGINX to perform well on a wide variety of hardware while providing the security and reliability that modern web applications require.

