Some Suggestions for Linux System Optimization (Kernel Optimization)

  • 2021-08-28 21:46:17

Turn off swap

If the server runs a database or message-middleware service, keep it away from swap. Setting vm.swappiness to 0 tells the kernel to avoid swapping anonymous memory except as a last resort (it does not unmount the swap partition itself):


echo "vm.swappiness = 0" >> /etc/sysctl.conf
sysctl -p
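
To confirm the setting took effect, read the value back and keep an eye on swap usage (standard commands, nothing specific to this article's environment):

sysctl vm.swappiness
# vm.swappiness = 0
free -h   # the Swap "used" column should stay near zero over time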

OOM Killer

Our Linux servers co-locate multiple services, and the physical memory they request is shared through overcommit: for example, with only 1 GB of physical memory, two programs can each start up and request 1 GB. Linux makes full use of memory through this over-allocation. When the memory actually in use exceeds physical memory, the system kills some processes, chosen by priority, to keep the others running. To prevent a core service from being killed, give its process the highest protection priority.


#  The lower the value, the less likely the process is to be killed.
#  oom_score_adj ranges from -1000 (never kill) to 1000;
#  the value -17 belongs to the legacy /proc/$pid/oom_adj interface.
echo -1000 > /proc/$pid/oom_score_adj
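
You can verify the effect by reading the score back ($pid is whatever process you are protecting):

cat /proc/$pid/oom_score_adj   # should print -1000
cat /proc/$pid/oom_score       # the score the OOM killer actually compares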

TCP

Because the databases and messaging middleware we provide run on the intranet, we can tune the TCP parameters for an intranet environment.

net.ipv4.tcp_syn_retries

The default value is 6; a reference value is 2. This is the number of times the kernel retransmits the SYN segment when the host, as a client, initiates a TCP connection (the first step of the three-way handshake); the connection attempt is abandoned once the retries are exhausted. Intranet links are reliable, so this value can be reduced moderately.

net.ipv4.tcp_synack_retries

The default value is 5; a reference value is 2. This is the number of times the kernel, as a server accepting a TCP connection, retransmits the SYN+ACK segment to the client (the second step of the three-way handshake) before abandoning the connection. It can likewise be reduced moderately on an intranet.

net.ipv4.tcp_timestamps

Whether TCP timestamps are enabled (the default is 1, i.e. on). With timestamps enabled, RTT can be calculated more accurately, and several other features, including tcp_tw_reuse below, depend on the timestamp field.

net.ipv4.tcp_tw_reuse

The default value is 0; the recommended value is 1. Whether sockets in the TIME_WAIT state may be reused for new outgoing TCP connections. This is effective at reducing the number of TIME_WAIT sockets. The parameter only takes effect when tcp_timestamps is enabled.

net.ipv4.tcp_tw_recycle

Whether to enable fast recycling of TIME_WAIT sockets. This is a more aggressive measure than tcp_tw_reuse, and it also depends on tcp_timestamps. It is strongly recommended not to enable tcp_tw_recycle, for two reasons. First, TIME_WAIT is a genuinely necessary state: it prevents data from a closed connection from being confused with a new connection on the same address pair. Second, in a NAT environment tcp_tw_recycle causes new connections to be rejected: the hosts behind the NAT have slightly different clocks, which shows up in the timestamp field, so the server sees timestamps from a single IP that should be increasing but instead go backwards, and it discards the segments whose timestamps appear to have decreased. (The option was removed from the kernel entirely in Linux 4.12.)

net.core.somaxconn

The default value is 128; a reference value is 2048. Defines the maximum length of the listen (accept) queue on each port. When a server listens on a port, the operating system completes the three-way handshake for incoming client connections; the established connections sit in a queue waiting to be taken by an accept() call. This option caps that queue's length (the effective limit is the smaller of somaxconn and the backlog argument passed to listen()). Raising it reduces the number of rejected connections on the server under high concurrency.
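
On recent kernels you can inspect the accept queue directly: for listening sockets, ss reports the current queue length in Recv-Q and the effective backlog in Send-Q (sample output; your sockets will differ):

ss -lnt
# State   Recv-Q  Send-Q  Local Address:Port
# LISTEN  0       128     0.0.0.0:22
# For LISTEN sockets, Send-Q = min(listen() backlog, net.core.somaxconn)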

net.ipv4.tcp_max_syn_backlog

On the server, client connection requests pass through two queues. Connections whose handshake is complete wait in one queue until accept() takes them; its length is controlled by somaxconn, as above. Connections whose handshake is still in progress are stored in a separate SYN queue, whose length is controlled by tcp_max_syn_backlog. The default is 128; tune it to 8192.
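
If you suspect either queue is overflowing, the kernel's protocol statistics will show it (the exact counter wording varies slightly between kernel versions):

netstat -s | grep -i -E "listen|SYNs"
# e.g.  1234 times the listen queue of a socket overflowed
#       1234 SYNs to LISTEN sockets dropped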

net.ipv4.tcp_max_tw_buckets

The default value is 4096; a reference value is 100,000. Defines the maximum number of TIME_WAIT sockets the system maintains at once. Beyond this number, TIME_WAIT sockets are destroyed immediately and a warning is printed. If the system is troubled by too many TIME_WAIT sockets, three options can be adjusted together to relieve it: tcp_max_tw_buckets, tcp_tw_reuse and tcp_timestamps. TIME_WAIT arises on the side that actively closes a TCP session, so the fundamental fix is to have the client, not the server, close the connection.
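
Collecting the reference values from this section, a sysctl fragment for an intranet server might look like the following (a sketch that simply gathers the values discussed above; adjust for your own workload):

# /etc/sysctl.conf -- intranet TCP tuning, values from the sections above
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 2048
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 100000

Load it with sysctl -p, as with the swap setting at the top of the article.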

page cache

The page cache holds the system's dirty pages and acts as the system's I/O cache. When data is written to disk, it first goes into the page cache and is then flushed to disk asynchronously. Write caching improves I/O speed, but it also increases the risk of data loss if the machine crashes before the flush.

There are three occasions on which dirty pages are flushed from the page cache to disk:

  • When available physical memory falls below a certain threshold, to free memory for the system.
  • When dirty pages have been resident in memory longer than a certain threshold, to keep them from lingering indefinitely.
  • When the user calls sync() or fsync() (demonstrated below).
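
The third trigger is easy to observe by hand (a minimal demonstration; nr_dirty is explained further down):

grep -w nr_dirty /proc/vmstat   # dirty pages before
sync                            # force all dirty pages out to disk
grep -w nr_dirty /proc/vmstat   # should now be at or near zero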

The system uses two write strategies when flushing:

  • Asynchronous flushing, which does not block user I/O.
  • Synchronous flushing, which blocks user I/O until the dirty pages fall below a certain threshold.

Normally the system applies the first strategy; when there is so much dirty data that asynchronous flushing cannot keep up, it switches to the synchronous mode.

We can adjust the dirty-data flush thresholds through kernel parameters:

vm.dirty_background_ratio, default 10. Defines a percentage of memory: when dirty data exceeds this share, the system flushes asynchronously.

vm.dirty_ratio, default 30. Also defines a percentage: when dirty data exceeds this share, the system flushes synchronously, and write requests block until the dirty data drops below dirty_ratio (once it is merely above dirty_background_ratio, flushing switches back to asynchronous). dirty_ratio must therefore be higher than dirty_background_ratio.

In addition to the percentage thresholds, you can set an expiration time: vm.dirty_expire_centisecs, default 3000, measured in hundredths of a second (i.e. 30 seconds); dirty data older than this is flushed asynchronously.

You can view the current number of dirty pages in the system with the following command:


cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 951
nr_writeback 0
nr_writeback_temp 0
# The output shows 951 dirty pages waiting to be written to disk.
# By default each page is 4 KB. The same information is also
# available in /proc/meminfo.
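
As the comment notes, /proc/meminfo exposes the same counters, in kilobytes rather than pages (951 pages x 4 KB = 3804 kB for the example above):

grep -E "Dirty|Writeback" /proc/meminfo
# Dirty:              3804 kB
# Writeback:             0 kB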

If your data-safety requirements are not that strict and you want to cache more data so that reads hit the cache more often, you can raise the dirty-data ratios and the expiration time:


vm.dirty_background_ratio = 30
vm.dirty_ratio = 60
vm.dirty_expire_centisecs = 6000

Conversely, if you do not want I/O to be blocked by synchronous flushing, you can lower the asynchronous flush threshold so that flushing starts earlier; this keeps I/O smoother:


vm.dirty_background_ratio = 5
vm.dirty_ratio = 60
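
Either set of values can be made persistent the same way as the swap setting at the top of the article (shown here with the second set; pick whichever matches your workload):

cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_background_ratio = 5
vm.dirty_ratio = 60
EOF
sysctl -p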

That concludes these suggestions for Linux system optimization (kernel optimization); for more on Linux system optimization, see the related articles on this site.

