The Linux vmstat command explained with practical examples

2020-04-02 01:32:56
OfStack

The vmstat command is one of the most common Linux/Unix monitoring tools. It displays the state of the server at a given interval, including CPU usage, memory usage, virtual memory swapping, and IO reads and writes. It is my favorite command for checking a Linux/Unix machine, for two reasons. First, it is supported on both Linux and Unix. Second, compared with top, it shows the CPU, memory, and IO usage of the whole machine, rather than the CPU and memory utilization of each process (the two tools suit different scenarios).

vmstat is usually run with two numeric parameters: the first is the sampling interval in seconds, and the second is the number of samples to take. For example:


root@ubuntu:~# vmstat 2 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 3498472 315836 3819540    0    0     0     1    2    0  0  0 100  0

Here 2 means the server state is sampled every two seconds, and 1 means it is sampled only once. Note that the first line vmstat prints reports averages since the last reboot.

In practice we usually monitor for a period of time and simply stop vmstat once we have seen enough. For example:


root@ubuntu:~# vmstat 2  
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 3499840 315836 3819660    0    0     0     1    2    0  0  0 100  0
 0  0      0 3499584 315836 3819660    0    0     0     0   88  158  0  0 100  0
 0  0      0 3499708 315836 3819660    0    0     0     2   86  162  0  0 100  0
 0  0      0 3499708 315836 3819660    0    0     0    10   81  151  0  0 100  0
 0  0      0 3499732 315836 3819660    0    0     0     2   83  154  0  0 100  0

This means vmstat collects data every two seconds and keeps going until I stop the program (for example with Ctrl+C); here I stopped it after five samples.

All right, that covers how to run the command. Now let's go over the meaning of each column.
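When vmstat runs for a while, it is often handy to turn its text output into named fields. The following is a minimal Python sketch, not part of vmstat itself: the function name parse_vmstat is hypothetical, and it assumes the default column layout shown above, where every value is an integer.

```python
# Sample text copied from the single-sample run shown earlier in this article.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 3498472 315836 3819540    0    0     0     1    2    0  0  0 100  0
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by the column names in the header."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the banner line above the header
        if fields[0] == "r":
            columns = fields  # header row; vmstat reprints it from time to time
            continue
        if columns is None or len(fields) != len(columns):
            continue  # ignore rows that do not line up with the header
        samples.append(dict(zip(columns, map(int, fields))))
    return samples


for sample in parse_vmstat(SAMPLE):
    print(sample["r"], sample["free"], sample["id"])  # prints: 1 3498472 100
```

To use it on live data, one option is to capture the output first, for example with subprocess.run(["vmstat", "2", "5"], capture_output=True, text=True).stdout, assuming vmstat is installed on the machine.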

r The run queue, that is, the number of processes that are running or waiting for CPU time. The server I tested has a relatively idle CPU with few programs running. When this value exceeds the number of CPUs, there is a CPU bottleneck. It is also related to the load shown by top: generally a load above 3 is fairly high, above 5 is high, and above 10 is abnormal, meaning the server is in a very dangerous state. The load in top is similar to the run queue averaged over time. If the run queue is too large, your CPU is busy, which generally shows up as high CPU usage.

b The number of blocked processes, typically processes waiting for IO to complete.
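The rule of thumb for r (a run queue longer than the number of CPUs points to a CPU bottleneck) can be sketched as a small check. This is only an illustration: the function names are hypothetical, and the CPU count of 8 in the example is an assumed value, not the author's machine.

```python
import os


def run_queue_pressure(r, cpus=None):
    """Return the run-queue length divided by the number of CPUs."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return r / cpus


def is_cpu_bottleneck(r, cpus=None):
    """True when more processes want to run than there are CPUs."""
    return run_queue_pressure(r, cpus) > 1.0


# Assumed example: an 8-CPU machine with r = 1 (idle) and r = 80 (overloaded).
print(is_cpu_bottleneck(1, cpus=8))   # False
print(is_cpu_bottleneck(80, cpus=8))  # True
```

A single high sample is not conclusive; it makes more sense to apply the check to several consecutive samples.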

swpd The amount of virtual memory (swap) in use. A value greater than 0 suggests that your machine has been short of physical memory; if a memory leak in some program is not the cause, it is time to add memory or move memory-hungry tasks to another machine.

free The amount of free physical memory. My machine has 8 G of memory in total, with about 3415M still free.

buff Memory that Linux/Unix uses as buffers, for example to cache directory contents, permissions, and so on. On my machine it is about 300 M.

cache Memory used to cache the files we open. In the output above it is about 3.6 G on my machine. (This is a clever part of Linux/Unix: otherwise idle physical memory is used as a file cache, and it is released again when programs need the memory.)
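The memory columns are easier to read after a unit conversion. The sketch below assumes vmstat's default unit of 1024-byte KiB (the -S option changes it) and uses the values from the single-sample output shown earlier; the helper name kib_to_mib is hypothetical.

```python
KIB_PER_MIB = 1024


def kib_to_mib(kib):
    """Convert a vmstat memory value from KiB to MiB."""
    return kib / KIB_PER_MIB


# Values taken from the "vmstat 2 1" output earlier in this article.
sample = {"free": 3498472, "buff": 315836, "cache": 3819540}

for name, kib in sample.items():
    # free is about 3416.5 MiB, buff about 308.4 MiB, cache about 3730.0 MiB
    print(f"{name}: {kib_to_mib(kib):.1f} MiB")
```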

si The amount of virtual memory read in from disk (swapped in) per second. If this value is greater than 0, physical memory is insufficient or some program is leaking memory. My machine has plenty of memory, so everything is fine.

so The amount of virtual memory written to disk (swapped out) per second. If this value is greater than 0, the same applies as for si.

bi The number of blocks received from block devices per second (that is, reads), where block devices means all the disks and other block devices on the system. The default block size is 1024 bytes. There is little IO on my machine, so the value stays at or near zero, but on a machine copying a large amount of data (2-3 TB) I have seen it reach 140000/s, which corresponds to a disk throughput of about 140M/s.

bo The number of blocks sent to block devices per second (that is, writes). For example, when we write a file, bo will be greater than 0. In general bi and bo should both be close to 0; otherwise IO is too frequent and needs tuning.

in The number of interrupts per second, including the clock interrupt.
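The relationship between a block rate and throughput is simple arithmetic. As a rough sketch, assuming the 1024-byte block size mentioned above (the function names are hypothetical), the figure of 140000 blocks per second converts as follows:

```python
BLOCK_SIZE_BYTES = 1024  # block size stated in the text above


def blocks_to_mb_per_s(blocks_per_s, block_size=BLOCK_SIZE_BYTES):
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return blocks_per_s * block_size / 1_000_000


def blocks_to_mib_per_s(blocks_per_s, block_size=BLOCK_SIZE_BYTES):
    """Throughput in binary mebibytes (2**20 bytes) per second."""
    return blocks_per_s * block_size / (1024 * 1024)


print(blocks_to_mb_per_s(140000))   # 143.36
print(blocks_to_mib_per_s(140000))  # 136.71875
```

Either way the result is in the neighborhood of the 140M/s quoted above; the exact number depends on whether M means 10^6 or 2^20 bytes.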

cs The number of context switches per second. Switching between threads or between processes requires a context switch, for example. This value should be as small as possible; if it is too large, consider lowering the number of threads or processes. Take web servers such as apache and nginx: in performance tests we usually run thousands or even tens of thousands of concurrent requests. The number of web server processes or threads can be lowered step by step from its peak while load testing, until cs drops to a fairly small value; that process and thread count is then a suitable setting. System calls are similar: every time we call a system function our code enters kernel space, and a call that blocks leads to a context switch. This is expensive, so try to avoid calling system functions too frequently. Too many context switches mean that much of your CPU time is wasted on switching, leaving less time for the CPU to do real work.
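A per-second figure such as cs can be derived from a cumulative counter by taking two readings and dividing the difference by the interval. On Linux, /proc/stat contains a ctxt line with the total number of context switches since boot. The sketch below works on the text of two such readings; the function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def read_ctxt(stat_text):
    """Extract the cumulative context-switch counter from /proc/stat content."""
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ctxt":
            return int(parts[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before, after, interval_s):
    """Average context switches per second between two /proc/stat readings."""
    return (read_ctxt(after) - read_ctxt(before)) / interval_s


# Hypothetical readings taken two seconds apart.
first = "cpu  10 0 5 1000 0 0 0 0 0 0\nctxt 1000\n"
second = "cpu  10 0 5 1200 0 0 0 0 0 0\nctxt 1316\n"
print(switches_per_second(first, second, 2))  # 158.0
```

On a real machine the two strings would come from reading /proc/stat twice with a pause in between.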

us User CPU time. On a server that performed encryption and decryption very frequently, I once saw us approach 100 while the run queue r reached 80.

sy System CPU time. If it is too high, a lot of time is being spent in system calls, for example because of frequent IO operations.

id Idle CPU time. Generally speaking, id + us + sy + wa = 100. I think of id as the idle CPU percentage, us as the user CPU percentage, and sy as the system CPU percentage.

wa CPU time spent waiting for IO.
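The four CPU columns are percentages of total CPU time, so they can be combined into a quick summary. This is a minimal sketch with a hypothetical function name; it assumes only the us, sy, id and wa columns shown in this article (newer vmstat versions also print an st column, and rounding can make the total differ slightly from 100).

```python
def cpu_summary(sample):
    """Summarize the CPU columns of one vmstat sample (values in percent)."""
    busy = sample["us"] + sample["sy"]
    total = busy + sample["id"] + sample["wa"]
    return {
        "busy": busy,            # user + system time
        "idle": sample["id"],
        "iowait": sample["wa"],
        "total": total,          # expected to be close to 100
    }


# CPU columns from the sample output earlier in this article.
print(cpu_summary({"us": 0, "sy": 0, "id": 100, "wa": 0}))
# {'busy': 0, 'idle': 100, 'iowait': 0, 'total': 100}
```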