A troubleshooting record of high CPU usage by a Tomcat process

  • 2021-07-26 09:04:39
  • OfStack

This article records the troubleshooting of a Tomcat process whose CPU usage spiked because of too many TCP connections.

Problem description

On a Linux system, the CPU usage of a Tomcat web service was very high: top reported more than 200%, and the service could not answer requests. Repeated restarts showed the same symptom.

Problem troubleshooting

1. Get process information

The jps command shipped with the JDK lists the running JVM processes and their pids, so the Tomcat process can be found quickly:


jps -l
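
Tomcat's main class is org.apache.catalina.startup.Bootstrap, so the right JVM can be singled out by name; the pid in the sample output is illustrative:

jps -l | grep org.apache.catalina.startup.Bootstrap
# e.g. 12345 org.apache.catalina.startup.Bootstrap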

2. View jstack information


jstack pid

A large number of log4j threads were found blocked, waiting to acquire a lock:

org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=201 (Compiled frame)
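
A quick way to gauge the scale of the contention is to count the relevant frames in a saved dump (pid is a placeholder, and the grep patterns rely on the usual jstack output wording):

jstack pid > dump.txt
grep -c 'org.apache.log4j.Category.callAppenders' dump.txt    # threads inside callAppenders
grep -c 'waiting to lock' dump.txt                            # threads waiting on a monitor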

Searching for this points to a known lock-contention/deadlock issue in log4j 1.x: Category.callAppenders is synchronized, so many threads logging concurrently pile up on the same lock.

With this identified, the log4j configuration was adjusted to log only at the ERROR level and Tomcat was restarted. The blocked threads disappeared from the stack dump, but the CPU usage of the process remained high.
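
A minimal sketch of that adjustment, assuming the application is configured through a log4j.properties file (the appender name and layout are illustrative):

# log4j.properties -- only ERROR and above reach the appender
log4j.rootLogger=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p [%t] %c - %m%n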

3. Further troubleshooting

To analyze the CPU usage of each thread, a widely shared script was used that computes the per-thread CPU usage inside a Java process.


#!/bin/bash
# Print the jstack frames of the busiest threads in a Java process,
# annotated with each thread's CPU usage.
# Usage: $0 [top_n] [pid]   (defaults: top 10, the current user's java process)

typeset top=${1:-10}
typeset pid=${2:-$(pgrep -u "$USER" java)}
typeset tmp_file=/tmp/java_${pid}_$$.trace

# Take a thread dump first so the nid lookup below has something to match against.
$JAVA_HOME/bin/jstack $pid > $tmp_file

# List every thread on the box sorted by %cpu (ascending), keep the busiest N,
# then keep only those belonging to our pid, printing "tid<TAB>%cpu".
ps H -eo user,pid,ppid,tid,time,%cpu --sort=%cpu --no-headers\
    | tail -$top\
    | awk -v "pid=$pid" '$2==pid{print $4"\t"$6}'\
    | while read line;
do
    # jstack reports thread ids as hexadecimal "nid=0x..." values, so convert the decimal tid.
    typeset nid=$(echo "$line"|awk '{printf("0x%x",$1)}')
    typeset cpu=$(echo "$line"|awk '{print $2}')
    # Print that thread's stack block (up to the next blank line),
    # tagging its first line with cpu=<usage>%.
    awk -v "cpu=$cpu" '/nid='"$nid"'/,/^$/{print $0"\t"(isF++?"":"cpu="cpu"%");}' $tmp_file
done

rm -f $tmp_file
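
Assuming the script is saved as busy_threads.sh (the file name and pid below are illustrative) and JAVA_HOME points at a JDK, it can be run like this:

chmod +x busy_threads.sh
./busy_threads.sh 10 12345    # annotate the 10 busiest threads of pid 12345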

Applicability of the script

Because the %CPU figure in ps comes from /proc/stat, it is not real-time; it depends on how often the OS refreshes it, typically about once per second. The statistics you see will therefore differ slightly from the jstack snapshot. The script is still very effective for load problems caused by a few hot threads, because those few threads keep consuming CPU continuously, so even with the time lag the same threads show up as the culprits.

Apart from this script, a simpler method is to find the process id and check the resource usage of each thread in that process with the following command:


top -H -p pid

Take the PID column from this view (it is the thread id here), convert it to hexadecimal, and then look up that thread in the jstack output by its nid.
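
For example, if top -H shows a busy thread with id 12345 (illustrative), convert it and search the dump; -A 20 just prints some context after the match:

printf "nid=0x%x\n" 12345      # -> nid=0x3039
jstack pid | grep -A 20 "nid=0x3039"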

Using the above methods, the CPU usage of the threads belonging to the Tomcat process added up to only about 80%, far less than the 200%+ reported by top.

This means no single thread was hogging the CPU for long stretches; the load looked like many short bursts of CPU-intensive work. The next suspicion was that the JVM was short of memory and doing frequent GC.


jstat -gc pid
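
jstat can also sample repeatedly, which makes a GC-frequency trend easier to spot; the interval (in ms) and sample count below are illustrative, and the YGC/FGC columns hold the young/full GC counts:

jstat -gc pid 1000 10    # 10 samples, one per second; watch the YGC and FGC counters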

jstat showed nothing abnormal about the JVM's memory usage itself, but the GC count had clearly surged.

With memory examined, and since this is a network-facing program, the next step was to check the network connections.

4. Pinpointing the problem

Querying the TCP connections on the Tomcat port showed a large number of connections in the ESTABLISHED state, plus some in other states, more than 400 in total.


netstat -anp | grep port
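
To get a quick breakdown of those connections by state, the same output can be grouped; :8080 is an assumed Tomcat port, replace it with the real one:

netstat -anp | grep :8080 | awk '{print $6}' | sort | uniq -c | sort -rn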

Looking further into where these connections came from, they turned out to originate from the application side that calls this Tomcat service: a large number of background threads were polling the service so aggressively that its Tomcat connections filled up and it could no longer accept new requests.
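
A sketch for spotting the noisiest client hosts, again assuming port 8080 and IPv4 address formatting:

netstat -anp | grep :8080 | grep ESTABLISHED \
    | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head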

A quick reference for the netstat states:

LISTEN: listening for connection requests from a remote TCP port
SYN-SENT: a connection request has been sent; waiting for the matching reply (a large number of these may indicate the host is being probed or attacked)
SYN-RECEIVED: a connection request has been received and answered; waiting for the peer's acknowledgement (a large number of these usually points to a SYN flood attack)
ESTABLISHED: an open connection
FIN-WAIT-1: waiting for a connection-termination request from the remote TCP, or for an acknowledgement of the termination request already sent
FIN-WAIT-2: waiting for a connection-termination request from the remote TCP
CLOSE-WAIT: waiting for a connection-termination request from the local user
CLOSING: waiting for the remote TCP's acknowledgement of the connection termination
LAST-ACK: waiting for the acknowledgement of the termination request previously sent to the remote TCP (a large number of these is a bad sign; check whether the host has been compromised)
TIME-WAIT: waiting long enough to be sure the remote TCP received the acknowledgement of its termination request
CLOSED: no connection

5. Root cause analysis

The direct trigger was polling on the client side: when a request failed, the client simply kept polling, and new background threads on the client kept joining the polling, which eventually exhausted the Tomcat connections on the server side.

