Apache log file details and practical analysis commands

  • 2020-05-09 19:45:38
  • OfStack

1. Log analysis
If Apache is installed with the default configuration, two log files are generated in the logs directory: access_log and error_log.
1). access_log
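access_log is the access log, which records every request made to the Apache server. Its location and content are controlled by the CustomLog directive, which names the log file and the format to use. The formats themselves are defined with the LogFormat directive, which lets you simplify the content and layout of this log.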
For example, one of my servers is configured as follows:


CustomLog "| /usr/sbin/rotatelogs /var/log/apache2/%Y_%m_%d_other_vhosts_access.log 86400 480" vhost_combined 

-rw-r--r-- 1 root root 22310750 12-05 23:59 2010_12_05_other_vhosts_access.log 
-rw-r--r-- 1 root root 26873180 12-06 23:59 2010_12_06_other_vhosts_access.log 
-rw-r--r-- 1 root root 26810003 12-07 23:59 2010_12_07_other_vhosts_access.log 
-rw-r--r-- 1 root root 24530219 12-08 23:59 2010_12_08_other_vhosts_access.log 
-rw-r--r-- 1 root root 24536681 12-09 23:59 2010_12_09_other_vhosts_access.log 
-rw-r--r-- 1 root root 14003409 12-10 14:57 2010_12_10_other_vhosts_access.log
With this CustomLog directive, rotatelogs starts a new, separate log file every day (86400 seconds, with a 480-minute offset from UTC for the UTC+8 time zone). I also wrote a scheduled job that deletes old log files. This keeps things tidy: each day's log is stored separately, and logs older than a set age are cleaned up automatically by the system. The LogFormat directive defines the record format of the log:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined 
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedproxy 
LogFormat "%h %l %u %t \"%r\" %>s %b" common 
LogFormat "%{Referer}i -> %U" referer 
LogFormat "%{User-agent}i" agent
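The cleanup job mentioned above is not shown in the original. A minimal sketch using find might look like this; the function name prune_old_logs, the /var/log/apache2 directory and the 30-day retention are all assumptions to adjust:

```shell
# Hypothetical helper: delete rotated access logs older than N days.
# Assumes file names end in _access.log, as produced by the rotatelogs
# pattern in the CustomLog line above.
prune_old_logs() {
    dir=$1      # log directory, e.g. /var/log/apache2 (assumption)
    days=$2     # days of history to keep (assumption)
    # -mtime +N matches files last modified more than N whole days ago
    find "$dir" -maxdepth 1 -type f -name '*_access.log' -mtime +"$days" -exec rm -f {} +
}

# Example crontab entry running the same find daily at 00:30 (assumed schedule):
# 30 0 * * * find /var/log/apache2 -maxdepth 1 -type f -name '*_access.log' -mtime +30 -exec rm -f {} +
```

Matching on the modification time rather than the date in the file name keeps the sketch independent of the naming pattern, as long as nothing rewrites old logs.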

Run tail on any access_log file; the following is a typical access record:


218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] "GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1" 200 1933 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)" 

The record consists of 9 fields. Listing them one by one:


218.19.140.242 
- 
- 
[10/Dec/2010:09:31:17 +0800] 
"GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1" 
200 
1933 
"-" 
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"
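The commands in part 2 select fields with awk, which splits on whitespace rather than on the 9 logical fields above, so the bracketed timestamp and the quoted request line each span several awk fields. This small sketch prints the awk field numbers for the sample record, assuming the combined format shown here; the function name show_fields is hypothetical:

```shell
# Print each whitespace-separated awk field of a log line with its number.
show_fields() {
    printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

line='218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] "GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1" 200 1933 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"'
show_fields "$line" | head -n 10
```

In this format $1 is the client IP, $4 starts the timestamp, $7 is the URL, $9 the status code and $10 the byte count.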

1) 218.19.140.242: the IP address of the client that made the request to the Apache server. By default this first field contains only the remote host's IP address. If you want Apache to look up the host name, you can set HostnameLookups to On, but this is not recommended because it slows the server down considerably. Also note that this IP address is not necessarily the address of the client's own machine: if the client goes through a proxy server, it is the proxy's address, not the original one (when the proxy adds an X-Forwarded-For header, the combinedproxy format defined above can log that header instead).
2) -: this field is empty, so "-" is logged in its place. It is meant to hold the visitor's identity as reported by identd on the client machine; unless IdentityCheck is set to On, Apache does not try to obtain this information (PS: I don't fully understand this field either; in practice it is almost always empty).
The "hyphen" in the output indicates that the requested piece of information is not available. In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the clients machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.
3) -: this field is also empty here. It records the user ID determined by HTTP authentication: if a site requires users to authenticate, this field contains the authenticated user's name.
4) [10/Dec/2010:09:31:17 +0800]: the time the request was received, in the format [day/month/year:hour:minute:second zone]. The trailing +0800 indicates that the server is in the UTC+8 time zone.
5) "GET /.. haizhu_tianhe.xml HTTP/1.1": the most useful information in the whole record. First, it tells us that the server received a GET request; second, it gives the path of the resource the client requested; and third, the protocol the client used, HTTP/1.1. This request line takes the form "%m %U%q %H", that is, "request method, path (plus query string), protocol".
6) 200: the status code the server sent back to the client. It tells us whether the request succeeded, was redirected, or ran into some kind of error. Here the value is 200. In general, codes starting with 2 indicate success, 3 indicates a redirection, 4 indicates an error on the client side, and 5 indicates an error on the server side. For details, refer to the HTTP specification (RFC 2616, section 10): http://www.w3.org/Protocols/rfc2616/rfc2616.txt
7) 1933: the number of bytes the server sent to the client (the response body, excluding headers). When analyzing logs, these values can be added up to find the total amount of data the server sent over a given period.
8) "-": the HTTP Referer, which tells the server which page the client followed a link from. An empty value, as here, usually means the page was opened directly.
9) "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)": the User-Agent header, which identifies the browser the client reports using.
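Items 6 and 7 lend themselves to simple aggregation. A minimal sketch that counts responses per status class and totals the bytes sent, assuming the combined format (status in awk field 9, bytes in field 10, "-" counted as 0) and well-formed request lines without embedded spaces; the function name summarize is hypothetical:

```shell
# Count responses per status class (2xx, 3xx, ...) and total the bytes sent.
# Reads the files given as arguments, or stdin if there are none.
summarize() {
    awk '
        {
            class = substr($9, 1, 1) "xx"   # 200 -> 2xx, 404 -> 4xx
            count[class]++
            if ($10 != "-") bytes += $10    # "-" means no body was sent
        }
        END {
            for (c in count) printf "%s %d\n", c, count[c]
            printf "bytes %.0f\n", bytes    # %.0f avoids int overflow on big totals
        }
    ' "$@" | LC_ALL=C sort
}
```

For a file containing only the sample record above, summarize prints "2xx 1" and "bytes 1933".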
2). error_log
error_log is the error log, which records any errors that occur while processing requests. Its location is set by the ErrorLog directive. It is the most important log file on the server.
Run tail on error_log and pick a record at random:


[Fri Dec 10 15:03:59 2010] [error] [client 218.19.140.242] File does not exist: /home/htmlfile/tradedata/favicon.ico 

It is also divided into several items:


[Fri Dec 10 15:03:59 2010] 
[error] 
[client 218.19.140.242] 
File does not exist: /home/htmlfile/tradedata/favicon.ico

1) [Fri Dec 10 15:03:59 2010]: the time the error occurred. Note that its format differs from the timestamp in the access_log record above.
2) [error]: the severity level of the message. Which levels get recorded is controlled by the LogLevel directive. The 404 (file not found) error above is logged at the error level.
3) [client 218.19.140.242]: the IP address of the client.
4) File does not exist: /home/htmlfile/tradedata/favicon.ico: a description of the error. For example, when a client requests a file or path that does not exist, the server returns a 404 error and logs this message.
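The error_log can be summarized the same way. This sketch lists the paths that most often trigger the "File does not exist" message shown above; it assumes that exact message text, and the function name top_missing is hypothetical:

```shell
# List the 10 paths that most often trigger "File does not exist" (404) errors.
top_missing() {
    # sed -n with the p flag prints only lines where the substitution matched,
    # leaving just the path that follows the message text
    sed -n 's/.*File does not exist: //p' "$@" | sort | uniq -c | sort -nr | head -n 10
}
```

Running top_missing error_log quickly shows, for example, whether a missing favicon.ico accounts for most of the noise.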

2. Useful log analysis commands and scripts

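Now that we understand what each field of the logs means, here are some log analysis commands found on the Internet. The awk field numbers ($1 for the client IP, $4 for the timestamp, $7 for the URL) assume the combined or common format shown above; if your LogFormat puts an extra field first (vhost_combined, for example, typically starts with the virtual host), shift them by one.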

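1. Count the Apache processes (the process is named httpd on Red Hat-style systems and apache2 on Debian/Ubuntu)
ps aux | grep '[h]ttpd' | wc -l   # the [h] pattern keeps grep from counting its own process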
2. Analyze the log to count the requests from each IP for a given day
grep "10/Dec/2010" default-access_log | awk '{print $1}' | sort | uniq -c | sort -nr
3. See which URLs a given IP accessed that day
grep "10/Dec/2010" default-access_log | awk '$1 == "218.19.140.242" {print $7}' | sort | uniq -c | sort -nr
4. List the 10 most visited URLs that day
grep "10/Dec/2010" default-access_log | awk '{print $7}' | sort | uniq -c | sort -nr | head -n 10
5. See what a given IP requested
awk '$1 == "218.19.140.242" {print $1"\t"$7}' default-access_log | sort | uniq -c | sort -nr | less
6. Find the busiest minutes (the hot spots); characters 14-18 of $4 are the HH:MM part of the timestamp
awk '{print $4}' default-access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head
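Commands 2, 4 and 6 can be collected into one reusable report. A minimal sketch, assuming the combined format (client IP in awk field 1, URL in field 7, timestamp starting field 4); the names day_lines, most_common and log_report are hypothetical. Filtering on the timestamp field rather than grepping the whole line avoids matching the date inside a URL:

```shell
# Print the lines of FILE whose timestamp ($4, "[dd/Mon/yyyy:...") falls on DAY.
day_lines() {
    awk -v d="[$2:" 'index($4, d) == 1' "$1"
}

# Print the N most frequent lines read on stdin, most frequent first.
most_common() {
    sort | uniq -c | sort -nr | head -n "$1"
}

# Usage: log_report FILE DAY   e.g. log_report default-access_log 10/Dec/2010
log_report() {
    file=$1
    day=$2
    echo "== requests per IP =="
    day_lines "$file" "$day" | awk '{print $1}' | most_common 10
    echo "== top URLs =="
    day_lines "$file" "$day" | awk '{print $7}' | most_common 10
    echo "== busiest minutes =="
    # $4 looks like [10/Dec/2010:09:31:17, so characters 14-18 are HH:MM
    day_lines "$file" "$day" | awk '{print substr($4, 14, 5)}' | most_common 10
}
```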
