Apache log analysis and server status checks on Linux

  • 2020-05-06 12:09:50
  • OfStack

Assume the Apache log format is:
118.78.199.98 - - [09/Jan/2010:00:59:59 +0800] "GET /Public/Css/index.css HTTP/1.1" 304 - "http://www.a.cn/common/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6.3)"

Problem 1: find the 10 IPs that access the site most often in the Apache log.
awk '{print $1}' apache_log |sort |uniq -c|sort -nr|head -n 10

awk extracts the IP from each log line. If the log format has been customized, use -F to set the field separator and print the column that holds the IP;
sort puts identical IPs next to each other;
uniq -c merges the duplicate lines and prefixes each one with its repeat count;
sort -nr sorts numerically by that count, in descending order;
head keeps the top ten.
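The same pipeline can be wrapped in a small shell function that reads the log from standard input and takes the number of results as a parameter. This is a minimal sketch: the function name top_ips is hypothetical, and it assumes the client IP is the first space-separated field, as in the format above. The final awk only strips the padding that uniq -c adds.

```shell
# top_ips (hypothetical name): print the N most frequent client IPs.
# Assumes the IP is field 1 of each log line read from stdin.
top_ips() {
    awk '{print $1}' |          # keep only the client IP
        sort |                  # group identical IPs together
        uniq -c |               # count each group
        sort -nr |              # highest count first
        head -n "${1:-10}" |    # keep the top N (default 10)
        awk '{print $1, $2}'    # "count ip" without the padding from uniq -c
}
```

Usage: top_ips 10 < apache_log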

This is adapted from a well-known command that displays the 10 most frequently used shell commands:
sed -e "s/| /\n/g" ~/.bash_history | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head
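Using \n in the replacement part of the sed s command is a GNU sed extension; other implementations, such as BSD sed, may not turn it into a newline. A portable sketch that splits piped commands with tr instead, and uses awk so that the leading spaces left after each "|" are ignored (the function name top_commands is hypothetical):

```shell
# top_commands (hypothetical name): most frequently used command names in a
# shell history read from stdin. Each "|" starts a new command.
top_commands() {
    tr '|' '\n' |               # one command per line (portable)
        awk 'NF {print $1}' |   # first word of each non-empty line
        sort | uniq -c | sort -nr |
        head -n "${1:-10}" |
        awk '{print $1, $2}'    # "count command" without padding
}
```

Usage: top_commands < ~/.bash_history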

Problem 2: find the busiest minutes in the Apache log.
awk '{print $4}' access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head
With the default space separator, the fourth field is "[09/Jan/2010:00:59:59";
cut -c 14-18 extracts characters 14 to 18 of it, i.e. the hour and minute (00:59), so the same minute on different days is counted together;
the rest is the same as in problem 1.
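A sketch that keeps the date in the key, so that each minute of each day is counted separately. It assumes the timestamp is field 4 in the "[dd/Mon/yyyy:HH:MM:SS" form shown above; the function name busiest_minutes is hypothetical.

```shell
# busiest_minutes (hypothetical name): requests per minute, busiest first.
# substr($4, 2, 17) turns "[09/Jan/2010:00:59:59" into "09/Jan/2010:00:59".
busiest_minutes() {
    awk '{print substr($4, 2, 17)}' |
        sort | uniq -c | sort -nr |
        head -n "${1:-10}" |
        awk '{print $1, $2}'    # "count minute" without padding
}
```

Usage: busiest_minutes 10 < access_log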

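Problem 3: find the most visited pages in the Apache log:
awk '{print $11}' apache_log | sed 's/^.*cn\(.*\)"/\1/g' | sort | uniq -c | sort -rn | head

This is similar to problems 1 and 2. The only special part is the sed substitution: the parentheses capture everything after "cn" up to the closing quote, and the whole field is replaced by that captured part, so "http://www.a.cn/common/index.php" becomes /common/index.php. Note that in the format above, field 11 is the Referer, i.e. the page the visitor came from.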
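To count the pages that were actually requested rather than the referring pages, take the path from field 7 instead. A minimal sketch, assuming the format shown at the top; the function name top_paths is hypothetical, and stripping the query string (so /a.php?x=1 and /a.php?x=2 count as one page) is a choice made here, not something the command above does.

```shell
# top_paths (hypothetical name): most requested paths, taken from field 7.
top_paths() {
    awk '{ split($7, p, "?"); print p[1] }' |  # drop any query string
        sort | uniq -c | sort -nr |
        head -n "${1:-10}" |
        awk '{print $1, $2}'                    # "count path" without padding
}
```

Usage: top_paths 10 < apache_log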

Question 4: find the busiest periods (by minute) in the Apache log, and then see which IPs accessed the site most during those periods. The commands below help with this and related server checks.
1. Count the running Apache (httpd) processes:
ps aux | grep httpd | grep -v grep | wc -l

2. Count the established TCP connections on port 80:
netstat -tan | grep "ESTABLISHED" | grep ":80" | wc -l
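grep ":80" also matches ports such as 8080, and connections whose remote port is 80. A sketch that compares the local-address column exactly, assuming the Linux netstat -tan layout (local address in column 4, state in column 6); the function name count_established is hypothetical.

```shell
# count_established (hypothetical name): established connections whose
# local port is $1 (default 80), counted from `netstat -tan` output on stdin.
count_established() {
    awk -v port="${1:-80}" '
        $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }'
}
```

Usage: netstat -tan | count_established 80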

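3. Count the requests per IP on one day from the log, merging duplicates (this and the next few commands take the IP from $2 and the URL from $8; with the log format shown at the top, these would be $1 and $7):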
cat access_log | grep "19/May/2011" | awk '{print $2}' | sort | uniq -c | sort -nr

4. See what the IP with the most requests did during hour 00 of that day:
cat access_log | grep "19/May/2011:00" | grep "61.135.166.230" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10

5. The top 10 URLs visited during hour 00 of that day:
cat access_log | grep "19/May/2011:00" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10

6. Capture 1000 packets sent to port 80 with tcpdump to see which source IP sends the most:
tcpdump -i eth0 -tnn dst port 80 -c 1000 | awk -F"." '{print $1"."$2"."$3"."$4}' | sort | uniq -c | sort -nr

Then check the log to see what that IP is doing:
cat access_log | grep 220.181.38.183 | awk '{print $1"\t"$8}' | sort | uniq -c | sort -nr | less

7. Count the distinct IPs that accessed the site during a given period (here hours 07 and 08):
grep "2011:0[7-8]" www20110519.log | awk '{print $2}' | sort | uniq -c | sort -nr | wc -l
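A character class such as 0[7-8] only covers hour ranges that share a first digit. A sketch that takes the day and an inclusive hour range as parameters and counts distinct client IPs, assuming the IP is field 1 and the timestamp field 4, as in the format shown at the top; the function name distinct_ips_between is hypothetical.

```shell
# distinct_ips_between (hypothetical name): number of distinct client IPs
# seen on day $1 (e.g. 19/May/2011) between hours $2 and $3 inclusive.
distinct_ips_between() {
    awk -v day="$1" -v from="$2" -v to="$3" '
        {
            ts = substr($4, 2)          # "19/May/2011:07:15:00"
            split(ts, t, ":")           # t[1] = day, t[2] = hour
            if (t[1] == day && t[2] + 0 >= from + 0 && t[2] + 0 <= to + 0)
                seen[$1] = 1            # remember each IP once
        }
        END { n = 0; for (ip in seen) n++; print n }'
}
```

Usage: distinct_ips_between 19/May/2011 7 8 < www20110519.log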

8. The 20 remote IP addresses with the most connections to this server:
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -n 20
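In netstat -ntu output, column 5 is the remote address including its port, and the first two lines are headers. A sketch that keeps only tcp/udp rows and removes the trailing ":port" (which also works for IPv6 addresses); the function name top_remote_hosts is hypothetical.

```shell
# top_remote_hosts (hypothetical name): connections per remote address,
# from `netstat -ntu` output on stdin. Header lines are skipped by only
# looking at tcp/udp rows, and the trailing ":port" is removed.
top_remote_hosts() {
    awk '$1 ~ /^(tcp|udp)/ { sub(/:[^:]*$/, "", $5); print $5 }' |
        sort | uniq -c | sort -nr |
        head -n "${1:-20}" |
        awk '{print $1, $2}'    # "count address" without padding
}
```

Usage: netstat -ntu | top_remote_hosts 20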

9. View the top 10 most frequent IPs in the log:
cat access_log | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head -n 10 | less

10. View the IPs that appear more than 100 times in the log:
cat access_log | cut -d ' ' -f 1 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | sort -nr | less
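The same threshold filter can be done in a single awk pass that counts with an associative array, so only the IPs that pass the threshold are sorted rather than every log line. A sketch, assuming the IP is field 1; the function name ips_over is hypothetical.

```shell
# ips_over (hypothetical name): IPs that appear more than $1 times
# (default 100), counted in one awk pass instead of sort | uniq -c.
ips_over() {
    awk -v min="${1:-100}" '
        { count[$1]++ }
        END { for (ip in count) if (count[ip] > min + 0) print count[ip], ip }' |
        sort -nr                # highest count first
}
```

Usage: ips_over 100 < access_log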

11. See which files were requested most often among the last 10000 log lines:
cat access_log | tail -n 10000 | awk '{print $7}' | sort | uniq -c | sort -nr | less

12. View the pages visited more than 100 times in the log:
cat access_log | cut -d ' ' -f 7 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | less

13. List the files that took more than 30 seconds to transfer (this assumes the last field of each log line is the request time in seconds):
cat access_log | awk '($NF > 30){print $7}' | sort -n | uniq -c | sort -nr | head -20
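A sketch of the same filter with the threshold as a parameter. It assumes a LogFormat whose last field is %T (the time taken to serve the request, in seconds), which the format shown at the top does not include; the function name slow_paths is hypothetical.

```shell
# slow_paths (hypothetical name): paths whose last field (assumed to be the
# %T request time in seconds) exceeds $1, with how often each occurred.
slow_paths() {
    awk -v limit="${1:-30}" '$NF + 0 > limit + 0 { print $7 }' |
        sort | uniq -c | sort -nr |
        awk '{print $1, $2}'    # "count path" without padding
}
```

Usage: slow_paths 30 < access_log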

14. List the slowest .php pages (over 60 seconds) and how many times each occurred:
cat access_log | awk '($NF > 60 && $7 ~ /\.php/){print $7}' | sort -n | uniq -c | sort -nr | head -100
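Question 4 itself (find the busiest minute, then the IPs that accessed the site most in that minute) can be combined into one sketch. It assumes the log format shown at the top (IP in field 1, timestamp in field 4); the function name busiest_minute_ips is hypothetical, and the log is read from a file because it is scanned twice.

```shell
# busiest_minute_ips (hypothetical name): find the busiest minute in log
# file $1, then list the top $2 (default 10) IPs within that minute.
# Note: sets the shell variable "minute" as a side effect.
busiest_minute_ips() {
    minute=$(awk '{print substr($4, 2, 17)}' "$1" |
        sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
    echo "busiest minute: $minute"
    awk -v m="$minute" 'substr($4, 2, 17) == m {print $1}' "$1" |
        sort | uniq -c | sort -nr |
        head -n "${2:-10}" |
        awk '{print $1, $2}'    # "count ip" without padding
}
```

Usage: busiest_minute_ips access_log 10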
