Linux awk command usage

  • 2020-04-02 01:48:42
  • OfStack

Here's an example:
Compute the average of the first column of file a, counting only the rows whose first column is a floating-point number. With awk this takes a single statement.
$ cat a
1.021 33
1 # ll     44
2.53 6
ss       7

awk 'BEGIN{total = 0; len = 0} {if($1 ~ /^[0-9]+\.[0-9]*/){total += $1; len++}} END{print total/len}' a
(Analysis: $1 ~ /^[0-9]+\.[0-9]*/ tests $1 against the regular expression between the two slashes. If it matches, $1 is added to total and len is incremented by 1. In the regular expression ^[0-9]+\.[0-9]*, "^[0-9]+" means the field begins with one or more digits, "\." is an escaped dot, that is, a literal decimal point, and "[0-9]*" means zero or more digits.)
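The one-liner above can be tried end to end with a short script. This is a minimal sketch: the scratch directory and the variable name avg are illustrative, the trailing $ in the regular expression is an added tightening so that the whole field must look like a decimal number, and the len > 0 guard is an added safeguard against dividing by zero.

```shell
# Recreate the sample file a in a scratch directory (the path is illustrative).
tmp=$(mktemp -d)
printf '1.021 33\n1 # ll     44\n2.53 6\nss       7\n' > "$tmp/a"

# Sum and count only the rows whose first field looks like a decimal number,
# then print the average once all input has been read.
avg=$(awk '$1 ~ /^[0-9]+\.[0-9]*$/ { total += $1; len++ }
           END { if (len > 0) print total / len }' "$tmp/a")
echo "$avg"
rm -rf "$tmp"
```

Only 1.021 and 2.53 match, so the result is their average.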

The general syntax of awk is:
awk [-option] 'BEGIN{initialization} condition 1{action 1} condition 2{action 2} ... END{post-processing}' in_file
Where: the statements in BEGIN and END run before and after the input file (in_file) is read, respectively, and can be understood as initialization and cleanup.

(1) Option description:
-F re: sets the field separator to re, which may be a regular expression
-v var=$v: assigns the value of the shell variable v to the awk variable var. To assign several variables, write several -v options, one per variable
E.g. to print the lines of file a from line num through line num+num1:
awk -v num=$num -v num1=$num1 'NR==num,NR==num+num1{print}' a
-f progfile: has awk read and execute the program file progfile. Of course, progfile must be a valid awk program.
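The -v and -f options can be combined. The sketch below stores the range pattern from the example in a program file and passes both numbers with -v; the file name range.awk and the ten-line test input are assumptions made for illustration.

```shell
tmp=$(mktemp -d)
seq 1 10 > "$tmp/a"

# Hypothetical program file holding the same range pattern as the one-liner.
printf 'NR == num, NR == num + num1 { print }\n' > "$tmp/range.awk"

num=3
num1=2
# Each shell value is handed to awk through its own -v option.
range=$(awk -v num="$num" -v num1="$num1" -f "$tmp/range.awk" "$tmp/a")
echo "$range"
rm -rf "$tmp"
```

With num=3 and num1=2 this prints lines 3 through 5.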

(2) awk built-in variables:
ARGC       Number of command line arguments
ARGV       Array of command line arguments
ARGIND     Index in ARGV of the file currently being processed (a gawk extension)
Suppose there are two files, a and b:
awk '{if(ARGIND==1){print "process a file"} if(ARGIND==2){print "process b file"}}' a b
Files are processed in command-line order: a is scanned first, then b.
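Since ARGIND is only available in gawk, a portable way to notice the switch from one file to the next is to test FNR == 1 and print the standard FILENAME variable. This is a sketch of an alternative, not the method used above; the file contents are made up.

```shell
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/a"
printf 'z\n' > "$tmp/b"

# FNR is 1 exactly once per non-empty file, on its first record.
seen=$(cd "$tmp" && awk 'FNR == 1 { print "processing file " FILENAME }' a b)
echo "$seen"
rm -rf "$tmp"
```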

NR      The number of records read so far, across all input files
FNR     The number of records read so far in the current file
The above example can also be written like this:
awk 'NR==FNR{print "process file a"} NR > FNR{print "process file b"}' a b
With input files a and b, a is scanned first, so NR==FNR holds throughout a. When awk moves on to b, FNR restarts from 1 while NR keeps counting on from the last line of a, so NR > FNR.
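Printing both counters makes the relationship visible. In this sketch the file contents are invented; note also that the NR==FNR test assumes the first file is not empty, because with an empty first file the condition holds while the second file is read.

```shell
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/a"
printf 'b1\nb2\nb3\n' > "$tmp/b"

# First rule: records of the first file. "next" skips the second rule for them.
counters=$(awk 'NR == FNR { print "a:", NR, FNR; next }
                          { print "b:", NR, FNR }' "$tmp/a" "$tmp/b")
echo "$counters"
rm -rf "$tmp"
```

NR runs from 1 to 5 across both files, while FNR restarts at 1 on the first line of b.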

To show lines 10 to 15 of a file:
awk 'NR==10,NR==15{print}' a
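A quick way to check the range pattern is to feed awk numbered lines. The sketch below uses seq as a stand-in for a real file and compares the range form with an equivalent numeric comparison.

```shell
# Range pattern: starts at the record where NR == 10, ends at the record where NR == 15.
by_range=$(seq 1 20 | awk 'NR == 10, NR == 15 { print }')

# The same selection written as a plain condition on NR.
by_compare=$(seq 1 20 | awk 'NR >= 10 && NR <= 15')

echo "$by_range"
```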

FS: input field separator (default: space), equivalent to the -F option
awk -F ':' '{print}' a       and     awk 'BEGIN{FS=":"}{print}' a       are the same

OFS: output field separator (default: space)
awk -F ':' 'BEGIN{OFS=";"}{print $1,$2,$3}' b
If cat b shows
1:2:3
4:5:6
then with OFS set to ";" the output is
1;2;3
4;5;6
awk uses $1, $2, $3, ... to represent the fields of a record; $0 represents the entire record (typically a single line)
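One detail worth seeing in action: OFS is only used when awk joins fields itself, either for the comma-separated arguments of print or when $0 is rebuilt after a field assignment. The sketch below recreates file b; the $1 = $1 idiom is an addition for illustration and does not appear in the text above.

```shell
tmp=$(mktemp -d)
printf '1:2:3\n4:5:6\n' > "$tmp/b"

# A bare print outputs $0 untouched, so the ":" separators survive.
plain=$(awk -F ':' 'BEGIN { OFS = ";" } { print }' "$tmp/b")

# Assigning to any field makes awk rebuild $0 using OFS.
rebuilt=$(awk -F ':' 'BEGIN { OFS = ";" } { $1 = $1; print }' "$tmp/b")

echo "$plain"
echo "$rebuilt"
rm -rf "$tmp"
```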

NF: number of fields in the current record
awk -F ':' '{print NF}' b
3
3
This indicates that each line of b is split into three fields by the delimiter ":"
NF can be used to output only the rows that have the required number of fields, so that malformed rows can be handled separately
awk -F ':' '{if (NF == 3) print}' b
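The same NF test can also be inverted to report the rows that do not fit. A small sketch, assuming an input with one deliberately malformed row:

```shell
input=$(printf '1:2:3\nbroken:row\n4:5:6\n')

# Keep only rows with exactly three fields (a pattern with no action prints the record).
good=$(printf '%s\n' "$input" | awk -F ':' 'NF == 3')

# Report every other row together with its line number.
bad=$(printf '%s\n' "$input" | awk -F ':' 'NF != 3 { print FNR ": " $0 }')

echo "$good"
echo "$bad"
```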

RS: input record separator, default is "\n"
By default, awk treats each line as a record; if RS is set, awk splits records on RS instead
For example, if cat c shows
Hello world;I want to go swimming tomorrow;hiahia
then the output of awk 'BEGIN{RS = ";"} {print}' c is
Hello world
I want to go swimming tomorrow
hiahia
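When RS is not the newline, the newline that ends the file becomes part of the last record. The sketch below assumes a semicolon-separated input similar to c, numbers each record, and strips that trailing newline with sub(); the numbering and the sub() call are additions for illustration.

```shell
tmp=$(mktemp -d)
printf 'Hello world;I want to go swimming tomorrow;hiahia\n' > "$tmp/c"

# Split records on ";" and drop the newline left at the end of the last record.
records=$(awk 'BEGIN { RS = ";" } { sub(/\n$/, ""); print NR ": " $0 }' "$tmp/c")
echo "$records"
rm -rf "$tmp"
```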
Combining RS and FS lets awk handle more document layouts, such as records that span several lines. Take a document d for which cat d shows
1 2
3 4 5

6 7
8 9 10
11 12

Hello!
This is also easy to handle in awk: treat each block separated by a blank line as a record and each line within it as a field
awk 'BEGIN{FS = "\n"; RS = ""} {print NF}' d outputs
2
3
1
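The blank-line-separated layout can be reproduced in a scratch file to confirm the field counts. This is a sketch with illustrative content; it prints the first line of each block next to the count.

```shell
tmp=$(mktemp -d)
printf '1 2\n3 4 5\n\n6 7\n8 9 10\n11 12\n\nhello\n' > "$tmp/d"

# RS = "" selects paragraph mode: blank lines separate records.
# FS = "\n" then makes every line of a block one field.
summary=$(awk 'BEGIN { FS = "\n"; RS = "" }
               { print NF " lines, first: " $1 }' "$tmp/d")
echo "$summary"
rm -rf "$tmp"
```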

ORS: output record separator, default is a newline character; it is the text written after each print statement
awk 'BEGIN{FS = "\n"; RS = ""; ORS = ";"} {print NF}' d
2;3;1;
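Because ORS follows every print, including the last one, the joined output ends with the separator and has no final newline. The sketch below shows that behaviour and one possible way around it, writing the separator before each record except the first; the second command is an illustrative alternative rather than part of the text above.

```shell
# ORS is appended after each record, so the result ends with ";".
with_ors=$(printf '1\n2\n3\n' | awk 'BEGIN { ORS = ";" } { print }')

# Emit ";" in front of every record but the first, then finish with a newline.
joined=$(printf '1\n2\n3\n' |
  awk '{ printf "%s%s", (NR > 1 ? ";" : ""), $0 } END { print "" }')

echo "$with_ors"
echo "$joined"
```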

(3) awk reads variables from the shell
The -v option can be used for this
        $ b=1
        $ cat f
        apple

$ awk -v var=$b '{print var, $var}' f
1 apple
As for whether variables in awk can be passed back to the shell, here is how I understand the question. When the shell calls awk, it actually forks a child process, and a child process cannot pass variables to its parent except through redirection (including pipes), for example command substitution. In the command below, the first $b sits inside the single quotes, so awk sees an unset variable b and $b means $0; the second $b is outside the quotes and is expanded by the shell to 1.
$ a=$(awk '{print $b, '$b'}' f)
$ echo $a
apple 1
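Both directions can be exercised in a few lines. This sketch recreates f in a scratch directory; the variable names into_awk and count are illustrative.

```shell
tmp=$(mktemp -d)
echo apple > "$tmp/f"
b=1

# Shell to awk: -v copies the shell value into the awk variable var,
# so $var is field number 1 of the current record.
into_awk=$(awk -v var="$b" '{ print var, $var }' "$tmp/f")

# awk to shell: command substitution captures what awk prints.
count=$(awk 'END { print NR }' "$tmp/f")

echo "$into_awk"
echo "$count"
rm -rf "$tmp"
```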

(4) output redirection

Output redirection in awk is similar to that of the shell. The target file name must be enclosed in double quotes.
$ awk '$4 >= 70 {print $1, $2 > "Destfile"}' filename
$ awk '$4 >= 70 {print $1, $2 >> "Destfile"}' filename
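One difference from the shell is easy to miss: within a single awk run, ">" truncates the target only on the first write, and later prints in the same run are appended. The sketch below uses a made-up score table and passes the target name in through -v, which is an alternative to a quoted literal.

```shell
tmp=$(mktemp -d)
# Hypothetical data: name, subject, term, score.
printf 'ann math t1 90\nbob math t1 60\ncat math t1 75\n' > "$tmp/scores"

# ">" empties the file once, then both matching rows land in it.
awk -v out="$tmp/destfile" '$4 >= 70 { print $1, $2 > out }' "$tmp/scores"
first_run=$(cat "$tmp/destfile")

# ">>" keeps the existing content and adds the same two rows again.
awk -v out="$tmp/destfile" '$4 >= 70 { print $1, $2 >> out }' "$tmp/scores"
line_count=$(awk 'END { print NR }' "$tmp/destfile")

echo "$first_run"
echo "$line_count"
rm -rf "$tmp"
```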

(5) Invoking shell commands in awk:

1) use pipes
The pipe concept in awk is similar to that of the shell and uses the "|" symbol. The shell command must be enclosed in double quotes. According to the source quoted here, a pipe opened in an awk program must be closed before another can be opened, so only one pipe is open at a time: "If you are going to use a file or pipe in an awk program again for reading or writing, you may want to close it first, because it stays open until the script finishes. Note that once a pipe is opened, it remains open until awk exits, so statements in the END block are also affected by the pipe. You can close the pipe on the first line of END." Current implementations such as gawk can keep several pipes open at once, but closing a pipe when you are done with it remains good practice.
There are two ways to use pipes in awk:
awk output | shell input
shell output | awk input

For awk output | shell input, the shell command receives awk's output and processes it. Note that awk's output is first buffered in the pipe, and the shell command only finishes its work once that output is complete. The shell command runs only once, and it produces its result "when the awk program ends, or when the pipe is closed (the pipe must be closed explicitly)".
$ awk '/west/{count++} {printf "%s %s\t\t%-15s\n", $3, $4, $1 | "sort -k 2"} END{close("sort -k 2"); printf "The number of sales persons in the western "; printf "region is " count "."}' datafile
(Explanation: /west/{count++} matches each record against "west"; on a match, count is incremented by 1. Older texts write the sort command as "sort +1", an obsolete spelling of "sort -k 2".)
The printf function formats the output and sends it to the pipe. Once all the output has been collected, it is passed to the sort command in one go. The pipe must be closed with exactly the same command string that opened it (sort -k 2); otherwise sort does not see the end of its input until awk exits, and the output of the END block can appear ahead of the sorted lines. The sort command here is executed only once.
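A reduced version of this pattern is easier to verify. In the sketch below the input rows are invented, the command is a plain sort, and close() in END makes sort finish and print before the summary line is written.

```shell
result=$(printf 'west bob\neast amy\nwest al\n' |
  awk '/west/ { count++ }
       { print $2 | "sort" }            # every name goes into the same sort process
       END {
           close("sort")                # same string as the one that opened the pipe
           print "west rows: " count
       }')
echo "$result"
```

The three names come out sorted, followed by the count of rows containing "west".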

For shell output | awk input, awk can only read the input through the getline function. The output of the shell command is buffered in the pipe and handed to awk; if there are several lines of data, getline has to be called several times, once per line.
$ awk 'BEGIN{while(("ls" | getline d) > 0) print d}' f
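The getline loop can be checked against a directory with known contents. This sketch lists a scratch directory holding two empty files; the directory, the file names and the variable names are illustrative.

```shell
tmp=$(mktemp -d)
touch "$tmp/one" "$tmp/two"

# Read the output of ls line by line; getline returns 0 at end of input.
count=$(awk -v dir="$tmp" 'BEGIN {
    cmd = "ls \"" dir "\""
    while ((cmd | getline name) > 0) n++
    close(cmd)                          # release the pipe once all lines are read
    print n
}')
echo "$count"
rm -rf "$tmp"
```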

