Linux awk Command
AWK is a programming language designed for processing text files, and a powerful tool for text analysis.
The name AWK comes from the surname initials of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Syntax
awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)
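Option Descriptions (the -W forms and long options are GNU awk extensions):
- -F fs or --field-separator fs: set the input field separator; fs may be a string or a regular expression, e.g. -F:
- -v var=value or --assign var=value: assign a variable before the program starts, so it is visible in BEGIN
- -f scriptfile or --file scriptfile: read the awk program from scriptfile
- -mf nnn and -mr nnn: set memory limits (maximum number of fields, maximum record size); a Bell Labs awk extension that gawk ignores
- -W compat or --compat, -W traditional or --traditional: run in compatibility mode, with gawk extensions disabled
- -W copyleft or --copyleft, -W copyright or --copyright: print the short copyright notice
- -W help or --help, -W usage or --usage: print a summary of the available options
- -W lint or --lint: warn about dubious constructs or constructs that are not portable to other awk implementations
- -W lint-old or --lint-old: warn about constructs that are not portable to the original Unix awk
- -W posix: enable compatibility mode with additional POSIX restrictions
- -W re-interval or --re-interval: allow interval expressions such as {n,m} in regular expressions
- -W source program-text or --source program-text: use program-text as awk source code; can be mixed with -f
- -W version or --version: print version information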
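The syntax lines show two ways of assigning a variable: -v var=value and a var=value operand placed after the script. They differ in timing: -v takes effect before BEGIN runs, while an operand is assigned only when awk reaches it in the argument list, which is after BEGIN. A minimal sketch of the difference, where the variable name tag and the temporary input file are made up for illustration:

```shell
# Work in a throwaway directory so nothing is overwritten.
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/in.txt"

# -v: the value is already set when BEGIN runs.
with_v=$(awk -v tag=X 'BEGIN { print "begin:" tag } { print tag, $0 }' "$tmp/in.txt")

# Operand form: BEGIN sees an empty value, the per-line rule sees X.
with_operand=$(awk 'BEGIN { print "begin:" tag } { print tag, $0 }' tag=X "$tmp/in.txt")

printf '%s\n---\n%s\n' "$with_v" "$with_operand"
rm -rf "$tmp"
```

The first command prints begin:X, the second prints begin: with nothing after the colon; the per-line output is the same in both.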
Basic Usage
The content of the log.txt file is as follows:
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Usage One:
awk '[pattern] {action}' filename(s) # The awk program should be enclosed in single quotes so the shell leaves $1, $2, ... alone
Example:
# Split each line on spaces or TABs, and print the 1st and 4th fields
$ awk '{print $1,$4}' log.txt
---------------------------------------------
2 a
3 like
This's
10 orange,apple,mongo
# Formatted output
$ awk '{printf "%-8s %-10s\n",$1,$4}' log.txt
---------------------------------------------
2        a
3        like
This's
10       orange,apple,mongo
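Usage One combines an optional pattern with an action. As a small sketch of that form, the following prints the first and last field only for lines with at least four fields. The file is recreated in a temporary directory so the snippet is self-contained; NF and $NF are standard awk.

```shell
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Pattern: NF >= 4 (at least four fields). Action: print first and last field.
first_last=$(awk 'NF >= 4 { print $1, $NF }' "$tmp/log.txt")
printf '%s\n' "$first_last"
rm -rf "$tmp"
```

The third line has only three fields, so the pattern rejects it and the action never runs for it.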
Usage Two:
awk -F fs # -F sets the built-in variable FS, the input field separator
Example:
# Split using ","
$ awk -F, '{print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# Or use the built-in variable
$ awk 'BEGIN{FS=","} {print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Do you like awk
This's a test
10 There are orange apple
# Use multiple delimiters: the bracket expression [ ,] treats every single space or comma as a separator
$ awk -F '[ ,]' '{print $1,$2,$5}' log.txt
---------------------------------------------
2 this test
3 Are awk
This's a
10 There apple
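Because -F accepts a regular expression, adding + to the bracket expression makes a run of delimiters count as one separator. A short sketch on a made-up input line (not taken from log.txt):

```shell
line='a, b,,c'   # hypothetical input with adjacent delimiters

# Every single space or comma separates fields, so empty fields appear.
single=$(printf '%s\n' "$line" | awk -F '[ ,]' '{ print NF }')

# One or more spaces/commas in a row count as a single separator.
runs=$(printf '%s\n' "$line" | awk -F '[ ,]+' '{ print NF ": " $1 "|" $2 "|" $3 }')

printf '%s\n%s\n' "$single" "$runs"
```

The first command reports 5 fields (two of them empty), the second reports 3: a|b|c.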
Usage Three:
awk -v var=value # Set an awk variable before the program runs
Example:
$ awk -va=1 '{print $1,$1+a}' log.txt
---------------------------------------------
2 3
3 4
This's 1
10 11
$ awk -va=1 -vb=s '{print $1,$1+a,$1b}' log.txt
---------------------------------------------
2 3 2s
3 4 3s
This's 1 This'ss
10 11 10s
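A common use of -v is handing a shell variable to awk. Quoting the expansion keeps a value with spaces in one piece. This is only a sketch; the names prefix and p are invented for the example:

```shell
prefix="row no."   # hypothetical shell variable; note the embedded space

# "$prefix" is expanded by the shell and assigned to the awk variable p.
labelled=$(printf 'x\ny\n' | awk -v p="$prefix" '{ print p, NR, $0 }')

printf '%s\n' "$labelled"
```

This avoids splicing shell text into the awk program itself, which tends to be fragile once quotes are involved.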
Usage Four:
awk -f {awk script} {filename}
Example:
$ awk -f cal.awk log.txt
Operators
Operator | Description |
---|---|
= += -= *= /= %= ^= **= | Assignment |
?: | C-style conditional expression |
\|\| | Logical OR |
&& | Logical AND |
~ and !~ | Matches / does not match a regular expression |
< <= > >= != == | Relational operators |
Space | Concatenation |
+ - | Addition, subtraction |
* / % | Multiplication, division, and remainder |
+ - ! | Unary plus, unary minus, and logical NOT |
^ ** | Exponentiation |
++ -- | Increment or decrement, as a prefix or postfix |
$ | Field reference |
in | Array membership |
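Several of these operators can be tried in a BEGIN block, which needs no input file. A minimal sketch; the variable names are arbitrary:

```shell
ops=$(awk 'BEGIN {
    x = 2 ^ 10                                      # exponentiation
    s = "a" "b" x                                   # concatenation by juxtaposition
    kind = (x % 2 == 0) ? "even" : "odd"            # remainder and conditional
    seen["awk"] = 1
    has = ("awk" in seen) ? "yes" : "no"            # array membership
    m = ("foobar" ~ /^foo/) ? "match" : "no match"  # regular expression match
    print x, s, kind, has, m
}')

printf '%s\n' "$ops"
```

This prints 1024 ab1024 even yes match.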
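Filter rows where the first column is greater than 2. The line beginning with This's is printed as well: its first field is not a number, so awk falls back to a string comparison, in which "This's" sorts after "2".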
$ awk '$1>2' log.txt # Command
# Output
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Filter rows where the first column is equal to 2
$ awk '$1==2 {print $1,$3}' log.txt # Command
# Output
2 is
Filter rows where the first column is greater than 2 and the second column is 'Are'
$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' log.txt # Command
# Output
3 Are you
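When a column mixes numbers and text, a relational test can silently become a string comparison. Adding 0 to the field forces a numeric comparison, with non-numeric text counting as 0. A sketch on a copy of log.txt, not the only way to handle this:

```shell
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Mixed comparison: the non-numeric field is compared as a string.
plain=$(awk '$1 > 2 { print $1 }' "$tmp/log.txt")

# Forced numeric comparison: text that is not a number evaluates to 0.
numeric=$(awk '$1 + 0 > 2 { print $1 }' "$tmp/log.txt")

printf '%s\n---\n%s\n' "$plain" "$numeric"
rm -rf "$tmp"
```

The first result contains 3, This's and 10; the second contains only 3 and 10.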
Built-in Variables
Variable | Description |
---|---|
$n | The nth field in the current record, separated by FS |
$0 | The complete input record |
ARGC | The number of command-line arguments |
ARGIND | Index in ARGV of the file currently being processed (gawk only) |
ARGV | An array containing the command-line arguments |
CONVFMT | Numeric conversion format (default value is %.6g) |
ERRNO | Description of the last system error (gawk only) |
FIELDWIDTHS | List of field widths, separated by spaces (gawk only) |
FILENAME | The current filename |
FNR | Record number within the current file |
FS | Field separator (default is any whitespace) |
IGNORECASE | If true, matching is case-insensitive (gawk only) |
NF | Number of fields in the current record |
NR | Number of records read so far, starting from 1 |
OFMT | Output format for numbers (default value is %.6g) |
OFS | Output field separator (default is a single space) |
ORS | Output record separator (default is a newline character) |
RLENGTH | Length of the string matched by the match function |
RS | Record separator (default is a newline character) |
RSTART | Starting position of the string matched by the match function |
SUBSEP | Array subscript separator (default value is \034) |
$ awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
log.txt 2 1 5 1
log.txt 2 2 5 2
log.txt 2 3 3 3
log.txt 2 4 4 4
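# The same report with ' as the field separator
$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt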
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
log.txt 2 1 ' 1 1
log.txt 2 2 ' 1 2
log.txt 2 3 ' 2 3
log.txt 2 4 ' 1 4
# Print the overall record number NR and the per-file record number FNR
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Are you
3 3 This's a test
4 4 10 There are
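With a single input file NR and FNR are always equal, so the difference only shows when awk reads several files. A small sketch with two made-up files, one.txt and two.txt:

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n'    > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)

printf '%s\n' "$counts"
rm -rf "$tmp"
```

For the line from two.txt, NR is 3 while FNR is back to 1.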
# Specify output separator
$ awk '{print $1,$2,$5}' OFS=" $ " log.txt
---------------------------------------------
2 $ this $ test
3 $ Are $ awk
This's $ a $
10 $ There $
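OFS is inserted between the arguments of print, as above, but it does not change $0 unless the record is rebuilt. Assigning any field, even to itself, triggers that rebuild. A minimal sketch:

```shell
# $0 is printed untouched: OFS is not applied.
unchanged=$(echo 'a b c' | awk -v OFS='-' '{ print $0 }')

# Assigning a field makes awk rebuild $0 using OFS.
rebuilt=$(echo 'a b c' | awk -v OFS='-' '{ $1 = $1; print $0 }')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first command prints a b c, the second prints a-b-c.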
Using Regular Expressions, String Matching
# For lines whose second column contains "th", print the second and fourth columns
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a
~ is the match operator; the regular expression is written between the slashes (//).
# Output lines containing "re"
$ awk '/re/ ' log.txt
---------------------------------------------
3 Are you like awk
10 There are orange,apple,mongo
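Patterns can also be anchored. As a sketch, the following keeps only lines that start with one or more digits followed by a space, and prints that leading number (log.txt is recreated so the snippet runs on its own):

```shell
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# ^ anchors the match at the start of the line; [0-9]+ is one or more digits.
numbered=$(awk '/^[0-9]+ / { print $1 }' "$tmp/log.txt")

printf '%s\n' "$numbered"
rm -rf "$tmp"
```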
Ignoring Case
# IGNORECASE is a GNU awk (gawk) extension; other implementations treat it as an ordinary variable
$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
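IGNORECASE only has this effect in GNU awk. A portable alternative, sketched here rather than prescribed, is to lower-case the record with the standard tolower function before matching:

```shell
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Match against a lower-cased copy of the line; the original line is printed.
any_case=$(awk 'tolower($0) ~ /this/' "$tmp/log.txt")

printf '%s\n' "$any_case"
rm -rf "$tmp"
```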
Negating Patterns
$ awk '$2 !~ /th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo
$ awk '!/th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo
awk Script
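When writing awk scripts, two special patterns deserve attention: BEGIN and END.
- BEGIN { statements executed once, before any input is read }
- END { statements executed once, after all lines have been processed }
- { statements executed for each input line }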
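The three kinds of block can be seen together in a tiny program. This sketch feeds three made-up numbers on standard input instead of using a file:

```shell
summary=$(printf '4\n8\n15\n' | awk '
BEGIN { print "start" }                   # runs once, before any input
      { sum += $1 }                       # runs for every input line
END   { print "lines=" NR, "sum=" sum }   # runs once, after the last line
')

printf '%s\n' "$summary"
```

It prints start first, then lines=3 sum=27 once all input has been consumed.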
Suppose we have a file (student score table):
$ cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
Our awk script is as follows:
$ cat cal.awk
#!/bin/awk -f
# Before running: initialise the totals and print the header
BEGIN {
    math = 0
    english = 0
    computer = 0
    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# During running: executed once for every input line
{
    math += $3
    english += $4
    computer += $5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3, $4, $5, $3 + $4 + $5
}
# After running: print the totals and averages
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math / NR, english / NR, computer / NR
}
Let's look at the execution result:
$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00
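The AVERAGE line divides by NR, which is 0 when the input file is empty. One defensive variant, shown here for the MATH column only and as a sketch rather than a change to cal.awk, checks NR before dividing:

```shell
tmp=$(mktemp -d)
cat > "$tmp/score.txt" <<'EOF'
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
EOF

# Average of column 3, with a guard against an empty input (NR == 0).
avg_prog='{ math += $3 }
END {
    if (NR > 0) {
        printf "%.2f\n", math / NR
    } else {
        print "no records"
    }
}'

with_data=$(awk "$avg_prog" "$tmp/score.txt")
empty=$(awk "$avg_prog" /dev/null)

printf '%s\n%s\n' "$with_data" "$empty"
rm -rf "$tmp"
```

With score.txt this prints 63.80; with empty input it prints no records instead of attempting a division by zero.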
Some other examples
The hello world program for AWK is:
BEGIN { print "Hello, world!" }
Calculate file sizes:
$ ls -l *.txt | awk '{sum += $5} END {print sum}'
--------------------------------------------------
666581
Find lines longer than 80 characters in a file:
awk 'length > 80' log.txt
Print the multiplication table:
seq 9 | sed 'H;g' | awk -v RS='' '{for(i = 1; i <= NF; i++) printf("%dx%d=%d%s", i, NR, i * NR, i == NR ? "\n" : "\t")}'
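The pipeline above relies on seq and sed to build the input. The same table can also be produced by awk alone with two nested loops in a BEGIN block. A sketch of that alternative, with arbitrary variable names:

```shell
table=$(awk 'BEGIN {
    for (row = 1; row <= 9; row++)
        for (col = 1; col <= row; col++)
            # End the line after the last product of a row, otherwise add a TAB.
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')

printf '%s\n' "$table"
```

Since everything happens in BEGIN, no input file or pipe is needed.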