
Linux awk Command

AWK is a language used for processing text files and is a powerful text analysis tool.

The name AWK comes from the initials of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Syntax

awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)

Option Descriptions:

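-F fs or --field-separator fs: Set the input field separator. fs can be a string or a regular expression, for example -F:.
-v var=value or --assign var=value: Assign a value to a variable before the program starts running.
-f scriptfile or --file scriptfile: Read the awk program from a script file.
-mf nnn and -mr nnn: Set memory limits to nnn. -mf limits the number of fields and -mr limits the record size. These are Bell Labs awk extensions and are ignored by GNU awk.
-W compat or --compat, -W traditional or --traditional: Run in compatibility mode, so that GNU awk behaves like traditional Unix awk and its extensions are not recognized.
-W copyleft or --copyleft, -W copyright or --copyright: Print a short copyright notice.
-W help or --help, -W usage or --usage: Print a short summary of the available options.
-W lint or --lint: Warn about constructs that are dubious or not portable to other awk implementations.
-W lint-old or --lint-old: Warn about constructs that are not portable to the original Unix awk.
-W posix: Turn on strict POSIX mode, which disables the GNU extensions.
-W re-interval or --re-interval: Allow interval expressions such as {n,m} in regular expressions.
-W source program-text or --source program-text: Use program-text as awk source code, which allows it to be combined with code loaded using -f.
-W version or --version: Print version information.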

Basic Usage

The content of the log.txt file is as follows:

2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo

Usage One:

awk '[pattern] {action}' {filenames}   # The awk program must be enclosed in single quotes

Example:

# Split each line by spaces or TABs, and output the 1st and 4th items from the text
$ awk '{print $1,$4}' log.txt
---------------------------------------------
2 a
3 like
This's
10 orange,apple,mongo
# Formatted output
$ awk '{printf "%-8s %-10s\n",$1,$4}' log.txt
---------------------------------------------
2        a
3        like
This's
10       orange,apple,mongo
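The pattern part of a rule is optional, and the two commands above leave it out so the action runs on every line. As a small sketch of a rule that uses both parts, the block below recreates the sample file and prints the first and last field of lines with at least four fields. The condition NF >= 4 is chosen only for illustration.

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Pattern: NF >= 4 (at least four fields). Action: print the first and last field.
awk 'NF >= 4 {print $1, $NF}' log.txt
# 2 test
# 3 awk
# 10 orange,apple,mongo
```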

Usage Two:

awk -F fs   # -F sets the built-in variable FS, which specifies the field separator

Example:

# Split using ","
$ awk -F, '{print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# Or use the built-in variable
$ awk 'BEGIN{FS=","} {print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# Use multiple delimiters: split on either a space or a ","
$ awk -F '[ ,]' '{print $1,$2,$5}' log.txt
---------------------------------------------
2 this test
3 Are awk
This's a
10 There apple
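One detail worth knowing about a bracket expression such as '[ ,]' is that every single matching character separates two fields, whereas the default separator collapses runs of blanks. The following sketch uses a made-up input line with two consecutive spaces to show the difference.

```shell
# Default FS: a run of blanks counts as one separator.
printf 'a  b\n' | awk '{print NF}'           # prints 2

# FS as a bracket expression: each space separates, so an empty field appears.
printf 'a  b\n' | awk -F'[ ]' '{print NF}'   # prints 3
```

If empty fields are not wanted, a separator such as '[ ,]+' treats each run of spaces and commas as a single delimiter.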

Usage Three:

awk -v  # Set variable

Example:

$ awk -va=1 '{print $1,$1+a}' log.txt
---------------------------------------------
2 3
3 4
This's 1
10 11
$ awk -va=1 -vb=s '{print $1,$1+a,$1b}' log.txt
---------------------------------------------
2 3 2s
3 4 3s
This's 1 This'ss
10 11 10s
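A common use of -v is passing a shell variable into the awk program without embedding it in the quoted script. The sketch below assumes a shell variable named threshold and an awk variable named min. Both names are made up for this example.

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

threshold=2
# $1 + 0 forces a numeric value, so non-numeric first fields count as 0.
awk -v min="$threshold" '$1 + 0 > min {print $1}' log.txt
# 3
# 10
```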

Usage Four:

awk -f {awk script} {filename}

Example:

$ awk -f cal.awk log.txt
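As a minimal sketch of the -f form, the block below writes a tiny script file and runs it on two lines of input. The file name fields.awk and the input text are hypothetical and serve only as an illustration.

```shell
# Write a one-rule awk script to a file (hypothetical name: fields.awk).
cat > fields.awk <<'EOF'
# Print each line number together with its field count.
{ print NR ": " NF " fields" }
EOF

# With no file argument, awk reads standard input.
printf 'a b c\nd e\n' | awk -f fields.awk
# 1: 3 fields
# 2: 2 fields
```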

Operators

Operator	Description
= += -= *= /= %= ^= **=	Assignment
?:	C conditional expression
||	Logical OR
&&	Logical AND
~ and !~	Match regular expression and do not match regular expression
< <= > >= != ==	Relational operators
Space	Concatenation
+ -	Addition, subtraction
* / %	Multiplication, division, and remainder
+ - !	Unary addition, subtraction, and logical NOT
^ **	Exponentiation
++ --	Increment or decrement, as a prefix or postfix
$	Field reference
in	Array member
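Several of the operators in the table can be tried in a single BEGIN block, which needs no input file. This is a small sketch, and the variable names n, s, size, and seen are made up for the example.

```shell
awk 'BEGIN {
    n = 2 ^ 3                          # exponentiation: 8
    n += 2                             # compound assignment: 10
    s = "a" "wk"                       # concatenation by juxtaposition: "awk"
    size = (n > 5) ? "big" : "small"   # conditional expression
    seen["x"] = 1                      # create one array element
    # Regex match and array membership both evaluate to 1 (true) or 0 (false).
    print n, s, size, (s ~ /^aw/), ("x" in seen), ("y" in seen)
}'
# 10 awk big 1 1 0
```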

Filter rows where the first column is greater than 2

$ awk '$1>2' log.txt    # Command
# Output
3 Are you like awk
This's a test
10 There are orange,apple,mongo
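In the output above, the line "This's a test" is selected even though its first column is not a number. When a field does not look numeric, awk compares it as a string, and "This's" sorts after "2". A minimal sketch of forcing a numeric comparison is to add 0 to the field, which turns non-numeric text into 0:

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# $1 + 0 is always a number, so the comparison is numeric for every line.
awk '$1 + 0 > 2' log.txt
# 3 Are you like awk
# 10 There are orange,apple,mongo
```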

Filter rows where the first column is equal to 2

$ awk '$1==2 {print $1,$3}' log.txt    # Command
# Output
2 is

Filter rows where the first column is greater than 2 and the second column is 'Are'

$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' log.txt    # Command
# Output
3 Are you

Built-in Variables

Variable	Description
$n	The nth field in the current record, separated by FS
$0	The complete input record
ARGC	The number of command-line arguments
ARGIND	Index in ARGV of the file currently being processed (GNU awk only)
ARGV	An array containing the command-line arguments
CONVFMT	Numeric conversion format (default value is %.6g)
ERRNO	Description of the last system error
FIELDWIDTHS	List of field widths (separated by spaces)
FILENAME	The current filename
FNR	Record number within the current file (restarts at 1 for each input file)
FS	Field separator (default is any whitespace)
IGNORECASE	If true, matching is case-insensitive
NF	Number of fields in a record
NR	Number of records read, starting from 1
OFMT	Output format for numbers (default value is %.6g)
OFS	Output field separator (default is a single space)
ORS	Output record separator (default is a newline character)
RLENGTH	Length of the string matched by the match function
RS	Record separator (default is a newline character)
RSTART	Starting position of the string matched by the match function
SUBSEP	Array subscript separator (default value is \034)

$ awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}'  log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1         5    1
log.txt    2    2         5    2
log.txt    2    3         3    3
log.txt    2    4         4    4

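# Specify the input field separator (here a single quote)
$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}'  log.txt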
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1    '    1    1
log.txt    2    2    '    1    2
log.txt    2    3    '    2    3
log.txt    2    4    '    1    4
# Print the overall record number NR and the per-file record number FNR
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Are you
3 3 This's a test
4 4 10 There are
# Specify output separator
$ awk '{print $1,$2,$5}' OFS=" $ " log.txt
---------------------------------------------
2 $ this $ test
3 $ Are $ awk
This's $ a $
10 $ There $
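With a single input file, NR and FNR are always equal, as in the NR/FNR output above. The difference only shows up with more than one file. The sketch below uses two small files with hypothetical names, first.txt and second.txt.

```shell
# Two small input files (hypothetical names and contents).
printf 'a\nb\n' > first.txt
printf 'c\nd\ne\n' > second.txt

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{print FILENAME, NR, FNR, $0}' first.txt second.txt
# first.txt 1 1 a
# first.txt 2 2 b
# second.txt 3 1 c
# second.txt 4 2 d
# second.txt 5 3 e
```

A common idiom built on this is the condition NR == FNR, which is true only while the first file is being read.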

Using Regular Expressions, String Matching

# For lines whose second column contains "th", print the second and fourth columns
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a

~ is the match operator: the expression on its left is tested against the pattern on its right. The pattern itself is written between the two slashes //.

# Output lines containing "re"
$ awk '/re/ ' log.txt
---------------------------------------------
3 Are you like awk
10 There are orange,apple,mongo
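Patterns can also be anchored with ^ (start) and $ (end), either against the whole line or against a single field. The following sketch recreates the sample file and shows one example of each. The two patterns are chosen only for illustration.

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Lines that start with one or more digits followed by a space.
awk '/^[0-9]+ / {print $1}' log.txt
# 2
# 3
# 10

# Line numbers of lines whose last field is exactly "test".
awk '$NF ~ /^test$/ {print NR}' log.txt
# 1
# 3
```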

Ignoring Case

$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
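IGNORECASE is a GNU awk extension. Other implementations treat it as an ordinary variable, so the command above may match only the lowercase line there. A portable sketch of the same filter lowercases the line with the standard tolower() function before matching:

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Compare a lowercased copy of the line; the printed line keeps its original case.
awk 'tolower($0) ~ /this/' log.txt
# 2 this is a test
# This's a test
```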

Negating Patterns

$ awk '$2 !~ /th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo
$ awk '!/th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo

awk Script

When writing awk scripts, pay attention to two special patterns: BEGIN and END.

BEGIN { statements to run before any input is read }
END { statements to run after all lines have been processed }
{ statements to run for each input line }
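The three kinds of blocks can be combined in a one-liner. As a minimal sketch, the command below counts the lines of the sample file that contain "test". The variable name count is made up for the example.

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# BEGIN runs once before input, the /test/ rule runs per matching line, END runs once at the end.
awk 'BEGIN {count = 0} /test/ {count++} END {print count, "of", NR, "lines contain test"}' log.txt
# 2 of 4 lines contain test
```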

Suppose we have a file (student score table):

$ cat score.txt
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62

Our awk script is as follows:

$ cat cal.awk
#!/bin/awk -f
# Before running
BEGIN {
    math = 0
    english = 0
    computer = 0

    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# During running
{
    math += $3
    english += $4
    computer += $5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3, $4, $5, $3 + $4 + $5
}
# After running
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math / NR, english / NR, computer / NR
}

Let's look at the execution result:

$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00
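The same per-line and END structure works for other summaries of the score table. As a sketch, the command below finds the student with the highest total. The variable names total, max, and best are made up for the example.

```shell
# Recreate the score table shown above.
cat > score.txt <<'EOF'
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62
EOF

# Track the largest total seen so far; an unset max compares as 0.
awk '{ total = $3 + $4 + $5; if (total > max) { max = total; best = $1 } }
     END { print best, max }' score.txt
# Mike 279
```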

Other Examples

The hello world program for AWK is:

BEGIN { print "Hello, world!" }

Calculate the total size of a set of files:

$ ls -l *.txt | awk '{sum += $5} END {print sum}'
--------------------------------------------------
666581

Find lines longer than 80 characters in a file:

$ awk 'length > 80' log.txt
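None of the sample lines is longer than 80 characters, so the command above prints nothing for log.txt. The sketch below lowers the limit to 16, a value chosen only so the example produces output, and reports the line number and length instead of the line itself.

```shell
# Recreate the sample file used throughout this page.
cat > log.txt <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# length($0) is the number of characters in the current line.
awk 'length($0) > 16 {print NR ": " length($0) " characters"}' log.txt
# 2: 18 characters
# 4: 31 characters
```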

Print the multiplication table:

$ seq 9 | sed 'H;g' | awk -v RS='' '{for(i = 1; i <= NF; i++) printf("%dx%d=%d%s", i, NR, i * NR, i == NR ? "\n" : "\t")}'
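The pipeline above uses seq and sed to build blank-line-separated groups of numbers, and RS='' makes awk read each group as one record. The same table can also be produced by awk alone with two nested loops in a BEGIN block. This is an alternative sketch rather than a replacement, and the variable names row and col are made up for the example.

```shell
awk 'BEGIN {
    for (row = 1; row <= 9; row++)
        for (col = 1; col <= row; col++)
            # Tab between entries, newline after the last entry of each row.
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}'
```

The first line of output is 1x1=1 and the ninth line ends with 9x9=81.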