AWK Working Principle
This article primarily introduces how AWK works.
The AWK workflow can be divided into three parts:
Code segment executed before reading the input file (identified by the BEGIN keyword).
Main loop code segment for processing the input file.
Code segment executed after reading the input file (identified by the END keyword).
Command structure:
awk 'BEGIN{ commands } pattern{ commands } END{ commands }'
The following flowchart describes the AWK workflow:
- Execute the BEGIN block content, which is the content within the curly braces {} following the BEGIN keyword.
- After completing the execution of the BEGIN block, start executing the body block.
- Read records separated by the newline character \n.
- Split the record into fields based on the specified field separator and fill the fields: $0 represents all fields (i.e., the entire line), $1 represents the first field, and $n represents the nth field.
- Execute each BODY block in order. A block's awk-commands run only if its pattern matches the current line.
- Repeat the read, split, and execute steps for each line until the end of the file is reached, which completes the body block execution.
- Start executing the END block, which can output the final results.
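The steps above can be traced with a small sketch. The two-line input and the labels printed here are illustrative assumptions, not part of the original example; the output is also stored in a shell variable named result purely for convenience.

```shell
# Trace the order in which AWK runs its blocks (the sample input is made up).
result=$(printf 'a b\nc d\n' | awk '
BEGIN { print "begin: runs once, before any input is read" }
      { print "body: record " NR " is [" $0 "], first field is " $1 }
END   { print "end: runs once, after " NR " records" }')
echo "$result"
```

The BEGIN line is printed first, then one body line per input record, and the END line last. NR is the built-in count of records read so far.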
BEGIN Block
The syntax for the BEGIN block is as follows:
BEGIN {awk-commands}
The BEGIN block is the code segment executed at the program's startup and is executed only once throughout the process.
Typically, we can initialize some variables in the BEGIN block.
BEGIN is an AWK keyword and must be written in uppercase.
Note: The BEGIN block is optional; your program may not have a BEGIN block.
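As a minimal sketch of variable initialization, the command below sets the built-in field separator FS and a counter in the BEGIN block. The colon-separated input and the variable name total are assumptions made for illustration.

```shell
# BEGIN sets the field separator and a counter before the first record is read.
# The colon-separated input below is a made-up sample.
result=$(printf 'alice:10\nbob:20\n' | awk '
BEGIN { FS = ":"; total = 0 }   # runs once, before any input
      { total += $2 }           # runs for every record
END   { print "total=" total }')
echo "$result"
```

Because FS is assigned before any record is read, the first line is already split on the colon.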
BODY Block
The syntax for the BODY block is as follows:
/pattern/ {awk-commands}
The commands in the BODY block are executed once for each input line.
By default, AWK executes commands for every input line. However, we can restrict this to specific patterns.
Note: Unlike the BEGIN and END blocks, the BODY block has no keyword.
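To illustrate restricting a BODY block to a pattern, the following sketch prints only the lines that match /error/. The log lines are invented for this example.

```shell
# Only lines matching /error/ trigger the action; the log lines are made up.
result=$(printf 'info: started\nerror: disk full\ninfo: done\n' |
  awk '/error/ { print NR ": " $0 }')
echo "$result"
```

The action runs for the second line only. Removing the /error/ pattern would make it run for all three lines.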
END Block
The syntax for the END block is as follows:
END {awk-commands}
The END block is the code executed at the end of the program, after all input has been read. END is also an AWK keyword and must be written in uppercase. Like the BEGIN block, the END block is optional.
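A common use of the END block is to report a result accumulated while reading the input. The sketch below counts lines; the three-line input and the variable name count are assumptions for illustration.

```shell
# Count input lines; nothing is printed until all input has been consumed.
result=$(printf 'one\ntwo\nthree\n' |
  awk '{ count++ } END { print count " lines read" }')
echo "$result"
```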
Example
First, create a file named marks.txt. It includes serial numbers, student names, course names, and scores.
1) 张三 语文 80
2) 李四 数学 90
3) 王五 英语 87
Next, we will use an AWK script to display the content of the file and output the header information.
$ awk 'BEGIN{printf "序号\t名字\t课程\t分数\n"} {print}' marks.txt
Running the above command produces the following output:
序号 名字 课程 分数
1) 张三 语文 80
2) 李四 数学 90
3) 王五 英语 87
When the program starts, AWK runs the BEGIN block and prints the header. In the BODY block, AWK then reads each line and writes it to standard output until the entire file has been read.
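The same example can be extended with an END block. The sketch below is one possible variation, not part of the original command: it recreates marks.txt in a temporary directory (an assumption, so the snippet is self-contained), accumulates the fourth field in a variable named sum, and prints the average score at the end.

```shell
# Recreate marks.txt in a temporary directory (location chosen for this sketch).
dir=$(mktemp -d)
cat > "$dir/marks.txt" <<'EOF'
1) 张三 语文 80
2) 李四 数学 90
3) 王五 英语 87
EOF

# BEGIN prints the header, the body echoes each record and accumulates the
# fourth field, and END reports the average score.
result=$(awk '
BEGIN { printf "序号\t名字\t课程\t分数\n" }
      { print; sum += $4 }
END   { if (NR > 0) printf "Average: %.2f\n", sum / NR }' "$dir/marks.txt")
echo "$result"
rm -r "$dir"
```

The NR > 0 check guards against division by zero when the input file is empty.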