Linux Basics – awk
It’s not a noise, it’s a command!
The awk command is one of those strange commands, much like the extinct great auk, the bird whose name sounds the same. You either love it or you hate it. Either way, it is an impressive command, and what many do not realize is that it is also a complete mini programming language designed for processing text.
The strange name comes from the first letters of the authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan. The awk command is included by default in most modern Linux distributions and is a powerful tool for extracting text fields from sources such as log files. Used correctly, its built-in functions and loops can also save many unneeded iterations when processing text.
The awk program flow:
Read
awk reads a line from the input stream (file, pipe, or stdin) and stores it in memory.
Execute
awk commands are applied sequentially on the input. By default, awk executes commands on every line. We can restrict this by providing patterns.
Repeat
This process repeats until awk reaches the end of the input.
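The read, execute, repeat cycle above can be seen in a minimal sketch. It assumes a throwaway three-line input fed through a pipe (the text is made up for illustration). NR is awk's built-in record counter and $0 holds the whole current line.

```shell
# Feed three lines to awk on stdin; the action block runs once per line read.
output=$(printf 'first\nsecond\nthird\n' | awk '{ print NR ": " $0 }')
echo "$output"
```

Each input line comes back prefixed with its line number, which shows that the action was executed once for every line awk read.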
The basic syntax of awk is:
awk '/search_pattern/ { action_to_take_on_match; another_action; }' file_to_parse
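As a rough illustration of this syntax, the sketch below runs two actions on every matching line. The log lines and the temporary file are hypothetical, invented only for this example. The END block is a standard awk pattern that runs once after the last line has been processed.

```shell
# Hypothetical log data, written to a temporary file for illustration only.
log_file=$(mktemp)
printf 'INFO start\nERROR disk full\nINFO running\nERROR timeout\n' > "$log_file"

# For each line matching /ERROR/: increment a counter, then print the
# line number and the line itself. END prints the total once at the end.
report=$(awk '/ERROR/ { count++; print NR ": " $0 } END { print "matches: " count }' "$log_file")
echo "$report"

rm -f "$log_file"
```

Lines that do not match the pattern are read but skipped, so only the two ERROR lines and the final count are printed.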
Let us work with our sample.txt file again, which contains the following lines:
~$ cat sample.txt
The quick
brown
fox
jumps over
the lazy
dog.
In its simplest form, awk behaves much like grep, with a slightly different syntax. By default, awk also treats each run of spaces or tabs as a column separator.
Thus:
~$ awk '/the/' sample.txt
the lazy
# THUS
# COL 1 | COL 2
# the   | lazy
# PRINT COLUMN 2
~$ awk '/the/ {print $2}' sample.txt
lazy
# PRINT ALL LINES WITH A NON-EMPTY (AND NON-ZERO) 2ND COLUMN
~$ awk '$2' sample.txt
The quick
jumps over
the lazy
# NOW WE ADD A FULL LINE "The quick brown fox jumps over the lazy dog." TO THE END OF sample.txt
~$ awk '$2' sample.txt
The quick
jumps over
the lazy
The quick brown fox jumps over the lazy dog.
# MATCH ONLY LINES WHERE THE 2ND COLUMN STARTS WITH THE LETTER q
~$ awk '$2 ~ /^q/' sample.txt
The quick
The quick brown fox jumps over the lazy dog.
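Column splitting is worth a closer look. The sketch below assumes two made-up input lines: one with uneven whitespace and one colon-separated line in a passwd-like layout. NF is awk's built-in count of fields on the current line, and the -F option sets a different separator.

```shell
# Runs of spaces and tabs count as a single separator with the default settings.
fields=$(printf 'the    lazy\tdog\n' | awk '{ print NF ": " $2 }')
echo "$fields"

# -F sets a different separator; here a colon, on a hypothetical passwd-style line.
shell_name=$(printf 'alice:x:1000:/bin/bash\n' | awk -F: '{ print $1 " uses " $4 }')
echo "$shell_name"
```

The first command reports three fields even though the gaps between the words differ, and the second picks the first and fourth colon-separated fields.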
Now we will use awk to print only the 2nd column of each line. This results in blank lines where lines have only a single column:
~$ awk '{print $2}' sample.txt
quick


over
lazy

quick
To remove the blank lines, we can filter them out with grep or sed, but let us use one of the previous awk examples for this:
~$ awk '$2' sample.txt | awk '{print $2}'
quick
over
lazy
quick
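The same result can probably be had from a single awk invocation by combining a pattern with an action. The sketch below assumes the sample file has the line layout implied by the outputs above and recreates it in a temporary file. It uses NF >= 2 rather than a bare $2 as the pattern, because a bare $2 would also skip a line whose second column is 0.

```shell
# Recreate the sample file (assumed layout) in a temporary location.
sample=$(mktemp)
printf 'The quick\nbrown\nfox\njumps over\nthe lazy\ndog.\nThe quick brown fox jumps over the lazy dog.\n' > "$sample"

# NF >= 2 selects lines with at least two fields; the action prints the second one.
second_cols=$(awk 'NF >= 2 { print $2 }' "$sample")
echo "$second_cols"

rm -f "$sample"
```

This avoids the second process and the pipe, which matters little here but adds up when processing large files.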
As you can see, there is much you can do with the awk command. The full scope of this command will not be discussed here, as there are many very good online guides to help you learn this text processing language. It is a valuable tool in any Linux administrator's or engineer's toolbox, and it makes life much easier when you need to process large volumes of text automatically and extract the data you need in exactly the right place or format.
Happy Hosting!