AWK Linux Tutorial



Awk is a programming language designed for manipulating data held in a structured text format. Its name comes from the surnames of its authors: Aho, Weinberger, and Kernighan. Because Awk processes data sequentially, one record at a time, it is well suited to basic pattern searching across sets of records. It can also work across several files and report which records match the required pattern and which do not.
Awk works best on text files rather than binary files, because binary data does not split cleanly into lines, which leads to records and fields being miscounted.
Awk either reads its data from a file or takes it from standard input, for example data piped in on the command line. The Awk program itself can be typed directly on the command line or stored in a file created with an editor such as vi or Vim.
Always remember that Awk commands work on records and fields. Its coding style closely resembles that of the C language.
Now we will discuss how the awk command works inside the Linux shell:
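As a rough illustration of the two ways of supplying the data and the program, the sketch below runs the same one-line program twice: once typed inline with the data piped in, and once loaded from a program file with the -f option and pointed at a data file. The file names names.txt and second.awk are made up for this demo.

```shell
# Work in a throwaway directory; every file name here is hypothetical.
workdir=$(mktemp -d)
printf 'alpha one\nbeta two\n' > "$workdir/names.txt"

# 1) Program typed inline, data supplied on standard input through a pipe.
inline_out=$(printf 'alpha one\nbeta two\n' | awk '{ print $2 }')

# 2) Same program stored in a file and loaded with -f, data read from a file.
printf '{ print $2 }\n' > "$workdir/second.awk"
file_out=$(awk -f "$workdir/second.awk" "$workdir/names.txt")

echo "$inline_out"
echo "$file_out"
rm -r "$workdir"
```

Both runs should print the second word of each line, since only the way the program and data reach Awk differs.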
  • Awk reads the first line. Since it works sequentially, it always reads one line (record) at a time.
  • Then it checks the line against the required pattern. If the pattern matches, the associated set of instructions is executed and Awk moves on to the next line. If no match is found, a different set of commands may or may not be executed; it is entirely up to the user whether to supply an action for either case. For a given pattern, only one of the two runs on any one line, much like the if-else syntax of C.
  • Statements inside an action clause must be separated by a delimiter: a semicolon or a newline.
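The steps above can be sketched with a small made-up input. The fruit data and the variable names are assumptions for illustration only; NR is Awk's built-in count of the records read so far.

```shell
# Hypothetical three-line input; awk handles one line (record) at a time.
result=$(printf 'apple 3\nbanana 7\napple 5\n' | awk '
/apple/  { n = NR; print "line " n " matches" }
!/apple/ { n = NR; print "line " n " does not match" }')

echo "$result"
```

The first rule runs on lines that contain "apple" and the second, negated with !, runs on all the others, so each line triggers exactly one of the two actions. The two statements inside each action are separated by a semicolon.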


Syntax of Awk:

awk 'pattern1 { action performed }
pattern2 { action performed }' <file on which the actions are to be performed>

Each record is split into fields, which are referred to as $n ($1, $2, and so on). By default, fields are separated by whitespace.

For example, if we have the sentence
Thomas Alva Edison was a great innovator of his time.
Here $1 = Thomas, $2 = Alva, $3 = Edison, $4 = was, $5 = a, $6 = great, $7 = innovator, $8 = of, $9 = his, $10 = time. (the trailing full stop is part of the last field)

Phew, it took ages to type all those dollars.
One important thing here is that Awk also has $0, which represents the whole record, i.e. $1 to $10 combined.
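A quick way to check the field numbering is to feed the sentence to awk directly. This is a minimal sketch; the shell variable names are arbitrary, and NF is Awk's built-in count of fields in the current record.

```shell
sentence='Thomas Alva Edison was a great innovator of his time.'

first=$(echo "$sentence" | awk '{ print $1 }')   # first field
tenth=$(echo "$sentence" | awk '{ print $10 }')  # last field, keeps the full stop
count=$(echo "$sentence" | awk '{ print NF }')   # number of fields in the record
whole=$(echo "$sentence" | awk '{ print $0 }')   # the entire record

echo "$first | $tenth | $count"
echo "$whole"
```

Because fields are split on whitespace only, punctuation stays attached to the word it follows.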


Example 1: Awk simple search

1 Robert CEO Stark
2 Pepper Secretary Stark
3 Shashi CEO thessuman
4 Ambika CEO D&Dmotors
5 Aditi Secretary D&Dmotors

The above 5 records were typed in the vi editor and saved as “test”.
Now we will run the following commands:

awk '/CEO/' test
awk '/Stark/' test
awk '/D&Dmotors/' test

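The image below shows the output of all the above commands. The first prints records 1, 3, and 4 (the lines containing CEO), the second prints records 1 and 2 (Stark), and the third prints records 4 and 5 (D&Dmotors).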


The command awk '{print;}' <filename> will print all the records.
The command awk '{print $n, ...;}' <filename> will print only the listed fields ($n and so on) of each record.
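The two print forms can be tried on the sample records from Example 1. The sketch below recreates the file in a temporary directory so that it is self-contained; the choice of fields ($2 and $3) and the shell variable names are just for illustration.

```shell
# Recreate the sample file from Example 1 in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/test" <<'EOF'
1 Robert CEO Stark
2 Pepper Secretary Stark
3 Shashi CEO thessuman
4 Ambika CEO D&Dmotors
5 Aditi Secretary D&Dmotors
EOF

# No pattern: the action runs for every record, so the whole file is printed.
all_records=$(awk '{ print }' "$workdir/test")

# Print only the name and the role of every record.
names_roles=$(awk '{ print $2, $3 }' "$workdir/test")

# Pattern and action together: names of the records that contain CEO.
ceo_names=$(awk '/CEO/ { print $2 }' "$workdir/test")

echo "$names_roles"
echo "$ceo_names"
```

The comma between $2 and $3 makes print separate the two fields with a single space in the output.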

Awk has two special clauses that run at the start and at the end of execution: the BEGIN clause and the END clause. The BEGIN clause is executed only once, before any record is read, whereas the END clause is executed only once, after all records have been processed.

Syntax for BEGIN and END in Awk

BEGIN { actions }
{ commands to be executed for each line }
END { actions }
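To see the order in which the three parts run, here is a minimal sketch that adds up a column of made-up numbers. The variable name sum and the input values are assumptions; NR is Awk's built-in record counter, which still holds the total number of records inside END.

```shell
report=$(printf '10\n20\n30\n' | awk '
BEGIN { sum = 0; print "start" }             # runs once, before any input
      { sum += $1 }                          # runs for every record
END   { print "lines:", NR, "sum:", sum }')  # runs once, after the last record

echo "$report"
```

This should print "start" first and then "lines: 3 sum: 60".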

Awk in Linux also supports conditions and statement execution. Let us write a simple program to count the number of CEOs and the number of employees in "D&Dmotors".

A simple example for BEGIN and END:
awk 'BEGIN {ce=0;d=0}
$3 ~ /CEO/ {ce++;}
$4 ~ /D&Dmotors/ {d++;}
END { print "No. of CEOs are ",ce,"\t","No. of employees in D&D are ",d;}' test
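The same count can also be written with explicit if-else statements inside a single action, which is closer to the C style mentioned earlier. This is only an alternative sketch, not the form used above; the variable names ceo, other, and dd are made up, and the sample file is recreated so the block runs on its own.

```shell
# Recreate the sample file from Example 1 in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/test" <<'EOF'
1 Robert CEO Stark
2 Pepper Secretary Stark
3 Shashi CEO thessuman
4 Ambika CEO D&Dmotors
5 Aditi Secretary D&Dmotors
EOF

report=$(awk '
{
    # Exactly one branch runs per record, as with if-else in C.
    if ($3 == "CEO") { ceo++ } else { other++ }
    # An independent test on the company field.
    if ($4 == "D&Dmotors") { dd++ }
}
END { printf "CEOs: %d, others: %d, D&D employees: %d\n", ceo, other, dd }' "$workdir/test")

echo "$report"
rm -r "$workdir"
```

Here == compares a field with an exact string, whereas the ~ operator used above tests the field against a regular expression. Variables that are never initialised start out as zero, so the BEGIN clause is optional in this version.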



