------------------------------------------------------------------------
Summary of awk -- SRK
SRK Start: 3/Oct/2015
V1:
------------------------------------------------------------------------
Convention:
$          .. bash prompt symbol
%          .. comments by SRK
#          .. comment line used by awk
[a|b|c]    .. at most one of the options (or none)
address    .. line number
pattern    .. regular expression or relational expression

A. Regular expressions are enclosed in slashes, "/regexp/", or written
   as "!/regexp/" to select the lines that do NOT match
   /^$/    .. identify all empty lines
   /^ *$/  .. identify all blank lines (empty, or containing only spaces)
   !/^$/   .. identify all non-empty lines

B. A basic relational expression is
   string ~ /regexp/      .. string matches regexp
   string !~ /regexp/     .. string does not match regexp
   value operator expression  ($1>10, NR>=1, etc.;
                               operator is one of < <= == != >= >)

C. Relational expressions can be combined
   R1 operator R2   where operator is && (and) or || (or)
------------------------------------------------------------------------
Records
------------------------------------------------------------------------
$0       .. the entire record
$1, $2,  .. field 1, field 2 and so on
------------------------------------------------------------------------
Built-in Variables (these do not require a preceding "$")
------------------------------------------------------------------------
NR          .. number of records read so far (1 for the first line,
               incremented for every line)
NF          .. number of fields in the current record
FS          .. input field separator (default: one or more blanks,
               including tabs)
RS          .. input record separator (default: \n)
OFS         .. output field separator (default: " ")
ORS         .. output record separator (default: \n)
OFMT        .. output format for numbers printed with print (default: %.6g)
CONVFMT     .. format for converting a number to a string, e.g. in
               concatenation (default: %.6g)
FIELDWIDTHS .. (gawk extension) space-separated list of fixed field widths
IGNORECASE  .. (gawk extension) if nonzero, regexp matching and string
               comparisons ignore case
SUBSEP      .. separator joining the subscripts of a multi-dimensional
               array element (default: "\034")
------------------------------------------------------------------------
Other rules
------------------------------------------------------------------------
Uninitialized variables have the value 0 when used as numbers and ""
(the empty string) when used as strings.
++n, n++  .. pre- and post-increment
n=n*2  ->  n*=2
------------------------------------------------------------------------
Structure of an awk program
------------------------------------------------------------------------
%executing awk with a script file (a.awk)
%there is no official file extension for script files
$awk -f a.awk [inputfile1 inputfile2 ....]

a.awk
BEGIN{
  #BEGIN section
  statements
}
#main section
[[Regexp|Relational],[Regexp|Relational]]{statements}
%pattern1,pattern2 selects the range of lines from a line matching
%pattern1 through the next line matching pattern2
END{
  #END section
  statements
}

Rule 1:  statements in the BEGIN block are executed before any input is
         read; control then passes to the main block, at which point the
         input file is opened.
Rule 2a: statements in the MAIN block are applied to each line of each
         input file that is allowed (filtered) by the
         [address|regexp|relational].
Rule 2b: an input line comprises all characters up to, but not
         including, \n (more precisely, up to the record separator RS).
Rule 3:  if [address|regexp|relational] is missing, then the statements
         apply to every line of the input file(s).
Rule 4:  if the statements are missing in the MAIN block, then the
         default statement is to print the line.
Rule 5:  after the last line has been read, the statements in the END
         block are executed.
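The pattern forms in A to C and Rules 1 to 5 can be seen working together in one small program. This is a sketch only: the four input lines are made-up sample data (line 3 is deliberately empty), and the shell function names `patterns_demo` and `fs_demo` are hypothetical wrappers used so the output is easy to check. Within each record the rules fire in the order they are written.

```shell
# Sketch: BEGIN, pattern-filtered main rules, default print, and END.
# Input is hypothetical sample data; the third line is empty.
patterns_demo() {
  printf '%s\n' 'alpha 10' 'beta 25' '' 'gamma 5' |
  awk '
    BEGIN           { print "start" }               # runs before any input is read
    /^$/            { print NR ": empty" }          # regexp pattern
    !/^$/ && $2 > 8 { print NR ": " $1 " > 8" }     # negated regexp && relational
    $1 ~ /^g/       { print NR ": starts with g" }  # string ~ /regexp/
    NR == 1                                         # no statements: print the line
    END             { print "records=" NR }         # NR = number of records read
  '
}

# FS and OFS: split on ":" and rebuild the record with "-" between fields.
# Assigning $1=$1 forces awk to rebuild $0 using OFS.
fs_demo() {
  echo 'a:b:c' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print; print NF }'
}

patterns_demo
fs_demo
```

`patterns_demo` prints `start`, then `1: alpha > 8` followed by the unchanged line `alpha 10` (Rule 4), then `2: beta > 8`, `3: empty`, `4: starts with g`, and finally `records=4`. `fs_demo` prints `a-b-c` and then `3`.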
------------------------------------------------------------------------
EXAMPLE I: Printing
------------------------------------------------------------------------
BEGIN{
  print "hello world"
}

$awk -f a.awk
hello world
------------------------------------------------------------------------
EXAMPLE II: String concatenation
------------------------------------------------------------------------
BEGIN{
  x=0;
  print "hello world" " " x     #note: explicit blank character
                                #strings are concatenated by default
  print ("hello world" ":" x)   #parentheses make the concatenation explicit
                                #when not sure, use ( ) to concatenate
}

$awk -f a.awk
hello world 0
hello world:0
------------------------------------------------------------------------
EXAMPLE III: Arithmetic Operations
------------------------------------------------------------------------
#In addition to the usual unary and binary operators,
#in decreasing precedence (^; unary + - !; binary * / %; binary + -),
#awk has C-style assignment operators (+= -= *= /= %=, plus ^=)
#and pre- and post-increment operators (++ --).
BEGIN{
  x=0; y=2; z=3; w=5;
  x+=1; y^=2; z-=1;
  a=y*z; b=w%2;
  print x " " y " " z " " a " " b
}

$awk -f a.awk
1 4 2 8 1
------------------------------------------------------------------------
EXAMPLE IVa: Counting number of lines
------------------------------------------------------------------------
$cat Kitty.txt
1 Hello,
2 My
3 name
4 is
5 Kitty

$ cat a.awk
BEGIN{
  n=0;
}
{++n}
END{
  print "lines= " n
}

$awk -f a.awk Kitty.txt
lines= 5
------------------------------------------------------------------------
EXAMPLE IVb: Counting number of lines
------------------------------------------------------------------------
$cat a.awk
{n++}     #in awk the initial value of all variables is zero
END{
  print "lines= " n
}

$awk -f a.awk Kitty.txt
lines= 5

This can be simplified to
$ awk '{n++}END{print "lines= " n}' Kitty.txt
lines= 5
------------------------------------------------------------------------
EXAMPLE V: Counting number of lines & characters (excluding newline char)
------------------------------------------------------------------------
a.awk
{n=n+1; c=c+length($0)}   # $0 refers to the entire input line
END{                      # length(str) returns the length of a string
  print "lines= " n " " "char(without \\n)= " c
                          #note the use of "\" to escape another "\"
}

$awk -f a.awk Kitty.txt
lines= 5 char(without \n)= 29
------------------------------------------------------------------------
EXAMPLE VI: Print first few lines
------------------------------------------------------------------------
a.awk
BEGIN{
  n=3     #number of lines
}
NR<=n     #no statements, so the default is to print the line

$awk -f a.awk Kitty.txt
1 Hello,
2 My
3 name
------------------------------------------------------------------------
EXAMPLE VII: Ignore all blank lines
------------------------------------------------------------------------
a.awk
!/^$/     #drops empty lines; use !/^ *$/ to also drop lines
          #that contain only spaces
------------------------------------------------------------------------
EXAMPLE VIII: Associative arrays
------------------------------------------------------------------------
Associative arrays are of the form a["key"]="value".

In an ordinary array the key is numerical, either 1,2,.. or 0,1,2..,
so it is easy to compute the address of the memory cell, given the
index. In contrast, the index of an awk associative array is always a
string (even if it looks like a number!); the value may be a number or
a string. The order in which "for (key in a)" visits the elements is
unspecified and depends on the awk implementation.

The basic operations on an associative array are: adding a new element,
deleting an existing element, modifying an existing element, and
looking up the value for a given key (in awk, (key in a) tests whether
a key exists without creating it).

Other names for associative arrays: map, symbol table, dictionary.
Implementing these operations efficiently is apparently a classic
problem in computer science; you may come across terms such as
"hash table" and "search tree".

Associative arrays are very useful in cataloging records: finding which
are repeated and which are unique.

$ cat rabbits
parvi
kuro
lakshmi
sarsi
malli
kuro
sarsi
kuro

$ awk '{a[$1]++} \
  END{for (key in a){print key,a[key]}}' rabbits
parvi 1
malli 1
sarsi 2
lakshmi 1
kuro 3
------------------------------------------------------------------------
Example: File sizes from ls -l
------------------------------------------------------------------------
#print the number of bytes allocated to each file
$ ls -l
-rw-r--r--  1 srk  staff    745 Nov 12 13:30 1980AJ.....85.1413K
-rw-r--r--  1 srk  staff   1945 Nov 12 13:21 1981ApJ...247L..73H
-rw-r--r--  1 srk  staff    531 Nov 12 13:31 1982A%26A...112..120P
-rw-r--r--  1 srk  staff   5585 Nov 12 13:13 1982ApJ...259L..63K
-rw-r--r--  1 srk  staff  22145 Nov 12 13:09 1982Natur.300..615B
-rw-r--r--  1 srk  staff   1105 Nov 12 13:27 1983ApJ...273L..75H
-rw-r--r--  1 srk  staff   3105 Nov 12 13:17 1983ApJS...53..591D
-rw-r--r--  1 srk  staff   2025 Nov 12 13:21 1983Natur.301..314B
-rw-r--r--  1 srk  staff   1505 Nov 12 13:23 1983Sci...220.1233B

$ ls -l | awk '{print $5}'     #number of bytes in file (field 5)

#Compute the sum of all file sizes (+= accumulates; =+ would only
#assign +$5, leaving the size of the last file)
$ ls -l | awk '{sum+=$5} END{print sum}'

#Determine the size of the largest file
$ ls -l | awk '{if ($5>max) max=$5}
               END{print max}'                  # pedestrian
$ ls -l | awk '$5>max{max=$5} END{print max}'   # elegant
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
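The associative-array operations and the corrected ls -l one-liners can be checked without a real directory. This is a sketch only: the rabbit names are taken from Example VIII, the three ls -l lines are copied from the listing above, and the function names `count_names`, `delete_demo` and `size_stats` are hypothetical wrappers. Because the order of "for (key in a)" is unspecified, the counts are piped through sort to get a stable result.

```shell
# Count how often each name occurs; sort because "for (key in a)" has no fixed order.
count_names() {
  printf '%s\n' parvi kuro lakshmi sarsi malli kuro sarsi kuro |
  awk '{ a[$1]++ } END { for (key in a) print key, a[key] }' |
  sort
}

# Deletion and the membership test: "key in a" does not create the element.
delete_demo() {
  printf '%s\n' parvi kuro lakshmi sarsi malli kuro sarsi kuro |
  awk '{ a[$1]++ }
       END {
         delete a["kuro"]
         if (!("kuro" in a)) print "kuro removed"
         n = 0
         for (key in a) n++
         print n " keys left"
       }'
}

# Sum (+=, not =+) and maximum of field 5 computed in a single pass.
size_stats() {
  printf '%s\n' \
    '-rw-r--r-- 1 srk staff 745 Nov 12 13:30 1980AJ.....85.1413K' \
    '-rw-r--r-- 1 srk staff 1945 Nov 12 13:21 1981ApJ...247L..73H' \
    '-rw-r--r-- 1 srk staff 531 Nov 12 13:31 1982A%26A...112..120P' |
  awk '{ sum += $5 } $5 > max { max = $5 } END { print sum, max }'
}

count_names
delete_demo
size_stats
```

`count_names` prints `kuro 3`, `lakshmi 1`, `malli 1`, `parvi 1`, `sarsi 2`; `delete_demo` prints `kuro removed` and `4 keys left`; `size_stats` prints `3221 1945` (745 + 1945 + 531, and the largest of the three).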