HPR2330: Awk Part 7

Hacker Public Radio show

Summary: In this episode, I will (very) briefly go over loops in the Awk programming language.

Loops are useful when you want to run the same command(s) on a collection of data, or when you just want to repeat the same commands many times. When using loops, a command or group of commands is repeated until one or more conditions are met.

While Loop

Here is a silly example of a while loop:

    #!/bin/awk -f
    BEGIN {
        # Print the squares from 1 to 10 the first way
        i = 1;
        while (i <= 10) {
            print "The square of", i, "is", i*i;
            i = i + 1;
        }
        exit;
    }

Our condition is set in the parentheses after the while statement. We set a variable, i, before entering the loop, then increment i inside the loop. If nothing inside the loop ever makes the condition false, the while loop will run forever.

Do While Loop

Here is an equally silly example of a do while loop:

    #!/bin/awk -f
    BEGIN {
        i = 2;
        do {
            print "The square of", i, "is", i*i;
            i = i + 1
        } while (i <= 10)    # the condition is tested after each pass
        exit;
    }

Here, the commands in the do code block are executed once before the condition is first tested, and only then does the looping begin. The body of a do while loop therefore always runs at least once.

For Loop

Another silly example, this time of a for loop:

    #!/bin/awk -f
    BEGIN {
        for (i = 1; i <= 10; i++) {
            print "The square of", i, "is", i*i;
        }
        exit;
    }

As you can see, we set the variable, set the condition and set the increment method all in the parentheses after the for statement.

For Loop Over Arrays

Here is a more useful example of a for loop. Here, we use the distinct values of column 2 as keys of an array (hash table) called a. After processing the file, we print those distinct values.

For file.txt:

    name        color   amount
    apple       red     4
    banana      yellow  6
    strawberry  red     3
    grape       purple  10
    apple       green   8
    plum        purple  2
    kiwi        brown   4
    potato      brown   9
    pineapple   yellow  5

Using the awk file of:

    # Skip the header line; count each distinct value of column 2
    NR != 1 {
        a[$2]++
    }
    # Print every key that was stored in the array
    END {
        for (b in a) {
            print b
        }
    }

We get the results of:

    brown
    purple
    red
    yellow
    green

The order in which for (b in a) visits the keys is not specified, so the values may come out in a different order on your system.

In another example, we do a similar process.
This time, not only do we store all the distinct values of the second column, we also sum column 3 for each distinct value of column 2.

For file.csv:

    name,color,amount
    apple,red,4
    banana,yellow,6
    strawberry,red,3
    grape,purple,10
    apple,green,8
    plum,purple,2
    kiwi,brown,4
    potato,brown,9
    pineapple,yellow,5

Using the awk file of:

    # Use commas for input and output fields, and print the header first
    BEGIN {
        FS = ",";
        OFS = ",";
        print "color,sum";
    }
    # Skip the header line; add column 3 to the running total for column 2
    NR != 1 {
        a[$2] += $3;
    }
    # Print each color with its total
    END {
        for (b in a) {
            print b, a[b]
        }
    }

We get the results of:

    color,sum
    brown,13
    purple,12
    red,7
    yellow,11
    green,8

As you can see, we are also printing a header row prior to processing the file, using the BEGIN code block.
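
Awk does not specify the order in which a for (b in a) loop visits the keys, so the order of the rows in the results above can differ between awk implementations. The following is only a minimal sketch of one way to get a repeatable order, assuming a POSIX shell with the sort command available. The function name color_sums is a hypothetical helper made up for this illustration and is not part of the original notes. It runs the same summing logic as above, but prints the header from the shell and pipes the data rows through sort.

```shell
# Recreate the sample file.csv from the notes above.
cat > file.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# color_sums (hypothetical helper): sum column 3 per distinct value of
# column 2, with the data rows sorted alphabetically by color.
color_sums() {
    # The header is printed by the shell so it is not sorted with the data.
    echo "color,sum"
    awk 'BEGIN { FS = ","; OFS = "," }
         NR != 1 { a[$2] += $3 }
         END { for (b in a) { print b, a[b] } }' "$1" | LC_ALL=C sort
}

color_sums file.csv
```

Sorting outside of awk keeps the awk program identical to the one in the notes. The trade-off is that the header has to be printed separately, otherwise it would be sorted in with the data rows.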