Re: [Skunkworks] Some scripting help needed

28 Feb 2010

      Hi Tony,

This is a perfect job for awk - as you kind of already noticed.

The following awk script will do it:
------------------cut-------------------
#!/bin/awk -f

BEGIN {
        FS="|";
        print "UserId\t\tOccurences (YES)";
        print "======\t\t===============";
}

{
        if ($5 == "YES") {
                hits[$2]++;
        }
}

END {
        for (i in hits) {
                print i "\t" hits[i];
        }
}
------------------cut-------------------

Save the script above as "glue.awk" and run it with the following command:
awk -f glue.awk raw_files

Example:
$ awk -f glue.awk raw_files
UserId        Occurences (YES)
======        ===============
123456780    1
123456783    3

Regards
Michael Pedersen
PLUSPEOPLE

Tony Likhanga wrote:
...
Hi folks,
I have tonnes of CSV files whose sample content is as shown below:
Choice|*123456780*|20|1|*YES*|Y|2010-02-26 10:42:51.239772|
Choice|*123456781*|10|1|NO|Y|2010-02-26 12:42:51.243603|
Choice|*123456782*|20|1|NA|Y|2010-02-26 14:42:51.247464|
Choice|*123456783*|10|1|*YES*|Y|2010-02-26 14:42:51.251320|
Choice|*123456780*|20|1|OK|Y|2010-02-26 15:42:51.255169|
My aim is to extract the contents of the 2nd field alongside the count 
of similar choice result in 5th field.
Desired output:
UserId        Occurences (*YES*)
======     ===============
123456780   465
123456781    25
I managed to achieve this on a linux box running mysql using the steps 
below (which can of course be wrapped into a script):
1. grep  YES /raw_files/ | awk 'BEGIN{FS=OFS="|"}{print $2}' > 
/tmp/in_file
2. Log in to mysql and perform the steps:
3. mysql> create table nums(num varchar(16));
4. mysql> load data infile '/tmp/in_file' into table nums(num);
5. mysql> select num,count(*) thecnt from nums group by num;
I'm still exploring how to achieve the same using some (simple?) 
script, possibly not involving the database bit. Any takers?
NB: The only bit we know before-hand is desired choice result (YES/NO/NA).
Thanks.
Tony.