
Hi folks, I have tonnes of CSV files whose sample content is as shown below: Choice|*123456780*|20|1|*YES*|Y|2010-02-26 10:42:51.239772| Choice|*123456781*|10|1|NO|Y|2010-02-26 12:42:51.243603| Choice|*123456782*|20|1|NA|Y|2010-02-26 14:42:51.247464| Choice|*123456783*|10|1|*YES*|Y|2010-02-26 14:42:51.251320| Choice|*123456780*|20|1|OK|Y|2010-02-26 15:42:51.255169| My aim is to extract the contents of the 2nd field alongside the count of similar choice result in 5th field. Desired output: UserId Occurences (*YES*) ====== =============== 123456780 465 123456781 25 I managed to achieve this on a linux box running mysql using the steps below (which can of course be wrapped into a script): 1. grep YES *raw_files* | awk 'BEGIN{FS=OFS="|"}{print $2}' > /tmp/in_file 2. Log in to mysql and perform the steps: 3. mysql> create table nums(num varchar(16)); 4. mysql> load data infile '/tmp/in_file' into table nums(num); 5. mysql> select num,count(*) thecnt from nums group by num; I'm still exploring how to achieve the same using some (simple?) script, possibly not involving the database bit. Any takers? NB: The only bit we know before-hand is desired choice result (YES/NO/NA). Thanks. Tony.