That's one way

I'd do it this way

1. install mysql server and configure innodb_buffer_pool_size to be 90% of system memory (i might use someone's server for this) :)
2. create a table with four fields (id - autoinc, unclean - varchar(256), trndate (datetime), seq nos (innodb table to make use of 1 above)
3. index id primary in one index, trndate in another
4. import into the database, throw everything into the unclean field
5. wait :)
6. run  query to split for example update mytable set trndate = left(unclean,8), seqno = substr(unclean,9,255) - i've noticed the format is yyyymmdd - makes it easier, you'd have to confirm my statement since i just wrote it from memory
7. wait  again
8. then do selects and export to text files - the index on the trndate should make it super fast

don't forget to put the indexes otherwise it will take forever to do the queries

just a suggestion... and in the end you'll have a nice neat table in case you want to do more perations on it. You can even back it up and zip it for storage!


On 13 June 2013 12:54, Bwana Lawi <mail2lawi@gmail.com> wrote:
Hi guys,

I have a file with over 30 million lines (Txn IDs) over a period of 11 years.

I want to delete from that file (it's a flat file, not a db table) lines that dont begin with certain characters.

Or alternatively, create another file for lines that begin with certain characters.

The format of the ID is DATE:<SOME.SEQ.NOs>(Of course, there are multiple txns per day)

So for instance it will have 20080101XXXXXXXX - For transactions done on 1st Jan 2008, 20130613XXXXXX for today's txn, etc

My requirement is to get a list of transactions done say from Jun 2010 to Jun 2011. Therefore, I should somehow either:
a) Create a new file by piping output of grep ^2010 and grep ^2011 (Challenge is how to get only for 2010[06,07,08.09,10,11 and 12] and not entire 2010. Same for 2011)

b) Delete the lines that dont match a) above. I dont want to go this route.

Tools available:
Ubuntu 12.04
HP-UX 11.31

PS: I don't know if the subject matches my requirement but ...






_______________________________________________
skunkworks mailing list
skunkworks@lists.my.co.ke
------------
List info, subscribe/unsubscribe
http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks
------------

Skunkworks Rules
http://my.co.ke/phpbb/viewtopic.php?f=24&t=94
------------
Other services @ http://my.co.ke