
In the build towards the wiki-type thing, I've got two files:

- File A contains the original data, which is updated roughly every 10 minutes. At any moment it holds about an hour of monitored statistics, e.g. 8am to 9am. The next automatic update arrives at 9:10am and contains the data from 8:10am to 9:10am.
- File B contains the data from File A, transformed to capture the entire data over a 24-hour cycle.

*Here's the question:*

Files A and B overlap. When File A is updated to cover 8:10am-9:10am, File B only needs the new data between 9am and 9:10am, because it already has the data between 8am and 9am. The search criterion is, of course, the date & time field.

*How can I turn that search criterion into a string comparison for File B when the data in File A is not static and changes every 10 minutes?*

The data in File A looks like this in real time:

....contains the data from 0810am
<raw_data>29/02/12-0900</raw_data>
<raw_data>29/02/12-0901</raw_data>
<raw_data>29/02/12-0902</raw_data>
<raw_data>29/02/12-0903</raw_data>
<raw_data>29/02/12-0904</raw_data>
<raw_data>29/02/12-0905</raw_data>
<raw_data>29/02/12-0906</raw_data>
<raw_data>29/02/12-0907</raw_data>
<raw_data>29/02/12-0908</raw_data>
<raw_data>29/02/12-0909</raw_data>
<raw_data>29/02/12-0910</raw_data>

And File B looks like this:

.....contains the data from 0800am-0900am but needs only the data between 0900-0910am.

Hope I've written the question properly.
Any recommendations welcome. :-)

Personally, I would not use a text file for this at all. Use an InProc database like SQLite or, if you're writing from multiple threads (from a web app, say), something thread-safe like SQL Compact.

Easier to manage, easier to query, easier to report off.
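A minimal sketch of the in-process database idea, using Python's built-in sqlite3 module. The table name `readings`, the column names, and the helper functions are made up for illustration, and the sketch assumes each record is identified by its date & time field in the `29/02/12-0900` form from the original post. With the timestamp as primary key, re-loading an overlapping File A window adds only the rows not already stored.

```python
import sqlite3
from datetime import datetime

STAMP_FORMAT = "%d/%m/%y-%H%M"  # matches samples such as 29/02/12-0900


def open_store(path=":memory:"):
    """Open (or create) the in-process store; no server or open port is involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " taken_at TEXT PRIMARY KEY,"  # ISO-8601 text sorts chronologically
        " payload  TEXT NOT NULL)"
    )
    return conn


def load_window(conn, stamps):
    """Insert one File A window; return how many rows were actually new."""
    rows = [(datetime.strptime(s, STAMP_FORMAT).isoformat(), s) for s in stamps]
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        # Rows whose timestamp is already stored are silently skipped.
        conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", rows)
    return conn.total_changes - before
```

Calling `load_window` every 10 minutes with the full contents of File A would then leave the table holding each timestamp exactly once, without any explicit comparison against the previous window.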

@Aki,

I also would not recommend flat files for storing the data in your system. Seriously consider using relational or even object-oriented DBs (NoSQL engines), which are probably ideal for your case. If you settle on an RDBMS, you can use triggers [ http://db.apache.org/derby/docs/10.1/ref/rrefsqlj43125.html ] to copy old records to historical tables for future reference while INSERTing new ones. This saves you a lot of time and clever code trying to determine whether things have changed in your data sets, etc.

If you have to use plain text files for storage, then you can use revision control systems like git or svn / mercurial / bitkeeper / monotone / perforce etc. On top of that, you might employ systems like SWIG [ http://www.swig.org/ ] that let you hook onto their C / C++ APIs and do things like diffs for data comparison from your preferred language, which is C#. This may or may not be too neat.

Martin.
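The trigger idea can be sketched as follows. This uses SQLite through Python's sqlite3 module as a stand-in, not Derby (which the link above documents), and the table names `latest_window` and `history`, the trigger name, and the helper functions are all hypothetical. Here `latest_window` plays the role of File A and `history` the role of File B.

```python
import sqlite3

SCHEMA = """
CREATE TABLE latest_window (taken_at TEXT PRIMARY KEY, payload TEXT NOT NULL);
CREATE TABLE history       (taken_at TEXT PRIMARY KEY, payload TEXT NOT NULL);

-- Copy each newly inserted row into history unless that timestamp is already there.
CREATE TRIGGER copy_to_history AFTER INSERT ON latest_window
BEGIN
    INSERT INTO history (taken_at, payload)
    SELECT NEW.taken_at, NEW.payload
    WHERE NOT EXISTS (SELECT 1 FROM history WHERE taken_at = NEW.taken_at);
END;
"""


def make_db():
    """Create an in-memory database with both tables and the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def replace_window(conn, rows):
    """Swap in the newest window; the trigger keeps history complete."""
    with conn:
        conn.execute("DELETE FROM latest_window")
        conn.executemany("INSERT INTO latest_window VALUES (?, ?)", rows)
```

The application code only ever replaces the current window; deciding what is new is left to the database.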

@Rad, @Martin. Thanks for the suggestions and good reading too. Will take a look. :-)

I'm avoiding DBs primarily because of security concerns: leaving ports open to run queries, making sure all the updates/patches are always applied, and having to convert data to e.g. strings and back to avoid SQL injection. I can also see the lite databases running into size limits, since the data inputs could reach one update every second, so within a year or so the limit will have been reached.

Besides what you have already suggested, any other suggestion that works purely at file level would be much appreciated.

Cheers. :-)

You are welcome @Aki,

What you must appreciate is that records stored in a DBMS are simply placed in files too, with a slightly different file organization and indexing structure, managed by MS SQL Server / MySQL / Oracle / PostgreSQL rather than by the operating system's file system component. So there is really not much sense in avoiding databases.

The other thing is that you could look at revision control systems such as git (preferred) and / or svn (alternate) for managing flat files, with a wiki-type system like Trac [ http://en.wikipedia.org/wiki/Trac ] or Redmine [ http://en.wikipedia.org/wiki/Redmine ] on top, since it is easier to hook them onto svn / git / mercurial, etc. The trick is getting the end users who monitor and report complaints to check their changes out / in without actually pulling and pushing changes to the server using the repo clients. (Really???)

The last thing I would like to mention is that you can avoid SQL injection attacks COMPLETELY by using either prepared statements (like in Java) or stored procedures to query data from your DBs. Most languages offer simple data structures like dictionaries / associative arrays to handle records coming from a database.

As I said before, avoiding DBMSs, even the lightweight ones, for such a system is like going down a slippery slope :D. The major problem you will have comes when your site starts getting high traffic from different sources and you are forced to run heavy queries against the records it holds.

Martin.
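To illustrate the prepared-statement point, here is a small sketch using Python's sqlite3 module rather than Java. The `readings` table, its columns, and the `find_between` helper are invented for the example. The `?` placeholders keep user-supplied values out of the SQL text, so a hostile string is compared as plain data and never executed.

```python
import sqlite3


def find_between(conn, start, end):
    """Return payloads with start <= taken_at <= end, using bound parameters."""
    cur = conn.execute(
        "SELECT payload FROM readings"
        " WHERE taken_at BETWEEN ? AND ? ORDER BY taken_at",
        (start, end),  # values are bound, never spliced into the SQL string
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (taken_at TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2012-02-29T09:00", "a"), ("2012-02-29T09:05", "b")],
)

# A normal query and a hostile one; the second simply matches no rows.
normal = find_between(conn, "2012-02-29T09:00", "2012-02-29T09:10")
hostile = find_between(conn, "x' OR '1'='1", "x' OR '1'='1")
```

No conversion to strings and back is needed on the application side; the driver handles the binding.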

@Aki

Couple of things. I was very careful to point out that the database is an InProc database, which essentially means it will be a DLL you include in your app. Therefore issues of ports, updates, etc. do not arise.

As for converting data: for many years I have insisted, and will continue to insist, that data be stored in its correct type. This costs nothing and has innumerable benefits.

As for limits, I cannot speak for SQLite, but for SQL Compact the maximum size is 4 GB and it is thread-safe, so lots of frequent writes will not be a problem. Parsing a 4 GB text file, however, even with a fast XML parser, will take you a LOONG time.
_______________________________________________ Skunkworks mailing list Skunkworks@lists.my.co.ke ------------ List info, subscribe/unsubscribe http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks ------------
Skunkworks Rules http://my.co.ke/phpbb/viewtopic.php?f=24&t=94 ------------ Other services @ http://my.co.ke

Gentlemen, thank you for all your replies. I'm now sure there is an answer to my query and will research further based on your suggestions. :-) Rgds.

@Aki,

What about using the text file to accomplish the task anyway....

On the premise that text files are easy to use, I would simply take a snapshot of file A for the specific period, then append its changed contents to the 24hr history file B.

One way to implement this is to make sure the automatic updater working on file A marks the last 10 minutes with special characters, so that code can easily pick out the last 10 inputs.

Or it should put a time stamp on every event log entry, so that a text search can look for an exact time stamp, like

2/29/2012 12:37 PM : <data>
2/29/2012 12:36 PM : <data>
2/29/2012 12:35 PM : <data>

then select all data whose time falls within a 10-minute interval.....

My thoughts....
-- Regards, Greg -------- Life is not a rehearsal, you only live once!
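A rough sketch of the pure-file approach with time stamps, in Python. It assumes every record looks like the `<raw_data>29/02/12-0900</raw_data>` samples in the original post (a real record would presumably carry more than the time stamp), and the function names are made up for illustration. Instead of marking the last 10 minutes, it finds the newest time stamp already in File B and appends only the File A records that are later than that.

```python
import re
from datetime import datetime

RECORD = re.compile(r"<raw_data>(\d{2}/\d{2}/\d{2}-\d{4})</raw_data>")
STAMP_FORMAT = "%d/%m/%y-%H%M"  # e.g. 29/02/12-0900


def stamps_in(text):
    """Return (datetime, raw string) pairs for every <raw_data> record in text."""
    return [(datetime.strptime(s, STAMP_FORMAT), s) for s in RECORD.findall(text)]


def new_records(file_a_text, file_b_text):
    """Records in File A that are strictly newer than the newest one in File B."""
    known = stamps_in(file_b_text)
    newest = max(t for t, _ in known) if known else datetime.min
    return [s for t, s in stamps_in(file_a_text) if t > newest]


def append_new(file_a_path, file_b_path):
    """Append File A's unseen records to File B; return how many were appended."""
    with open(file_a_path, encoding="utf-8") as fa:
        a_text = fa.read()
    try:
        with open(file_b_path, encoding="utf-8") as fb:
            b_text = fb.read()
    except FileNotFoundError:
        b_text = ""  # first run: File B does not exist yet
    fresh = new_records(a_text, b_text)
    with open(file_b_path, "a", encoding="utf-8") as fb:
        for s in fresh:
            fb.write(f"<raw_data>{s}</raw_data>\n")
    return len(fresh)
```

Note that the time stamps are parsed into dates before comparing: a plain string comparison of `29/02/12-0900` style values would not sort correctly across days or months. Re-reading all of File B on every run is the simplest option; remembering the last appended time stamp somewhere would avoid that.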

@Aki

I would make document A an XML document, use SimpleXML (PHP) to manipulate it, and output the manipulated file at the specified intervals. The output file can be XML or plain text, since you mentioned no databases...

My two cents.
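The same idea can be sketched with Python's standard xml.etree.ElementTree module instead of PHP's SimpleXML. The `<log>` root element is an assumption (the sample data in the original post shows no root element), and `select_interval` is a hypothetical helper that copies the records falling inside a given time interval into a new document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

STAMP_FORMAT = "%d/%m/%y-%H%M"  # e.g. 29/02/12-0900


def select_interval(xml_text, start, end):
    """Return a new <log> element with the <raw_data> records in [start, end]."""
    source = ET.fromstring(xml_text)
    result = ET.Element("log")
    for node in source.iter("raw_data"):
        stamp = node.text.strip()
        if start <= datetime.strptime(stamp, STAMP_FORMAT) <= end:
            ET.SubElement(result, "raw_data").text = stamp
    return result
```

The returned element can be written out with `ET.tostring(result, encoding="unicode")`, or its records can be dumped line by line if a plain text file is preferred.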
participants (5)
- aki
- Clement Ongera
- Gregory Okoth
- Martin Chiteri
- Rad!