
hey ppl,

I have a Twitter sub-project I'm working on and I'm wondering if anyone is interested in collaborating. Here's a brief description:

1) collect data from Twitter's public stream
2) extract basic meta-data from the JSON dump from 1)
3) categorize tweets according to language (not necessarily the same as location)
4) categorize tweets according to content (this is the hardest part, and the most interesting to me)

I need help with 1) and 2).

I have a Ruby script (got it online) that is collecting data from a bunch of people's tweets. The data I'm collecting right now isn't much, just over 1MB/minute. The script is based on the excellent Ruby Twitter library (tweetstream), but I'd like the script to be made much more robust.

The output of 1) is a JSON dump, and parsing it is the other thing I need help with (this can be in any language that has good JSON parsers). The idea is to extract date/time info, the sender of the tweet, whether it's a retweet, what links are in the tweet, which other Twitter usernames are in the tweet, what hashtags are referenced, etc. Just very basic meta-data.

Presently I have some code that can do 3). But the code gets confused sometimes, since each tweet is really small and sometimes includes multiple languages. For instance, I'd like to be able to identify a Kenyan (regardless of where they're physically located) solely by their tweets. It's a lot harder than it appears, but I think it's doable.

Ultimately I'd like to be able to automatically cluster tweets according to content, but that is a long-term project that I may not be able to achieve. It's worth a try, I think.

This will all be open-source, so there is no direct monetary compensation. I'm doing it to learn concepts that I think will be extremely useful.

I'd love to answer any other questions you have about this.

Interested? Holla.

saidi

PS Don't sell yourself short and assume you can't do it. You're probably more skilled than you allow yourself to believe, or at the very least you can quickly learn where that is necessary.

PPS As you may have guessed, this isn't really about Twitter. Twitter just happens to be a great source of huge amounts of freely available data. It also happens to cover a wide spectrum of people/topics/places, which makes the info contained within extremely valuable.
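For anyone wondering what the meta-data extraction step could look like, here is a minimal sketch in Python (the post says any language with a good JSON parser is fine). It is not the project's actual code. It assumes the dump holds one JSON object per line, and that each tweet carries `created_at`, `text`, `user.screen_name` and, for retweets, a `retweeted_status` key; those field names and the date format are assumptions about the stream payload, and `extract_metadata` is a hypothetical helper name. Links, mentions and hashtags are pulled from the text with simple regular expressions.

```python
import json
import re
from datetime import datetime

# Simple patterns; real tweets will have edge cases these miss.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")

# Assumed timestamp layout, e.g. "Wed Nov 25 11:50:00 +0000 2009".
DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_metadata(line):
    """Return basic meta-data for one line of the dump, or None if unusable."""
    try:
        tweet = json.loads(line)
    except ValueError:
        return None  # blank or truncated line
    if not isinstance(tweet, dict) or "text" not in tweet:
        return None  # not a tweet (the stream can carry other messages)

    text = tweet["text"]
    # Remove links first so "#anchor" or "@" inside a URL is not miscounted.
    text_without_links = URL_RE.sub(" ", text)
    created = tweet.get("created_at")

    return {
        "created_at": datetime.strptime(created, DATE_FORMAT) if created else None,
        "sender": (tweet.get("user") or {}).get("screen_name"),
        # Assumption: a retweet has a "retweeted_status" key or starts with "RT @".
        "is_retweet": "retweeted_status" in tweet or text.startswith("RT @"),
        "links": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(text_without_links),
        "hashtags": HASHTAG_RE.findall(text_without_links),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "created_at": "Wed Nov 25 11:50:00 +0000 2009",
        "user": {"screen_name": "example_user"},
        "text": "RT @someone: great read http://example.com/a #kenya #tech",
    })
    print(extract_metadata(sample))
```

Returning `None` for lines that are not tweets keeps the caller simple: it can loop over the dump and skip anything unusable instead of crashing partway through a long collection run.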

@Saidi, nice one.... :-)

I've been thinking of a meta-data script in C# to strip JPEG info from my camera JPEGs so I can pick out data like ISO, aperture, date and time. I'd want to use the same stripping concept to build, say, online script/template/code library stores. While a CMS is still far away, end users would select the data picked automatically from the template headers or the script/code file definitions. As you know, there are many sites that carry affiliate options in a million ways; I'd want to take this and turn it into an auto-ingest system, thus replicating an affiliate store with similar effects.

It's a looong way away, so unfortunately I will not be able to participate in your innovative project, but definitely others who have RR skills should. Stripping meta-data is crazy and interesting stuff...

Rgds.

@aki, most definitely. Sometimes it feels like meta-data is the "waste product" that we're ignoring at our peril. With so many devices out there collecting data, this gigantic storehouse probably holds amazing secrets about our likes/dislikes/habits/preferences/history etc. Now if only someone could wave a magic wand and create paradise?

saidi

On Wed, Nov 25, 2009 at 11:50 AM, aki <aki275@googlemail.com> wrote:
> @Saidi, nice one.... :-)
_______________________________________________
Skunkworks mailing list
Skunkworks@lists.my.co.ke
http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks
------------
Skunkworks Rules
http://my.co.ke/phpbb/viewtopic.php?f=24&t=94
------------
Other services @ http://my.co.ke
Other lists
-------------
Announce: http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks-announce
Science: http://lists.my.co.ke/cgi-bin/mailman/listinfo/science
kazi: http://lists.my.co.ke/cgi-bin/mailman/admin/kazi/general

On Wed, Nov 25, 2009 at 9:07 PM, saidimu apale <saidimu@gmail.com> wrote:
> @aki, most definitely.

Saidi, metadata on JPEGs is such a neglected issue. Yet it can be used to certify time and date stamps to authenticate images, which can help in, e.g., criminal cases, evidence etc.

This thing is awesome, but as you probably know the code learning cycle is not a short one. I have not touched C# for 2 months now because when it came to file creation and organised storage, well, a new detour was needed. So now it's XML "till files do us apart", while there was another detour, "WMI.. to kingdom come". But it's well worth the frustrations and patience to learn each process. Mbytemeter is still on the table before other things...

I liked your challenge to Kai. Did you know that you can use just the image properties of a CCTV camera to create a trigger-and-capture app? An app that looks for voltage changes (because the luminance/chroma channels carry different voltages, scenery can be mapped into actions) can act as an intelligent counter. They call it motion detection (a loose term for sampling voltage changes).

Magic wand? I really wish there was one. Each day I read the papers, there is code waiting to be written by Kenyans for commercial purposes. From bar code scanners to automation, almost everything is imported. But who will do it? Well, I hope my contribution today on other threads was a wake-up call for some to realize how quickly things will change if CMS systems come into being or other changes happen.

I'll take a short leave for a while since my contributions lately have been excessive, and I hope that in a few months' time we can start working on code projects.

Cheers. :-)
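The "intelligent counter" idea above can be roughed out in software. The sketch below does not sample analogue voltages as aki describes; it swaps in plain frame differencing on already-digitized luminance values, which is an assumption on my part. Each frame is taken to be a flat list of 0-255 luminance samples, and the function names and the threshold of 10.0 are hypothetical values chosen only for illustration.

```python
def mean_abs_difference(prev_frame, frame):
    """Mean absolute change in luminance between two equally sized frames."""
    if not frame or len(prev_frame) != len(frame):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)


def count_motion_events(frames, threshold=10.0):
    """Count the number of times the scene goes from still to changing."""
    events = 0
    moving = False
    prev = None
    for frame in frames:
        if prev is not None:
            changed = mean_abs_difference(prev, frame) > threshold
            if changed and not moving:
                events += 1  # a new burst of change counts as one event
            moving = changed
        prev = frame
    return events


if __name__ == "__main__":
    dark = [10] * 4     # tiny 4-sample "frames" keep the example readable
    bright = [200] * 4
    print(count_motion_events([dark, dark, bright, bright, dark]))
```

Counting only the transition from still to changing means one object crossing the view over several frames is counted once rather than once per frame.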
participants (2)
- aki
- saidimu apale