Interesting study, Sohan. We could all benefit from such a study, and I wish there were a way to collect the data without privacy concerns.

Unfortunately, anonymizing data isn't as easy as it looks. Often, just when you think you've done it right, someone comes along and de-anonymizes it fairly easily. Check out the links below (the first two cover the same Netflix dataset in varying degrees of detail).

- When 2+2 Equals A Privacy Question (New York Times): http://www.nytimes.com/2009/10/18/business/18stream.html?_r=4 

- How To Break Anonymity of the Netflix Prize Dataset:
http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

- AOL had a major fiasco a few years back when it also released "anonymized" search records of its users: http://en.wikipedia.org/wiki/AOL_search_data_scandal

- New Study Looks at Re-Identification Risks of Hospital Pharmacy Prescription Records:
http://www.cheo.on.ca/english/3100-09-10-14.shtml (the journal article it references is here: http://www.cheo.on.ca/english/pdf/news_CJHP-El-Emam.pdf)
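As a rough illustration of why stripping names is not enough, here is a toy Python sketch with made-up data. It is not the method from the Netflix paper. It links a pseudonymous record back to a named person using only a few (movie, approximate date) pairs that the person revealed elsewhere. The data, `score`, `best_match`, the 3-day tolerance and the `min_gap` threshold are all hypothetical. The margin check loosely echoes the idea of only accepting a match that clearly stands out from the runner-up.

```python
from datetime import date

# Hypothetical "anonymized" release: names replaced by opaque IDs, but each
# record still lists the (movie, rating date) pairs for that user.
released = {
    "u1042": {("Movie A", date(2005, 3, 1)), ("Movie B", date(2005, 3, 9)),
              ("Movie C", date(2005, 4, 2)), ("Movie D", date(2005, 6, 20))},
    "u2177": {("Movie A", date(2005, 2, 27)), ("Movie E", date(2005, 5, 5)),
              ("Movie F", date(2005, 7, 1))},
    "u3310": {("Movie B", date(2005, 1, 15)), ("Movie C", date(2005, 8, 30))},
}

# Hypothetical auxiliary knowledge: a few titles a named person mentioned
# publicly (say, on a blog), with approximate dates.
aux_name = "Alice"
aux_items = [("Movie A", date(2005, 3, 2)), ("Movie B", date(2005, 3, 10)),
             ("Movie C", date(2005, 4, 1))]


def score(record, aux, tolerance_days=3):
    """Count auxiliary items found in the record within +/- tolerance_days."""
    hits = 0
    for movie, when in aux:
        if any(m == movie and abs((d - when).days) <= tolerance_days
               for m, d in record):
            hits += 1
    return hits


def best_match(released, aux, min_gap=2):
    """Return the best-scoring record ID, or None if it does not beat the
    runner-up by at least min_gap matched items."""
    ranked = sorted(((score(rec, aux), rid) for rid, rec in released.items()),
                    reverse=True)
    (top, rid), (second, _) = ranked[0], ranked[1]
    return rid if top - second >= min_gap else None


match = best_match(released, aux_items)
print(f"{aux_name} -> {match}")  # prints: Alice -> u1042
```

Three loosely dated items are enough to single out one record here. Real datasets are far larger, but they are also sparse, which is what makes this kind of matching work.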

I don't know whether you can get meaningful data for your study without running into significant privacy risks. These guys (Artificial Intelligence for Development http://ai-d.org/) seem to have a large dataset of mobile phone data from Kenya. I don't know how they got it or how they deal with privacy issues, but they may be of some use to you.

Saidi

On Thu, Nov 19, 2009 at 2:23 PM, rsohan@gmail.com <rsohan@gmail.com> wrote:
On Thu, Nov 19, 2009 at 5:33 PM, Jared Koyier <jaredkoyier@gmail.com> wrote:
I tend to think logs are a little raw, and any ISP that keeps them might consider them client-confidential.

Raw? What do you mean? The logs are the definitive source on what actually happened.

As to the confidentiality argument, I agree. However, I'd point out that it is trivial to anonymize logs without losing any contextual information, and I would be happy to provide scripts to anyone who doesn't want to do it themselves.
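For illustration only, here is a minimal Python sketch of one common approach: replacing client IP addresses with keyed HMAC-SHA256 tokens. It is not the scripts offered above, which aren't shown. It assumes Apache-style log lines containing IPv4 addresses. The names `make_pseudonymizer` and `anonymize_line` and the sample lines are hypothetical. A keyed hash is used rather than a plain one because the IPv4 space is small enough (about 4.3 billion addresses) to brute-force an unkeyed hash.

```python
import hashlib
import hmac
import re
import secrets

# Assumption: log lines contain client IPv4 addresses (e.g. Apache "common log
# format"). This loose pattern also matches invalid quads such as 999.1.1.1,
# which is acceptable for a sketch.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def make_pseudonymizer(key: bytes):
    """Return a function that maps an IP string to a stable keyed token."""
    def pseudonym(ip: str) -> str:
        digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
        return "ip-" + digest[:12]  # truncated for readability
    return pseudonym


def anonymize_line(line: str, pseudonym) -> str:
    """Replace every IPv4 address in a log line with its pseudonym."""
    return IPV4.sub(lambda m: pseudonym(m.group(0)), line)


# Hypothetical sample lines using documentation-reserved addresses.
logs = [
    '192.0.2.7 - - [19/Nov/2009:14:23:01 +0300] "GET /index.html HTTP/1.1" 200 512',
    '198.51.100.4 - - [19/Nov/2009:14:23:05 +0300] "GET /about.html HTTP/1.1" 200 1024',
    '192.0.2.7 - - [19/Nov/2009:14:23:09 +0300] "GET /contact.html HTTP/1.1" 404 0',
]

# Random secret key: without it, tokens cannot be recomputed from guessed IPs.
key = secrets.token_bytes(32)
pseudonym = make_pseudonymizer(key)
anonymized = [anonymize_line(line, pseudonym) for line in logs]
for line in anonymized:
    print(line)
```

The same client always gets the same token, so per-client request sequences, timestamps and URLs survive. That preserved context is also the kind of pattern the Netflix and AOL cases show can be used for re-identification, so this is pseudonymization rather than a guarantee of anonymity.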


_______________________________________________
Skunkworks mailing list
Skunkworks@lists.my.co.ke
http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks