Re: [Skunkworks] List of words or phrases in Swahili plus their English equivalents

Hello Harry,
Thank you for your pointer.
We have decided to take a slightly different approach. We will pull text from different Wikipedia entries in both Swahili and English, then have computers go through it in the hope that they start to recognize patterns: which words are used, how often and where, and perhaps also the spelling rules and the general sentence structure (grammar) of both languages.
From what I have been told, this is roughly what happens with data from the CMS experiment <http://home.web.cern.ch/about/experiments/cms>. You can get a rough idea of the "type" of event (proton-proton collision) even if you don't know exactly what happened to the quarks when they collided.
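One common way to let a program pick up on how often character patterns occur in each language is to compare character n-gram counts. The sketch below is only an illustration of that idea and not necessarily what the hackathon team built: the function names, the trigram size and the add-one smoothing are assumptions, and the tiny `SAMPLES` sentences are made-up stand-ins for real Wikipedia text.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, space-normalized text."""
    cleaned = " ".join(text.lower().split())
    padded = f" {cleaned} "  # pad so word starts and ends are also counted
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def train(samples, n=3):
    """Build one n-gram count profile per language from {language: [texts]}."""
    profiles = {}
    for language, texts in samples.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))
        profiles[language] = profile
    return profiles


def log_score(text, profile, vocab_size, n=3):
    """Add-one smoothed log-likelihood of the text under one language profile."""
    total = sum(profile.values())
    score = 0.0
    for gram, count in char_ngrams(text, n).items():
        score += count * math.log((profile[gram] + 1) / (total + vocab_size))
    return score


def classify(text, profiles, n=3):
    """Return the language whose profile gives the text the highest score."""
    vocab = set()
    for profile in profiles.values():
        vocab.update(profile)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(profiles, key=lambda lang: log_score(text, profiles[lang], vocab_size, n))


# Hypothetical toy training data; a real run would use article text instead.
SAMPLES = {
    "sw": [
        "Watoto wanacheza mpira nje ya nyumba.",
        "Ninapenda kusoma vitabu kila siku.",
        "Habari ya asubuhi, rafiki yangu.",
    ],
    "en": [
        "The children are playing football outside the house.",
        "I like to read books every day.",
        "Good morning, my friend.",
    ],
}

profiles = train(SAMPLES)
print(classify("Tunapenda kucheza mpira", profiles))   # sw
print(classify("We like to play football", profiles))  # en
```

With so little training text the result is fragile; the point is only that frequency counts alone, with no dictionary or grammar, already separate the two languages on these examples.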
Martin.
On Thu, May 30, 2013 at 3:38 AM, Henry Addo <addhen@gmail.com> wrote:
You might want to check out kamusi.org if you haven't already.
Henry! -- Be the change you want to see ~ Mahatma Gandhi. So stop complaining :-). Website: http://www.findreels.com
On Wed, May 29, 2013 at 7:52 PM, Martin Chiteri <martin.chiteri@gmail.com> wrote:
Thank you Christian and Kala,
I had no idea that such resources were in existence. I will take some more time and have a look at them.
Thank you gentlemen.
Martin.
On Wed, May 29, 2013 at 11:59 AM, Emmanuel Kala <emkala@googlemail.com> wrote:
The folks that publish the Swahili-English dictionary (I can't remember their name) ought to have a corpus in digital form that's lying somewhere.
Also, if you're into computational linguistics, Google released their N-gram corpus a couple of years ago and it's available for download: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you....
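For readers new to the term, an n-gram is simply a run of n consecutive words, and an n-gram corpus records how often each run occurs. The following is a minimal sketch of counting word bigrams in a piece of text; the function name, the simple tokenizing regular expression and the example sentence are assumptions made for illustration and have nothing to do with how Google built its corpus.

```python
from collections import Counter
import re


def word_ngrams(text, n=2):
    """Count runs of n consecutive words in lowercased text."""
    # Naive tokenizer (an assumption): keep runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


counts = word_ngrams("The cat sat on the mat and the cat slept")
print(counts.most_common(1))  # [(('the', 'cat'), 2)]
```

Passing `n=3` gives trigram counts in the same way; text with fewer than `n` words yields an empty counter.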
On May 29, 2013, at 11:27 AM, Christian Ledermann <christian.ledermann@gmail.com> wrote:
and https://wiki.ushahidi.com/display/WIKI/Localization+and+Translation
On Wed, May 29, 2013 at 11:24 AM, Christian Ledermann <christian.ledermann@gmail.com> wrote:
http://blog.ushahidi.com/2013/05/17/in-your-own-words/
On Wed, May 29, 2013 at 10:53 AM, Martin Chiteri <martin.chiteri@gmail.com> wrote:
Hello Skunks,
I would like to request a substantial number of words, phrases or sentences written in Swahili together with their English equivalents. I know this is a tall order, but I guess that some people on these lists might have samples, perhaps from translation projects (Bible, website or stand-alone system internationalization efforts and so forth).
The aim is to get enough material for a machine learning project at an upcoming Hackathon next month. We would like to train machines to recognize phrases in the different languages, then attempt to use similar techniques to make them detect various sub-atomic particle types in a separate set of Physics data.
Any help will be highly appreciated. Thank you.
Martin.
-- You received this message because you are subscribed to the Google Groups "Nairobi Python Users Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to naipug+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
-- Best Regards,
Christian Ledermann
Nairobi - Kenya Mobile : +254 702978914
<*)))>{
If you save the living environment, the biodiversity that we have left, you will also automatically save the physical environment, too. But If you only save the physical environment, you will ultimately lose both.
1) Don’t drive species to extinction
2) Don’t destroy a habitat that species rely on.
3) Don’t change the climate in ways that will result in the above.
}<(((*>

"Hello Harry ...", "Hello Henry ..." Close enough 8-)
Martin.
On Thu, May 30, 2013 at 4:06 AM, Martin Chiteri <martin.chiteri@gmail.com> wrote:
Hello Harry,
Thank you for your pointer.
We have decided to take a slightly different approach. We will pull text from different Wikipedia entries in both Swahili and English, then have computers go through it in the hope that they start to recognize patterns: which words are used, how often and where, and perhaps also the spelling rules and the general sentence structure (grammar) of both languages.
From what I have been told, this is roughly what happens with data from the CMS experiment <http://home.web.cern.ch/about/experiments/cms>. You can get a rough idea of the "type" of event (proton-proton collision) even if you don't know exactly what happened to the quarks when they collided.
Martin.

I don't know how legal or agreeable this may be... How about pulling the text from a random website or blog that has equivalent text in both languages?
I have in mind http://bitcoins.co.ke/index.html *<--- Bitcoins Not BanglaPesa!!* Maybe you'll first have to contact the owners of such sites for their consent.
Tony.

Hi Tony,
As I mentioned before, we are taking a different approach, using both Swahili and English entries on Wikipedia <http://en.wikipedia.org/wiki/Main_Page> as training sets. We are also not putting the data to any commercial use, and content from the wiki is provided free of charge under the Creative Commons Attribution-ShareAlike license.
Martin.
On Thu, May 30, 2013 at 12:52 PM, Tony Likhanga <tlikhanga@gmail.com> wrote:
I don't know how legal or agreeable this may be... How about pulling the text from a random website or blog that has equivalent text in both languages?
I have in mind http://bitcoins.co.ke/index.html *<--- Bitcoins Not BanglaPesa!!* Maybe you'll first have to contact the owners of such sites for their consent.
Tony.
_______________________________________________ skunkworks mailing list skunkworks@lists.my.co.ke ------------ List info, subscribe/unsubscribe http://lists.my.co.ke/cgi-bin/mailman/listinfo/skunkworks ------------
Skunkworks Rules http://my.co.ke/phpbb/viewtopic.php?f=24&t=94 ------------ Other services @ http://my.co.ke

Hi Colleagues,
http://www.isoc-ke.org/wiki/Orodha_Ya_Msamiati
Best Regards
On Thu, May 30, 2013 at 2:38 PM, Martin Chiteri <martin.chiteri@gmail.com> wrote:
Hi Tony,
As I mentioned before, we are taking a different approach, using both Swahili and English entries on Wikipedia <http://en.wikipedia.org/wiki/Main_Page> as training sets. We are also not putting the data to any commercial use, and content from the wiki is provided free of charge under the Creative Commons Attribution-ShareAlike license.
Martin.
-- Barrack O. Otieno +254721325277 +254-20-2498789 Skype: barrack.otieno http://www.otienobarrack.me.ke/

Hi Barrack,
Thanks for sharing the resource.
I would like to clarify something. We have changed course and are no longer approaching the problem as a translation issue. Rather, we want computers to actually learn the syntax and grammar rules of Swahili and English. We will pick phrases from different articles in the two languages, i.e. if we take an entry on the Java language in Swahili, the other could be, let's say, an article on coffee. After that, if we develop a good enough rule base, we shall try to apply it to data collected from detectors monitoring the decay of sub-atomic particles (quarks) in a Physics experiment.
I hope that explains it better. You can see more here: http://shdnairobi.pbworks.com/w/page/66494507/SHD%20Nairobi%202013%2C%20Hack... (look at the second project)
Martin.
On Thu, May 30, 2013 at 2:44 PM, Barrack Otieno <otieno.barrack@gmail.com> wrote:
Hi Colleagues,
http://www.isoc-ke.org/wiki/Orodha_Ya_Msamiati
Best Regards
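Collecting the Wikipedia text discussed in this thread could be done through the MediaWiki web API, which on Wikipedia can return an article as plain text. The sketch below is a rough illustration under stated assumptions: the helper names, the `User-Agent` string and the article titles in the usage note are hypothetical, and `fetch_article` needs network access, so only the two pure helpers are exercised here.

```python
import json
import urllib.parse
import urllib.request


def extract_url(language, title):
    """Build an API URL asking one language edition for an article as plain text."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "format": "json",
        "titles": title,
    }
    return f"https://{language}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def extract_text(response):
    """Pull the article text out of a decoded API response, skipping missing pages."""
    pages = response.get("query", {}).get("pages", {})
    return "\n".join(page["extract"] for page in pages.values() if "extract" in page)


def fetch_article(language, title):
    """Download one article as plain text (requires network access)."""
    request = urllib.request.Request(
        extract_url(language, title),
        headers={"User-Agent": "corpus-sketch/0.1 (hackathon example)"},  # made-up identifier
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_text(json.load(reply))


print(extract_url("sw", "Kahawa"))
```

A call such as `fetch_article("sw", "Kahawa")` or `fetch_article("en", "Coffee")` would then return text for the Swahili or English training set, assuming articles with those titles exist. Because the two sets do not need to be translations of each other, the titles can be chosen independently per language.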
participants (3): Barrack Otieno, Martin Chiteri, Tony Likhanga