16 Comments
Andrew Smith

Let me ask you this, Edem: do you think an independently trained LLM in Africa (or other places) could then be used to improve existing LLMs like ChatGPT, Bard, etc., where "beautiful" is too often a synonym for "white"? In other words, could a much better, more comprehensive system result from these efforts?

Also, LOL @ "Goatesque"

Gold Bassey Edem

Haha, I'm glad you like Goatesque man.

Regarding your LLM question, I think it could. Technically speaking, LLMs trained on data specific to different regions could be combined through transfer learning to form an ensemble model.

But resource-wise, I think a smarter plan would be to train a single model on a well-distributed dataset that reflects the population distribution.
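
As a rough illustration of what a dataset "descriptive of the population distribution" could look like in practice, here is a minimal Python sketch that resamples a skewed corpus so each region's share matches a target share. The region names, the shares, the corpus, and the helper `sample_by_population` are all made up for illustration; a real pipeline would need real population figures and far more care about data quality.

```python
import random
from collections import Counter


def sample_by_population(examples, population_share, total, seed=0):
    """Draw about `total` examples so each region's share follows `population_share`.

    `examples` is a list of (region, text) pairs.
    `population_share` maps region -> fraction of the population (hypothetical values).
    """
    rng = random.Random(seed)

    # Group the available texts by region.
    by_region = {}
    for region, text in examples:
        by_region.setdefault(region, []).append(text)

    sample = []
    for region, share in population_share.items():
        quota = round(total * share)
        pool = by_region.get(region, [])
        if len(pool) >= quota:
            # Enough data: sample without replacement.
            picked = rng.sample(pool, quota)
        elif pool:
            # Too little data: fall back to sampling with replacement.
            picked = rng.choices(pool, k=quota)
        else:
            picked = []
        sample.extend((region, text) for text in picked)
    return sample


# Made-up corpus that is heavily skewed toward one region.
examples = [("region_a", f"a{i}") for i in range(900)]
examples += [("region_b", f"b{i}") for i in range(100)]

# Hypothetical target: both regions hold half of the population.
population_share = {"region_a": 0.5, "region_b": 0.5}

balanced = sample_by_population(examples, population_share, total=200)
counts = Counter(region for region, _ in balanced)
```

Here the 90/10 skew in the raw data is replaced by an even split in the sample. Oversampling a small region by repetition is only a stopgap; collecting more data for that region is the real fix.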

Andrew Smith

Nice. I am 100% rooting for you to be successful in getting ideas like this out into the world! I see that as a huge necessary component of success in any venture like this - lots of people have to understand the idea and get excited by it.

Gold Bassey Edem

Haha, you're an incredible human being man.

Andrew Smith

Thanks, man! I feel the same about you, and frankly about the little community I've met here. These are some remarkable folks we're surrounded by.

Gold Bassey Edem

True words man.

Nat

Very kind of you, Edem! Thanks for the mention! I hope this dramatic situation will be solved soon. Back in 2016 I was denied a grant because I was from Georgia. I know what you're talking about and I understand your POV.

Gold Bassey Edem

You're welcome Nat, it was an incredible piece. I'm really shocked about your Georgian exclusion, I was of the opinion that the US was a really inclusive place.

Matt Buckley

Definitely agree with the premise, and have been thinking about what it means when it comes to languages. What happens to Africa's incredible linguistic diversity if AI tools can only be used in English, or French, or a handful of national languages which have sufficient written usage to train or fine-tune a model? Developing ways for systems like those built around LLMs to learn minority languages should be a major focus of research imo

Gold Bassey Edem

Yeah, you're right Matt. Most translation corpora contain data on popular languages and leave a large number of other languages under-represented. Ironically, the regions with under-represented languages need the most help and are the most susceptible to the effects of language barriers, such as inefficient knowledge distillation.

I was thinking this could perhaps be solved by creating an open-source platform where under-represented languages could be modelled, trained on less data, and made available via an API?

Perhaps we could talk about this?

Matt Buckley

Interesting idea! I've also seen some studies on fine-tuning models in order to teach them new languages, with the basic linguistic structures learnt during initial training remaining the same. Would be happy to discuss!

Gold Bassey Edem

Awesome! I'll send an email.

Gold Bassey Edem

Or rather, please send me an email at: ekmedm@gmail.com

Zan Tafakari

Thanks for the mention, Edem! Definitely agree Africa needs its own data - and in fact I think if you solve the data problem, Africa will catch up in no time. Data diversity is still an issue in some use cases in the US/UK too - so sorting this out early would let Africa leapfrog in certain aspects. A bit like how Africa leapfrogged into mobile and mobile payments.

Gold Bassey Edem

Exactly man! I think if we can provide an African solution it could potentially allow Africa to lead the new AI renaissance.

Majid

I wholeheartedly agree with your analysis, Edem. The dearth of data can be primarily attributed to the absence of local ownership of online content. The majority of discussions among Africans on various platforms, be it in comments, posts, or threads, are stored by foreign entities. I firmly believe that this limitation in training data originates from this external storage. This realization motivated me to establish Vouchaah, a forum dedicated to Nigeria, aiming to address and rectify this gap.
