Aurum Bits by Gold Edem

Haha, I'm glad you like Goatesque, man.

Regarding your LLM question, I think it could. Technically speaking, LLMs trained on data specific to different regions could be brought together using transfer learning to form an ensemble model.

But resource-wise, I think a smarter plan would be to train the model on a well-distributed dataset that reflects the population distribution.

Andrew Smith

Nice. I am 100% rooting for you to be successful in getting ideas like this out into the world! I see that as a huge necessary component of success in any venture like this - lots of people have to understand the idea and get excited by it.

Nov 11, 2023

Haha, you're an incredible human being, man.

Andrew Smith

Nov 11, 2023

Thanks, man! I feel the same about you, and frankly about the little community I've met here. These are some remarkable folks we're surrounded by.

Nov 12, 2023

True words, man.

Nat

Nov 9, 2023

Very kind of you, Edem! Thanks for the mention! I hope this dramatic situation will be solved soon. Back in 2016 I was denied a grant because I was from Georgia. I know what you're talking about and I understand your POV.

You're welcome, Nat. It was an incredible piece. I'm really shocked about your Georgian exclusion; I was of the opinion that the US was a really inclusive place.

Matt Buckley

Nov 14, 2023

Definitely agree with the premise, and have been thinking about what it means when it comes to languages. What happens to Africa's incredible linguistic diversity if AI tools can only be used in English, or French, or a handful of national languages which have sufficient written usage to train or fine-tune a model? Developing ways for systems like those built around LLMs to learn minority languages should be a major focus of research imo

Nov 14, 2023

Yeah, you're right, Matt. Most language translation corpora contain data on popular languages and leave a large number of other languages under-represented. Ironically, the regions with under-represented languages require the most help and are most susceptible to the effects of language barriers, such as inefficient knowledge distillation.

I was thinking this could perhaps be solved by creating an open-source platform where models for under-represented languages could be trained on less data and made available via an API.

Perhaps we could talk about this?

Matt Buckley

Nov 14, 2023

Interesting idea! I've also seen some studies on fine-tuning models in order to teach them new languages, with the basic linguistic structures learnt during initial training remaining the same. Would be happy to discuss!

Nov 15, 2023

Awesome! I'll send an email.

Nov 15, 2023

Or rather, please send me an email at ekmedm@gmail.com

Zan Tafakari

Thanks for the mention, Edem! Definitely agree Africa needs its own data - and in fact I think if you solve the data problem, Africa will catch up in no time. Data diversity is still an issue in some use cases in the US/UK too - so sorting this out early would let Africa leapfrog in certain aspects. A bit like how Africa leapfrogged into mobile and mobile payments.
