At the core of Meta’s most ambitious AI project is a small irony. The messy, unplanned expanse of YouTube served as part of the training data for a system that can now translate between 200 languages, including ones that professional linguists have spent decades attempting to digitize. Not scholarly databases. Not official documents. Just people conversing in languages that the internet has largely forgotten.
No Language Left Behind, or NLLB-200 as Meta refers to it, seems stranger the more you read about it. The model works with languages like Kamba, Lao, Lingala, and Fula, which even the best commercial translation tools either ignore completely or handle poorly. Mainstream translation services currently provide adequate support for fewer than 25 African languages. NLLB-200 covers 55. That disparity alone reveals something about the people for whom the internet was built and those for whom it was not.
| Information | Details |
|---|---|
| Project Name | No Language Left Behind (NLLB-200) |
| Developed By | Meta AI Research |
| Languages Supported | 200, including 55 African languages |
| Evaluation Dataset | FLORES-200 |
| Performance Gain | 44% average improvement over previous state-of-the-art benchmarks |
| Open-Sourced Components | Model, training code, dataset recreation tools |
| Grant Funding | Up to $200,000 for nonprofits |
| Daily Translations Powered | 25 billion across Facebook, Instagram |
| Partnered Organization | Wikimedia Foundation |
| Notable Side Project | Hokkien-to-English speech translator |
| Lead Researchers | Peng-Jen Chen, Juan Pino, and team |
Engineers tested the system on the FLORES-200 benchmark, and the results were startling: an average improvement of 44% over earlier state-of-the-art systems. For some Indian and African languages, the increase was more than 70%. Translation research typically measures progress in slow, nearly geological increments, so numbers like these are uncommon. Meta’s team seems to have been aware of this; they knew the leap needed to be loud enough to be significant.
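To make the arithmetic behind a figure like "44% on average" concrete, here is a minimal sketch. It assumes the figure is a relative gain over the earlier system's score, averaged across languages; the article does not spell out the exact calculation, so treat this as one plausible reading. The function names and all scores below are hypothetical and invented purely for illustration.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`, relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (new - baseline) / baseline


def average_improvement(pairs: list[tuple[float, float]]) -> float:
    """Mean of the per-language relative improvements."""
    if not pairs:
        raise ValueError("need at least one (baseline, new) pair")
    gains = [relative_improvement(baseline, new) for baseline, new in pairs]
    return sum(gains) / len(gains)


# Hypothetical (baseline, new) scores for three languages.
# These are NOT real FLORES-200 results.
scores = [(25.0, 36.0), (10.0, 17.0), (40.0, 46.0)]

for baseline, new in scores:
    print(f"{baseline} -> {new}: {relative_improvement(baseline, new):.1f}% gain")
print(f"average gain: {average_improvement(scores):.1f}%")
```

Note that a low-scoring language can show a very large relative gain from a modest absolute change, which is one reason per-language figures can sit well above the average.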
The YouTube angle is intriguing because it highlights the limitations of conventional training data. Parallel text, meaning the same sentence written in two languages and neatly aligned, has always been the foundation of translation models. For hundreds of languages, however, those tidy parallel sentences simply do not exist. To find spoken language, Meta’s researchers mined the enormous, unedited video archives of the contemporary internet. Ten years ago, this type of training source would have been unimaginable. It may now be the only one that scales.

It’s difficult not to consider what’s being lost in the celebration as you watch this play out. A model that has, in a sense, only listened is being compared to bilingual translators, real people who have lived inside a language for years. The model still hallucinates. It still generates confident nonsense. It still makes mistakes in cultural nuance that linguists can see right away. Meta has been open about this, releasing language identification tools and toxicity filters alongside the model. Whatever the BLEU scores suggest, there is still a significant difference between technically accurate translation and human translation.
The Hokkien project, on the other hand, is practically a different story. There is no widely used written form of Hokkien, which is spoken by 45 million people in Taiwan, mainland China, Malaysia, Singapore, and the Philippines. Meta used Mandarin as an intermediary to train a translator for it, translating Hokkien speech into Mandarin text and then into English. Engineer Peng-Jen Chen and Mark Zuckerberg shared a demonstration on Facebook in which the AI translated between them as they spoke in different languages. The video was very likely polished. The underlying accomplishment, however, was not staged.
What Meta is developing does not yet replace bilingual people. Its effect is more subdued. It somewhat lessens the damage caused by the absence of those people in languages where they have never existed in large numbers. To be honest, it’s still unclear whether that’s a victory or a kind of silent defeat.
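The idea of routing a translation through an intermediary language can be sketched as simple function composition. This is a toy illustration of the general pivot approach, not Meta’s system: the helper `make_cascade`, the lookup tables, and the placeholder utterance are all hypothetical, and a real pipeline would use speech and translation models rather than dictionaries.

```python
from typing import Callable

# A translator is modeled here as any function from text to text.
Translator = Callable[[str], str]


def make_cascade(first: Translator, second: Translator) -> Translator:
    """Chain two translators through a pivot language."""
    def cascade(source: str) -> str:
        pivot = first(source)   # e.g. Hokkien speech -> Mandarin text
        return second(pivot)    # e.g. Mandarin text -> English text
    return cascade


# Toy stand-ins for real models (hypothetical data).
# The key is a placeholder for an audio clip, not real Hokkien.
HOKKIEN_TO_MANDARIN = {"<hokkien audio: greeting>": "你好"}
MANDARIN_TO_ENGLISH = {"你好": "hello"}


def toy_hokkien_to_mandarin(utterance: str) -> str:
    return HOKKIEN_TO_MANDARIN[utterance]


def toy_mandarin_to_english(text: str) -> str:
    return MANDARIN_TO_ENGLISH[text]


hokkien_to_english = make_cascade(toy_hokkien_to_mandarin, toy_mandarin_to_english)
print(hokkien_to_english("<hokkien audio: greeting>"))  # prints "hello"
```

One consequence of this design is that any error made in the first step is passed on to the second, so the quality of the pivot stage limits the quality of the whole chain.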
