In a research lab at MBZUAI, the first university in the world devoted solely to artificial intelligence, a group of engineers recently watched a medical language model perform a seemingly straightforward task. A specialist’s report written in complex clinical English was fed into the model, which translated it into conversational Arabic that a patient might actually understand, rather than formal Arabic. The result wasn’t flawless, but it was surprisingly close, and the researchers understood what that meant: the gap between English-dominant AI and everything else, though still frustratingly wide, was closing.
One of the most costly and obscure competitions in technology today is the race to create a large language model that is truly bilingual. According to Anthropic CEO Dario Amodei, training expenses for frontier LLMs have surpassed $100 million and are expected to reach $5 to $10 billion in the next year or two. Those numbers would be astounding even for a single-language system. The difficulty compounds, in ways that aren’t always apparent from the outside, when the goal is a model that reasons fluently in two or more languages: not just translating between them, but actually thinking in them.
Stanford researchers examining bilingual student writing found something that should concern anyone developing educational AI tools. Pre-trained models demonstrated quantifiable grading bias against bilingual text such as Spanglish, a fluid blend of Spanish and English used by millions of students in the American Southwest and beyond. Fine-tuning helped: in all three cases (English, Spanish, and Spanglish), models fine-tuned on synthetic bilingual data performed noticeably better. However, the finding exposed a more serious issue. Most large language models have been built around English from the ground up, and adding a second language after the fact usually results in something closer to a tourist phrasebook than true fluency.

The geographic pattern developing in this area is difficult to ignore. Abu Dhabi’s MBZUAI released BiMediX, the first bilingual medical LLM that outperforms GPT-4 on medical benchmarks in both Arabic and English. A European initiative produced the largest open-science multilingual model, covering 46 languages. The French startup Mistral has been testing mixture-of-experts architectures that have the potential to significantly reduce the cost of multilingual inference. The largest American labs, such as OpenAI, Anthropic, and Google, continue to prioritize English-first performance, treating other languages as secondary benchmarks rather than fundamental design objectives. There is a feeling that the next significant advancement in bilingual AI may come from organizations outside Silicon Valley, whose users have never had the luxury of assuming that everyone speaks English.
The economics of this situation resemble what the semiconductor industry experienced decades ago. As fabrication costs surged above $20 billion per plant, the number of businesses capable of producing cutting-edge chips fell to just three: TSMC, Intel, and Samsung. Everyone else became a customer. A similar consolidation seems likely for foundation models, which raises the awkward question of who gets to choose which languages are important enough to train on. If only four or five organizations, the majority of them Anglophone, can afford to build frontier LLMs, a form of linguistic imperialism may become ingrained in the infrastructure of thought itself.
Smaller, specialized models offer a partial escape route. Mistral’s mixture-of-experts approach increased inference efficiency fourfold, indicating that brute-force scale is not always necessary for bilingual capability. PALO, a multilingual multimodal model covering ten languages spoken by roughly five billion people, was built with a semi-automated translation pipeline that kept costs under control. These aren’t ideal solutions (a pickup truck isn’t a sports car, according to one Forrester analyst), but they offer a more democratic way forward, where a startup in Paris or a university in Melbourne might contribute just as much as a hyperscaler in Mountain View.
It’s still unclear whether the industry will actually take that route. Investors continue to write checks denominated in dollars, benchmarks continue to reward English-centric performance, and the money continues to flow toward ever-larger models built around English. However, in research labs from Helsinki to Canberra, clinics throughout the Gulf states, and classrooms throughout Texas, people are building something different: quieter, less well-known, and perhaps more significant. Perhaps the most important race isn’t the billion-dollar one that everyone is watching.
