Artificial Intelligence Goes Bilingual—Without a Dictionary

On a sunny morning in San Sebastián that most likely smelled slightly of the Atlantic, computer scientist Mikel Artetxe asked an almost ridiculous question. Give someone a stack of Chinese books and a stack of Arabic books, none of them translations of one another, and ask them to learn to translate between the two. Impossible, isn’t it? That was the point. Yet the machine he had been training was starting to do just that.

For years, the dream of fluid, automatic translation has rested on one uncomfortable fact. Every contemporary translation tool is powered by neural networks, brain-inspired algorithms that need to be fed, and fed endlessly, with parallel text: millions of well-aligned sentence pairs that humans produced over decades of multilingual paperwork. That approach works very well for French and English. For Basque, Swahili, or any of the thousands of languages that never made it into the UN archives, it works far less well.


Topic	Unsupervised Machine Translation
Lead Researchers	Mikel Artetxe (UPV) & Guillaume Lample (Facebook AI Research)
Institutions Involved	University of the Basque Country, Spain · Facebook AI, Paris
Original Coverage	Reported in Science Magazine
Method Used	Unsupervised neural machine translation
Key Techniques	Back translation and denoising
Benchmark Score	BLEU score of roughly 15 on English-French pairings
Comparable Supervised Score	Google Translate, approximately 40
Human Translator Score	Above 50
Status	Submitted to ICLR 2018, not yet peer reviewed
Significance	First credible attempt at translation without parallel text

Now two independent research teams have proposed something unusual, uploading their papers to arXiv within a day of one another. It turns out that parallel text may not be necessary for translation at all. The trick lies in the way words keep company with one another. In practically every language ever spoken, the words for table and chair tend to appear together. So do the words for socks and shoes. If a computer carefully maps these clusters, the maps from two different languages begin to resemble each other, in the same way that two cities viewed from a satellite start to share the same general shape, the same arteries, and the same logic of human life.

Lay one map on top of the other and you get a rough, approximate correspondence between words that is actually quite useful. The result is a bilingual dictionary, built without a teacher. There is something slightly philosophical in the idea that languages, despite their apparent differences, share a deep structural rhythm that a machine can detect even when a person cannot articulate it.
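The map-overlay idea can be sketched in a few lines of Python. This is a toy illustration, not either team's method: the two-dimensional coordinates are invented, the "French" map is manufactured by rotating the English one by 40 degrees, and the brute-force search over angles stands in for the far more sophisticated alignment the papers use on embeddings with hundreds of dimensions. No word pairs are given to the search; it only looks for the rotation at which the two clouds of points fit best.

```python
import math

# Toy 2-D "word maps". Coordinates are invented for illustration only.
EN = {
    "table": (1.0, 0.2),
    "chair": (0.9, 0.4),
    "shoe": (-0.5, 1.0),
    "sock": (-0.7, 0.9),
    "sea": (0.1, -1.2),
}

def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise around the origin."""
    r = math.radians(degrees)
    x, y = point
    return (x * math.cos(r) - y * math.sin(r),
            x * math.sin(r) + y * math.cos(r))

# Assumption: the French map has the same shape, just turned by 40 degrees.
# The English keys are used only to build this synthetic data.
FRENCH_LABEL = {"table": "table", "chair": "chaise", "shoe": "chaussure",
                "sock": "chaussette", "sea": "mer"}
FR = {FRENCH_LABEL[w]: rotate(p, 40) for w, p in EN.items()}

def nearest(point, space):
    """Return the word in `space` whose coordinates are closest to `point`."""
    return min(space, key=lambda w: math.dist(point, space[w]))

def misfit(degrees):
    """Total distance from each rotated English word to its nearest French word."""
    total = 0.0
    for p in EN.values():
        q = rotate(p, degrees)
        total += math.dist(q, FR[nearest(q, FR)])
    return total

# Try every whole-degree rotation and keep the one where the maps overlap best.
best_angle = min(range(360), key=misfit)

# Read off a dictionary by nearest neighbour once the maps are overlaid.
dictionary = {w: nearest(rotate(p, best_angle), FR) for w, p in EN.items()}
print(best_angle, dictionary)
```

Because the toy French map is an exact rotation of the English one, the search recovers the rotation perfectly. Real language maps only resemble each other, which is one reason the induced dictionary is rough rather than exact.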

The teams then use two techniques with almost poetic names to refine their systems. In back translation, a sentence is roughly translated from one language into the other and then translated back; if the result differs from the original, the system is adjusted. In denoising, the words of a sentence are shuffled or removed, and the model has to recover the original, which forces it to learn the structure of the language instead of memorizing sentences. It feels less like programming and more like teaching a child by handing them a torn page and asking them to guess what is missing.
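Both techniques can be sketched in plain Python. What follows is a minimal illustration under stated assumptions, not the researchers' code: `add_noise`, `translate`, and `round_trip_errors` are hypothetical names, the word tables are invented and deliberately imperfect, and a real system would use neural networks and adjust their weights rather than count mismatched words.

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Corrupt a sentence by dropping some words and shuffling the rest locally.

    Assumption: each surviving word moves by at most about `max_shift`
    positions, so the sentence is damaged but still recoverable.
    """
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if tokens and not kept:
        kept = [rng.choice(tokens)]  # never return an empty sentence
    # Jitter each position, then sort by the jittered position.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda kt: kt[0])]

# Toy word tables standing in for two translation models (invented data).
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}
FR_TO_EN = {"le": "the", "chat": "cat", "dort": "sleep"}  # imperfect on purpose

def translate(tokens, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(t, t) for t in tokens]

def round_trip_errors(tokens):
    """Translate to French and back, then count words that changed.

    In a real system this mismatch is the training signal used to
    adjust the models; here it is only counted.
    """
    back = translate(translate(tokens, EN_TO_FR), FR_TO_EN)
    return sum(a != b for a, b in zip(tokens, back))

sentence = ["the", "cat", "sleeps"]
print(add_noise(sentence, drop_prob=0.3, rng=random.Random(1)))
print(round_trip_errors(sentence))
```

The denoising task is then to map the output of `add_noise` back to the clean sentence, and the back-translation task is to drive the round-trip error toward zero.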

In absolute terms, the results are not yet impressive. When translating between English and French, both systems scored about 15 on the BLEU scale, which runs from 0 to 100. Google’s supervised model scores close to 40. A competent human translator scores above 50. So no one will be replacing qualified translators anytime soon. The trajectory, however, is unusual, and the implications are quietly huge.
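For readers wondering what a BLEU number measures, here is a simplified sketch in Python. It is an assumption-laden teaching version, not the scoring script behind the figures quoted here: it compares one candidate sentence against a single reference, uses no smoothing, and splits words on whitespace, whereas published scores are computed over whole test sets. The idea is the same, though: count how many word sequences of length one to four in the machine's output also appear in a human translation, and penalize output that is too short.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each n-gram's credit to the number of times it occurs in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any empty level zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: shorter-than-reference output is scaled down.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))
print(bleu("the cat sat on the rug", reference))
```

A perfect match with the reference scores 100, and a single wrong word already pulls the score down noticeably, which helps explain why even skilled human translators do not come close to the top of the scale.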

Something seems to have changed. Di He, a Beijing-based Microsoft researcher whose previous work inspired both papers, called the result shocking. It is a hard word to read past. For a long time, translation has been treated as a brute-force data problem. These studies raise the possibility that it is also a problem of structure and pattern, perhaps even of the universal grammar that linguists like Chomsky have long debated. Watching this happen, you get the impression that the field is loosening up.

It’s anyone’s guess what comes next. Artetxe’s co-author, Eneko Agirre, took care to describe the work as being in its infancy, a gateway rather than a destination. But for the world’s smaller languages, medical jargon, regional slang, and the half-forgotten dialects that have always existed outside the purview of big tech, that gateway is worth keeping a close eye on.

