A doctor I once spoke with at a small clinic in Karachi has two stacks of paper on her desk. One is patient histories. The other is lab reports she has been meaning to digitize for several months. She estimates that the scanner in her office correctly reads perhaps 60% of what she writes by hand, and she laughs about it in that weary way that doctors do.
She fixes the rest by hand. It's a small detail, but it sits at the core of one of the stranger problems in healthcare technology right now: teaching machines to read the language of medicine.
| Field | Detail |
|---|---|
| Topic | Building datasets for AI interpretation of medical terminology |
| Core Framework Cited | RF-AI framework (Recognition, Formatting, AI-Processing) |
| Primary Technologies | OCR, multimodal LLMs, NLP, deep learning |
| Notable Medical Model | Med-PaLM (Google Research) |
| Common Data Sources | Electronic Health Records, free-text clinical notes, lab reports |
| Largest Ethical Concerns | Bias, transparency, patient consent and confidentiality, accountability |
| Institutions Active in Research | Dartmouth, Google Health, Stanford HAI |
| Adoption Stage | Early clinical pilots, limited deployment |
| Key Challenge | Non-representative, unstructured medical data |
| Regulation Status | Fragmented, lagging behind innovation |
| Year of Recent Acceleration | 2023–2025 |
| Common Failure Mode | Hallucinated diagnoses, missed negatives |
For all the hype around large language models in healthcare, few people appreciate how hard it is to build genuinely useful datasets. Medical English is not a standardized language. It is not even a single language. A radiologist in Boston, a cardiologist in Lahore, and a general practitioner in rural Spain may describe the same condition in three different shorthand styles, each using acronyms unique to their own hospital. Feed that into an AI model trained mostly on tidy American textbook data, and the cracks show quickly.
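To make that concrete, here is a minimal Python sketch of one common preprocessing step: expanding institution-specific shorthand before text ever reaches a model. The abbreviation table is illustrative only; real pipelines draw on curated, hospital-specific dictionaries or ontologies, and nothing here should be read as a clinical resource.

```python
import re

# Illustrative shorthand map; real systems use curated, institution-specific
# dictionaries. These four entries are examples, not a clinical resource.
ABBREVIATIONS = {
    "sob": "shortness of breath",
    "htn": "hypertension",
    "mi": "myocardial infarction",
    "r/o": "rule out",
}

def expand_abbreviations(note: str) -> str:
    """Expand known clinical shorthand to full terms, case-insensitively."""
    def replace(match: re.Match) -> str:
        return ABBREVIATIONS[match.group(0).lower()]
    # One alternation over all keys, longest first so "r/o" is tried before "mi".
    pattern = "|".join(
        re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True)
    )
    return re.sub(rf"(?<!\w)({pattern})(?!\w)", replace, note, flags=re.IGNORECASE)

print(expand_abbreviations("Pt c/o SOB, hx of HTN. R/O MI."))
# -> "Pt c/o shortness of breath, hx of hypertension. rule out myocardial infarction."
```

Even this toy version shows the core difficulty: the same map that works in one hospital will mistranslate in another, which is exactly why the datasets are so hard to build.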
Researchers describe a field advancing at two speeds at once. In demos, the models keep getting faster, smarter, and more impressive. The data underneath them, however, remains inconsistent. Electronic health records were supposed to fix this.

In practice, much of the useful information still lives in free-text notes, hurriedly written between patients and riddled with negations, hedges, and phrases like "rule out" that mean nearly the opposite of what they appear to say. A model that reads "no evidence of malignancy" as "evidence of malignancy" is not a minor flaw. It is a lawsuit waiting to happen.
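Rule-based tools such as NegEx and its descendants exist precisely for this. Below is a toy sketch of the idea, with a deliberately tiny cue list that no one should mistake for a clinically validated one:

```python
import re

# A NegEx-style toy: flag a finding as negated if a negation cue appears
# before it in the same sentence. Real tools (NegEx, ConText, negspaCy)
# also handle scope termination, pseudo-negations, and uncertainty.
NEGATION_CUES = r"\b(no evidence of|no|denies|without|negative for|rule out)\b"

def is_negated(sentence: str, finding: str) -> bool:
    """Return True if `finding` appears after a negation cue in `sentence`."""
    s = sentence.lower()
    idx = s.find(finding.lower())
    if idx == -1:
        return False  # finding not mentioned at all
    # Does any cue occur before the finding in this sentence?
    return bool(re.search(NEGATION_CUES, s[:idx]))

print(is_negated("No evidence of malignancy.", "malignancy"))   # True
print(is_negated("Biopsy confirms malignancy.", "malignancy"))  # False
print(is_negated("Admitted to rule out myocardial infarction.",
                 "myocardial infarction"))                      # True
```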
The Dartmouth team and others have worked on this for years. Their early efforts relied on hand-engineered features: laborious, step-by-step instructions for how a computer should look at a slide or a scan. Deep learning changed that, mostly for the better. But it also created a new dependency. A model is only as inclusive, accurate, and fair as the data it was trained on, and medical data almost always reflects whoever happened to be in the hospital that decade.
It is hard to ignore how often the same datasets recur from paper to paper: Western, urban, and often skewed toward patients who already had access to good care. Underrepresented communities barely show up, and when they do, models perform worse on them, which is exactly the opposite of what equitable medicine is supposed to look like. Researchers know this, and many say so honestly. But the fix is expensive, slow, and politically complicated.
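The disparity is at least straightforward to measure once subgroup labels exist. Here is a minimal sketch of a per-group accuracy audit; the records are fabricated placeholders, not real patient data, and the group names are purely illustrative:

```python
from collections import defaultdict

# Fabricated (group, predicted_label, true_label) triples for illustration.
predictions = [
    ("urban", 1, 1), ("urban", 0, 0), ("urban", 1, 1), ("urban", 0, 0),
    ("rural", 1, 0), ("rural", 0, 1), ("rural", 1, 1), ("rural", 0, 0),
]

def accuracy_by_group(rows):
    """Compute accuracy separately for each subgroup."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, pred, truth in rows:
        total[group] += 1
        correct[group] += int(pred == truth)
    return {g: correct[g] / total[g] for g in total}

scores = accuracy_by_group(predictions)
print(scores)                                               # {'urban': 1.0, 'rural': 0.5}
print("gap:", max(scores.values()) - min(scores.values()))  # gap: 0.5
```

Measuring the gap is the easy part; closing it means collecting data from the communities the existing datasets missed.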
Consent is another issue. Buried in the forms patients sign is a clause allowing their data to train an algorithm they will never see. Some ethicists find that acceptable. Others don't. The lines have not yet been drawn.
Still, something is shifting. Multimodal models such as Med-PaLM are beginning to handle the messy middle ground between an image, a lab value, and a casual sentence in a discharge note. Whether they are reliable enough for a weary doctor at two in the morning is another matter entirely. The technology is remarkable, but it is still catching up to the data feeding it. Until that gap closes, a handwritten prescription on paper will keep humbling the smartest AI in the room.
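On the engineering side, the first step is usually just representation: getting an image reference, structured labs, and free text into one record before any model sees them. A hypothetical sketch follows; the class and field names are invented for illustration and are not Med-PaLM's actual input format.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalCase:
    """Hypothetical bundle of one patient encounter's modalities."""
    image_path: str                                        # e.g. a chest X-ray file
    labs: dict[str, float] = field(default_factory=dict)   # lab name -> value
    note: str = ""                                         # free-text clinical note

    def to_prompt(self) -> str:
        """Flatten the structured pieces into one text prompt; the image
        itself would be passed separately to a multimodal model."""
        lab_lines = "\n".join(f"{k}: {v}" for k, v in self.labs.items())
        return f"LABS:\n{lab_lines}\n\nNOTE:\n{self.note}"

case = MultimodalCase(
    image_path="cxr_0421.png",
    labs={"WBC (10^9/L)": 14.2, "CRP (mg/L)": 88.0},
    note="Rule out pneumonia; no evidence of effusion.",
)
print(case.to_prompt())
```

Notice that the note in this toy record contains both a "rule out" and a negation: the multimodal packaging does nothing to solve the text problems described above, which is why the data, not the architecture, remains the bottleneck.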
