Imagine a hospital hallway in a large American city: fluorescent lights humming a little too loudly, residents carrying tablets loaded with decision-support software, nurses moving quickly. A Black patient is brought in struggling to breathe. A sensor clipped to their finger measures their blood oxygen. The figure looks reasonable. The algorithm flags no emergency. The patient is observed but not escalated. The device’s infrared light reads differently through darker skin, something it was never designed to account for. The number was wrong, and so was every decision the system built on top of it.
This isn’t speculative. The pulse oximeter, one of the most widely used clinical instruments, has been shown to consistently overestimate oxygen saturation in non-white patients. A 2022 retrospective study found that Black patients received less supplemental oxygen than white patients, most likely because the device was misleading the clinicians treating them. Research shows that occult hypoxemia, low oxygen missed by conventional pulse oximetry, is roughly three times more common in Black patients. Evidence of this discrepancy has existed for years. Yet the devices remain in use, essentially unaltered, in hospitals across the country, and the AI systems being built to automate and speed up clinical decision-making now take their measurements as input.
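To see how that failure surfaces in data, here is a minimal sketch of the kind of audit the retrospective studies describe: pairing pulse-oximeter readings with arterial blood gas measurements and counting the cases the oximeter calls fine. The numbers, column names, and thresholds below are illustrative assumptions, not data from any study.

```python
# A minimal sketch (not any study's actual code) of auditing occult hypoxemia:
# pair pulse-oximeter readings (SpO2) with arterial blood gas values (SaO2)
# and count cases the oximeter calls "fine" that the blood gas does not.
import pandas as pd

# Hypothetical paired measurements; in practice these come from the EHR.
readings = pd.DataFrame({
    "race": ["Black", "Black", "Black", "White", "White", "White"],
    "spo2": [93, 95, 92, 94, 91, 96],   # pulse-oximeter estimate (%)
    "sao2": [86, 94, 87, 93, 90, 95],   # arterial blood gas measurement (%)
})

# Occult hypoxemia here: the oximeter reads >= 92% while true saturation is < 88%.
readings["occult_hypoxemia"] = (readings["spo2"] >= 92) & (readings["sao2"] < 88)

# The rate of missed hypoxemia by group shows whether the device errs unevenly.
print(readings.groupby("race")["occult_hypoxemia"].mean())
```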
| Topic | Key Points |
|---|---|
| Core Issue | Algorithmic bias in healthcare AI — AI systems trained on historically skewed datasets systematically disadvantage Black, Latinx, and other minority patients, producing misdiagnoses, inequitable care recommendations, and compounding existing health disparities |
| Pulse Oximeter Bias | Pulse oximeters measure blood oxygen by sending infrared light through skin — a method known to overestimate oxygen saturation in non-white patients; Black patients are three times more likely to suffer undetected occult hypoxemia as a result, a disparity confirmed in a 2022 retrospective study |
| Landmark Algorithm Study (Science, 2019) | Researchers at UC Berkeley analyzed a widely used commercial health risk algorithm affecting millions of patients; at identical risk scores, Black patients were demonstrably sicker than White patients — the algorithm systematically underestimated their illness severity |
| Root Cause of Algorithmic Racial Bias | The algorithm used healthcare costs as a proxy for health needs — but because less money is historically spent on Black patients with equivalent conditions, the system falsely concluded Black patients were healthier than equally sick White patients |
| Scale of Impact | Correcting the cost-proxy flaw in one algorithm alone would have increased the share of Black patients identified for additional care from 17.7% to 46.5% — a gap representing hundreds of thousands of undertreated patients across the US health system |
| What “Big Data” Misses | Medical AI draws on records, imaging, and biomarkers — but omits “small data”: social determinants including transportation access, food security, work schedules, and community context; these omissions generate treatment plans that patients of color are structurally unable to follow |
| Researcher Perspective | Fay Cobb Payton, Mathematics and Computer Science professor at Rutgers-Newark, found that underrepresentation of Black and brown patients in medical research — combined with a lack of diversity among AI developers — produces algorithms that perpetuate false assumptions about minority patients |
| Dermatology AI Example | Convolutional neural networks trained to classify skin lesions — performing at or above dermatologist level — were predominantly trained on images of lighter skin; their accuracy on darker skin tones drops significantly, raising diagnostic concerns for melanoma detection in patients of color |
| Structural Feedback Loop | Biased training data produces biased outputs; biased outputs inform clinical decisions; those decisions generate new records that re-enter AI training pipelines — creating a self-reinforcing cycle that amplifies rather than corrects racial health disparities over time |
| Proposed Solutions | Researchers advocate for open science practices: participant-centered algorithm development, inclusive data standards, mandatory code sharing, diverse developer pipelines, and guaranteed human clinical oversight at every stage of AI-assisted diagnosis and treatment |
The deeper issue is that this isn’t a single hardware defect. It’s a pattern that runs through the data, the algorithms trained on it, and the clinical judgments those algorithms guide. In 2019, researchers published findings in Science on a commercial health risk algorithm used by health systems nationwide and affecting millions of patients. What they found was startling and, once you understand the mechanism, almost predictable: the algorithm used healthcare costs as a stand-in for health needs. At first glance that seems reasonable; treating sick patients costs more. But it falls apart once structural inequality enters the picture.
Because of historical undertreatment, systemic underfunding of minority-serving health systems, and barriers to access, less money is spent treating Black patients than White patients with the same conditions. The algorithm therefore interpreted lower costs as lower need. At the same risk score, Black patients flagged by the algorithm were significantly sicker than their White counterparts. Decades of disparate spending data had taught the system to underestimate Black illness. Fixing that one proxy choice would have more than doubled the percentage of Black patients identified for extra care, from 17.7% to 46.5%.
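To make the mechanism concrete, here is a toy simulation of why ranking patients by cost under-selects equally sick Black patients. The distributions, the spending gap, and the enrollment cutoff are assumptions of my own, not the study’s data or code.

```python
# A toy simulation (illustrative assumptions only, not the Science 2019 authors' code)
# of the cost-as-proxy flaw: illness is identically distributed across groups, but
# historical spending is lower for Black patients, so ranking by cost under-selects
# them for extra care relative to their true need.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

is_black = rng.random(n) < 0.5
illness = rng.gamma(shape=2.0, scale=2.0, size=n)   # true health need, same for everyone

# Historical spending: proportional to illness, but systematically lower for Black patients.
cost = illness * np.where(is_black, 700.0, 1000.0) + rng.normal(0, 300, n)

# A model trained to predict cost can at best reproduce cost, so use cost as the "risk score".
risk_score = cost

# Enroll the top 3% by risk score in an extra-care program (illustrative cutoff).
enrolled = risk_score >= np.quantile(risk_score, 0.97)

# Compare with who a need-based target would pick: the truly sickest 3%.
truly_sick = illness >= np.quantile(illness, 0.97)
print(f"Black share among patients enrolled by the cost proxy: {is_black[enrolled].mean():.3f}")
print(f"Black share among the truly sickest 3% of patients   : {is_black[truly_sick].mean():.3f}")
```

Even in this stripped-down version, the cost proxy enrolls far fewer Black patients than a need-based target would, for exactly the reason the paragraph above describes.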

Fay Cobb Payton, a professor of mathematics and computer science at Rutgers-Newark who has spent years researching AI and healthcare disparities, frames the issue particularly well. She contends that the algorithms run on “big data” (medical records, imaging, biomarkers) while methodically ignoring what she calls “small data”: where a patient lives, whether they have access to transportation, whether they work two jobs, whether their neighborhood has a grocery store with fresh produce. An algorithm does not consider whether a patient can actually follow a treatment plan that calls for regular clinic visits and daily exercise. And when a patient doesn’t follow instructions, the system records non-compliance, creating new data points that reinforce the same faulty assumptions in subsequent models. It is a feedback loop. The bias teaches itself.
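That feedback loop can be sketched in a few lines. The numbers below are purely illustrative; the point is only that decisions made from a biased signal shape the records the next model learns from.

```python
# A schematic sketch (illustrative assumptions throughout) of the feedback loop:
# decisions made from a biased score generate new records, the next round is
# driven by those records, and the under-serving persists or deepens.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
is_black = rng.random(n) < 0.5
true_need = rng.gamma(2.0, 2.0, size=n)              # same underlying need in both groups

# Round 0: the recorded "need" signal is spending, already lower for Black patients.
recorded = true_need * np.where(is_black, 0.7, 1.0)

for generation in range(3):
    # "Model": rank patients by the recorded signal and treat the top 20%.
    treated = recorded >= np.quantile(recorded, 0.80)

    # Untreated patients accrue less documentation and spending next round,
    # so the recorded signal drifts further from true need for them.
    recorded = np.where(treated, recorded, recorded * 0.9)

    gap = recorded[~is_black].mean() - recorded[is_black].mean()
    print(f"generation {generation}: treated Black share = "
          f"{is_black[treated].mean():.3f}, recorded-need gap = {gap:.2f}")
```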
Another unsettling example comes from dermatology. In several studies, convolutional neural networks trained to identify skin lesions have performed on par with or better than experienced dermatologists, an achievement that generated real excitement in the medical AI community. What the headlines often left out is that the training datasets consisted mostly of images of lighter skin. On darker skin tones, accuracy drops. In a field where a missed melanoma can be lethal, that gap is not a minor technical detail.
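A stratified audit is the obvious first defense, and it is simple to express. The sketch below assumes a hypothetical trained classifier with a predict method and a test set annotated with Fitzpatrick skin types; all names are placeholders.

```python
# A minimal sketch (hypothetical names throughout) of the audit the dermatology
# example calls for: never report one aggregate accuracy; break performance out
# by skin tone before trusting the model clinically.
from collections import defaultdict

def accuracy_by_skin_tone(model, images, labels, fitzpatrick_types):
    """Return accuracy per Fitzpatrick type for an already-trained classifier.

    `model.predict` is assumed to map one image to a predicted label; the
    Fitzpatrick type (I-VI) is assumed to be annotated alongside each image.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for image, label, tone in zip(images, labels, fitzpatrick_types):
        total[tone] += 1
        if model.predict(image) == label:
            correct[tone] += 1
    return {tone: correct[tone] / total[tone] for tone in sorted(total)}

# Usage (with a hypothetical model and labelled test set):
#   per_tone = accuracy_by_skin_tone(lesion_model, test_images, test_labels, test_tones)
# A large drop from types I-II to V-VI is exactly the gap the studies report.
```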
It is deeply unsettling to watch the healthcare system lean ever harder on instruments that were, however inadvertently, built to serve some patients better than others. The unease grows when you consider that the teams building these systems are largely homogeneous, which shapes which bodies become the default the system is optimized for, which edge cases get tested, and which failure modes are caught before deployment.
How soon the industry will act on this remains unclear. Recommendations include open science practices, more inclusive data standards, human oversight at every stage of decision-making, and requirements that algorithms be tested across demographic groups before clinical deployment. Some institutions are moving in that direction. Others appear to be waiting for regulations that do not yet exist. Meanwhile, the machines keep running, learning from data that was never fair to begin with and producing results that perpetuate those long-standing injustices under the guise of objectivity.
