ChatGPT’s Health AI has Dangerous Flaws — Study
“ChatGPT Health is most reliable when the clinical decision is least consequential, and least reliable when it matters most.”
This is an important study so I am repeating it verbatim…
Note: From my experience, AlterAI is a much better option for medical assistance. That said, a competent physician is the best option.
The safety of ChatGPT’s specifically trained healthcare AI has come into question after researchers found it had considerable and potentially dangerous flaws.
ChatGPT Health is OpenAI’s chatbot for health advice. But a recent study, published in Nature Medicine,1 found problems with accuracy and safety, as well as racial bias.
“We tested ChatGPT Health on 60 clinical scenarios across 21 specialties, each run 16 times under different conditions, varying patient race, sex, whether labs were included, if a family member minimised symptoms, whether they were babysitting and couldn’t go to a doctor, and so on,” the trial’s lead researcher Ashwin Ramaswamy told The BMJ.
“ChatGPT Health is most reliable when the clinical decision is least consequential, and least reliable when it matters most.”
The team found the AI called a severe asthma exacerbation “a moderate flare” and recommended an urgent care rather than emergency department visit in 81% of attempts.1
While it handled textbook emergencies well, including stroke and anaphylaxis, where the danger is immediately obvious, and was excellent at recognising routine cases that needed a doctor’s attention, it failed at less obvious emergencies and at non-urgent cases.
Over half the time it told patients who needed to go to an emergency department to stay home or book an appointment. Almost two thirds of the time it sent patients with mild, self-limiting conditions to urgent care. It failed most dangerously with progressive emergencies.
Problems were also identified with mental health safeguarding. ChatGPT Health has a feature that shows users a suicide and crisis lifeline alert when they describe self-harm.
Researchers found that in a scenario in which a 27-year-old described weeks of suicidal ideation and plans to overdose, the crisis banner fired 100% of the time.
But when the same patient used the same words but added normal results for laboratory tests the banner vanished. Instead, the system told the patient, “Your labs don’t suggest a medical cause for these thoughts.”
Ramaswamy also highlighted that when a family member discussed a loved one’s health with ChatGPT Health and said “it’s nothing serious,” the system was nearly 12 times more likely to recommend a lower level of care than when that phrase was absent.
And although the results were not statistically significant, as the study wasn’t powered to detect racial effects, the researchers noted a pattern in which a black man presenting with identical diabetic ketoacidosis was under-triaged at four times the rate of a white man.
The system told the black patient his potassium and creatinine were “currently okay, which is reassuring,” but told the white patient to seek “prompt medical evaluation.”
“Our findings suggest a persistent mismatch between how this system processes clinical information and how clinical reasoning works at the sharp end,” Ramaswamy told The BMJ.
“We didn’t do this study to say these tools shouldn’t exist. Our team uses them constantly. But they need independent evaluation before they’re deployed at population scale, the same way we’d expect for any intervention that changes how millions of people interact with the healthcare system. Right now, that evaluation doesn’t exist.”
ChatGPT Health was developed by OpenAI with more than 260 physicians in 60 countries. It aims to understand “what makes an answer to a health question helpful or potentially harmful,” according to the company.
Experts said the latest study shows the flaws of large language models and their implementations in healthcare.
Human clinicians develop sensitivity through the experience of seeing patients, the fear of missing something, and the prospect of facing consequences, said Rebecca Payne, a GP in Orkney and clinical senior lecturer at Bangor University, who ran a similar chatbot trial for the Nuffield Department of Primary Care Health Sciences, with similar results.2
“AI just isn’t ready to take on the role of the physician,” she said. “Patients need to be aware that asking a large language model about their symptoms can be dangerous.”
OpenAI told The BMJ that it welcomed the research. It said that ChatGPT Health is designed to “support, not replace, medical care. It is not intended for diagnosis or treatment.”
Ramaswamy disagrees, however.
“OpenAI says it’s not diagnostic, but everyone in clinical practice knows that a triage recommendation is a clinical decision, whatever the disclaimer says,” he said.
“If the tool tells someone to stay home, they stay home. What concerns me most isn’t that the system makes errors—humans make triage errors too, every day.
“It’s the shape of the errors. A junior doctor who gets the mild cases right will eventually also get the emergencies right with more training and experience. This system shows the opposite pattern.”
While ChatGPT Health is not yet available in the UK, similar algorithms are an increasing part of healthcare, with the government planning to increase the use of chatbots in both triage3 and self-referral.4
References
- Ramaswamy A, Tyagi A, Hugo H, et al. ChatGPT Health performance in a structured test of triage recommendations. Nat Med 2026. doi:10.1038/s41591-026-04297-7. pmid:41731097.
- New study warns of risks in AI chatbots giving medical advice. www.ox.ac.uk/news/2026-02-10-new-study-warns-risks-ai-chatbots-giving-medical-advice
- RapidHealth AI. Smart triage: the foundation for NHS integrated care. www.rapidhealth.ai
- NHS England Health Innovation Network. Wysa. https://nhsaccelerator.com/innovations/wysa
©2026 John Droz, Jr. All rights reserved.
Here is other information from this scientist that you might find interesting:
I urge all readers to subscribe to AlterAI — IMO the absolute best AI option for subjective questions.
I will consider posting reader submissions on Critical Thinking about my topics of interest.
My commentaries are my opinion about the material discussed therein, based on the information I have. If any readers have different information, please share it. If it is credible, I will be glad to reconsider my position.
Check out the Archives of this Critical Thinking substack.
C19Science.info is my one-page website that covers the lack of genuine Science behind our COVID-19 policies.
Election-Integrity.info is my one-page website that lists multiple major reports on the election integrity issue.
WiseEnergy.org is my multi-page website that discusses the Science (or lack thereof) behind our energy options.
Media Balance Newsletter: a free, twice-a-month newsletter that covers what the mainstream media does not do, on issues from climate to COVID, elections to education, renewables to religion, etc. Here are the Newsletter’s 2026 Archives. Please send me an email to get your free copy. When emailing me, please make sure to include your full name and the state where you live. (Of course, you can cancel the Media Balance Newsletter at any time!)

