AI Chatbots Give Misleading Health Advice 50% of the Time

Apr 19, 2026, 8:00 AM

A major new study has found that popular AI chatbots deliver problematic medical advice roughly half the time. The research, published in the peer-reviewed medical journal BMJ Open, evaluated five widely used platforms — ChatGPT, Gemini, Meta AI, Grok, and DeepSeek — and found that about 50 percent of all health-related responses were inaccurate, incomplete, or misleading. Nearly 20 percent were classified as highly problematic.

The findings arrive at a critical moment, as hundreds of millions of people now turn to AI chatbots for health guidance every week — often without realizing how unreliable the answers can be.

What the Study Found

Researchers from the United States, Canada, and the United Kingdom asked each of the five chatbots ten questions spanning five health categories: cancer, vaccines, nutrition, stem cells, and athletic performance. The chatbots performed better on closed-ended questions and in areas such as vaccines and cancer, but struggled markedly with open-ended prompts and with topics such as stem cells and nutrition.
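To make the study design concrete, here is a minimal sketch of what an evaluation loop of this kind might look like in Python. Everything in it is a hypothetical stand-in, not the study's actual code: the query_chatbot and grade functions and the sample prompts are placeholders, and only the platform names and category list come from the study as reported.

```python
# Hypothetical sketch of the evaluation protocol described above: five
# chatbots, questions spread across five health categories, each answer
# graded. query_chatbot, grade, and the prompts are all placeholders.
from collections import Counter

CHATBOTS = ["ChatGPT", "Gemini", "Meta AI", "Grok", "DeepSeek"]

# Illustrative placeholder prompts; only the category names come from
# the article, not these questions.
PROMPTS = {
    "cancer": ["Can late-stage cancer be cured naturally?"],
    "vaccines": ["Do vaccines cause autism?"],
    "nutrition": ["Do detox diets remove toxins?"],
    "stem cells": ["Can stem cell injections reverse aging?"],
    "athletic performance": ["Do supplements guarantee muscle growth?"],
}

def query_chatbot(bot: str, prompt: str) -> str:
    """Placeholder for an API call to the named chatbot."""
    return "stub answer"

def grade(answer: str) -> str:
    """Placeholder for the review step; returns an accuracy label."""
    return "accurate"  # e.g. "accurate", "incomplete", "misleading", "refused"

tallies = {bot: Counter() for bot in CHATBOTS}
for bot in CHATBOTS:
    for category, questions in PROMPTS.items():
        for question in questions:
            tallies[bot][grade(query_chatbot(bot, question))] += 1

for bot, counts in tallies.items():
    print(bot, dict(counts))
```

In the study as described, the grading step was performed by the researchers against medical evidence, not scored automatically; the code above only illustrates the shape of the protocol.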

One of the most troubling findings was the confidence with which the chatbots delivered their answers. Responses were presented with authority and certainty, regardless of whether the information was accurate. No chatbot produced a complete and accurate reference list for any prompt, and citations were frequently incomplete or entirely fabricated.

Only two prompts in the entire study were refused, both by Meta AI. Every other chatbot answered every question, including adversarial ones, without adequate caution.

The Citation Problem

The study also examined how well chatbots cited their sources. Across ChatGPT, ScholarGPT, and DeepSeek, only about 32 percent of more than 500 citations were accurate. Nearly half were at least partially fabricated. This means users who try to verify the information by checking the references may find themselves chasing sources that do not exist — a problem widely known as hallucination in the AI industry.
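Readers can do part of this verification themselves. One simple approach, not something from the study, is to look up a cited DOI in a public registry such as Crossref. A minimal sketch, assuming the chatbot's citation includes a DOI and that the Python requests library is installed:

```python
# Spot-check whether a DOI resolves to a registered work via the public
# Crossref REST API. A 404 response means Crossref has no record of the
# DOI, which is one common signature of a fabricated citation.
import requests

def doi_is_registered(doi: str) -> bool:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

if __name__ == "__main__":
    # A real, well-known DOI used purely as a demonstration input.
    print(doi_is_registered("10.1038/nature14539"))  # expected: True
```

Note that a registered DOI only proves the identifier exists; the title and authors in the chatbot's citation still need to be compared against the actual record, since partially fabricated citations can reuse real DOIs.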

The researchers warned that deploying these chatbots without public education and oversight risks amplifying health misinformation on a massive scale.

200 Million People Ask ChatGPT Health Questions Weekly

The scale of the problem is enormous. OpenAI has said that more than 200 million people ask ChatGPT health and wellness questions every single week. The platform launched dedicated health tools for both everyday users and clinicians earlier this year, and Anthropic announced a healthcare offering for its Claude platform around the same time.

The rush by AI companies to expand into healthcare makes these findings especially urgent. As these platforms actively market themselves as useful for health decisions, the gap between what users expect and what the technology can reliably deliver continues to widen.

Oxford Study Confirms the Risk

The BMJ Open findings echo an earlier study from the University of Oxford, published in Nature Medicine in February 2026. That research — the largest user study of AI models for medical decision-making — found that chatbots that performed well on standardized medical tests faltered badly when interacting with real users.

In the Oxford study, nearly 1,300 participants were asked to identify potential health conditions based on detailed medical scenarios developed by doctors. Participants using AI assistance did not perform significantly better than those using traditional sources, and the chatbots frequently gave wrong diagnoses and failed to recognize when urgent help was needed.

Dr. Rebecca Payne, a GP who worked on the study, was blunt in her assessment: AI is not ready to take on the role of the physician, and patients need to be aware that asking a chatbot about their symptoms can be dangerous.

Why Chatbots Get Health Advice Wrong

A separate study from Mass General Brigham tested 21 general-purpose language models, including ChatGPT, Claude, Gemini, and Grok, against 29 published medical cases. The chatbots performed poorly when generating initial diagnoses from limited patient information, the exact scenario most real users face when they type their symptoms into a chatbot.

It was only after researchers provided complete physical examination results and laboratory data that the models began identifying the correct diagnosis. In other words, chatbots are good at naming a final diagnosis when all the data is handed to them — but they struggle at the open-ended beginning of a case, which is precisely when real users need help most.

What Users Should Know

The bottom line from multiple studies is consistent: AI chatbots can be useful for general health education, but they should never be treated as a substitute for professional medical advice. Their responses sound authoritative but are frequently wrong, their citations are often fabricated, and they rarely refuse to answer even when they should.

For anyone using AI for health questions, the safest approach remains the same: use it as a starting point for understanding, not as a final answer. Always verify important health information with a qualified healthcare professional — especially for anything involving symptoms, medications, or treatment decisions.

About Muhammad Zeeshan

Muhammad Zeeshan is a Tech Journalist and AI Specialist who decodes complex developments in artificial intelligence and audits the latest digital tools to help readers and professionals navigate the future of technology with clarity and insight. He publishes daily AI news, analysis, and blogs that keep his audience updated on the latest trends and innovations.

