Nearly half of the health and medical answers from major AI chatbots are wrong, misleading, or dangerously incomplete, according to a peer-reviewed study. Researchers from UCLA, the University of Alberta, and Wake Forest tested Gemini, DeepSeek, Meta AI, ChatGPT, and Grok on 250 health questions and found that 49.6% of responses were problematic, with 19.6% rated “highly problematic.”
The team used an adversarial approach, deliberately phrasing questions to push the chatbots toward bad advice. The findings highlight significant risks as these models are deployed at scale and rarely refuse to answer.
Performance varied significantly by chatbot and topic. Grok was the worst performer, with 58% of its responses rated problematic and 30% highly problematic. Across all models, questions about nutrition and athletic performance drew the worst answers, while questions about vaccines and cancer fared best. The researchers tied Grok’s poor showing directly to its training data from X, a platform known for spreading health misinformation.
Citations were a separate problem: the median completeness score for references was just 40%, and no chatbot produced a fully accurate reference list, with models hallucinating authors, journals, and titles. “They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments,” the authors wrote.
All chatbot responses scored in the “Difficult” range for readability, well above the reading level recommended for patient education materials. The findings echo a separate Oxford study from February 2026 that found AI medical advice to be no better than traditional self-diagnosis methods. “As the use of AI chatbots continues to expand, our data highlight a need for public education, professional training, and regulatory oversight,” the study concluded.
