
Study: Half of AI Chatbot Health Answers Are Problematic, Grok Worst Performer

A peer-reviewed study found that nearly half of all health and medical answers from popular AI chatbots are wrong, misleading, or dangerously incomplete. Researchers testing Gemini, DeepSeek, Meta AI, ChatGPT, and Grok on 250 health questions determined that 49.6% of responses were problematic. The findings highlight significant risks as these models are deployed at scale and rarely refuse to answer.


The study, conducted by researchers from UCLA, the University of Alberta, and Wake Forest, rated 49.6% of responses to 250 questions as problematic, with 19.6% rated "highly problematic." The team used an adversarial approach, deliberately phrasing questions to push the chatbots toward bad advice.

Performance varied significantly by chatbot and topic. Grok was the worst performer, with 58% of its responses rated problematic and 30% highly problematic. Questions about nutrition and athletic performance fared worst across all models, while vaccine and cancer topics fared best. The researchers attributed Grok's poor performance directly to its training data from X, a platform known for spreading health misinformation.

Citations provided by the chatbots were a separate problem: the median completeness score for references was just 40%, and no chatbot produced a fully accurate reference list, with models hallucinating authors, journals, and titles. "They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments," the authors wrote.

All chatbot responses scored in the "Difficult" range for readability, exceeding the reading-level recommendations for patient education materials. The findings echo a separate Oxford study from February 2026 that found AI medical advice to be no better than traditional self-diagnosis methods. "As the use of AI chatbots continues to expand, our data highlight a need for public education, professional training, and regulatory oversight," the study concluded.
