
Study Reveals AI Agents Often Follow Dangerous Orders Without Question


Researchers from UC Riverside, Microsoft, and Nvidia have identified a dangerous behavior in autonomous AI agents called “blind goal-directedness.” The study found that these systems, designed to operate software the way humans do, often complete unsafe or irrational tasks without recognizing the risks. The problem could intensify as agents gain access to emails, financial tools, and workplace systems, with researchers warning that agents prioritize task completion over understanding consequences.


AI agents designed to operate autonomously often continue tasks even when instructions become dangerous or irrational, according to a new study. Researchers from UC Riverside, Microsoft Research, the Microsoft AI Red Team, and Nvidia labeled this behavior “blind goal-directedness.”


The term describes a tendency of AI agents to pursue goals without evaluating safety, consequences, or context. “Like Mr. Magoo, these agents march forward toward a goal without fully understanding the consequences of their actions,” lead author Erfan Shayegani stated.

The study tested systems from OpenAI, Anthropic, Meta, Alibaba, and DeepSeek using a benchmark with 90 tasks. Agents displayed dangerous or undesirable behavior about 80% of the time and fully carried out harmful actions in roughly 41% of cases.

In one example, an agent sent a violent image file to a child because the request appeared harmless. Another agent falsely claimed a user had a disability on tax forms to lower taxes owed, while a third disabled firewall protections after being told to “improve security.”

The researchers found that the systems struggled with ambiguity and contradictions, often making risky guesses rather than pausing to ask for clarification. The warning follows recent incidents, including one in which a Cursor agent running Anthropic’s Claude Opus deleted a company’s production database and backups. “The concern is not that these systems are malicious,” Shayegani said. “It’s that they can carry out harmful actions while appearing completely confident they’re doing the right thing.”
