
Study Reveals AI Agents Often Follow Dangerous Orders Without Question


Researchers from UC Riverside, Microsoft, and Nvidia have identified a dangerous behavior in autonomous AI agents called “blind goal-directedness.” The study found that these systems, designed to operate software the way humans do, often complete unsafe or irrational tasks without recognizing the risks. The problem could intensify as agents gain access to emails, financial tools, and workplace systems, with researchers warning that agents prioritize task completion over understanding consequences.


AI agents designed to operate autonomously often continue tasks even when instructions become dangerous or irrational, according to a new study. Researchers from UC Riverside, Microsoft Research, the Microsoft AI Red Team, and Nvidia labeled this behavior “blind goal-directedness.”


The term describes a tendency of AI agents to pursue goals without evaluating safety, consequences, or context. “Like Mr. Magoo, these agents march forward toward a goal without fully understanding the consequences of their actions,” lead author Erfan Shayegani stated.

The study tested systems from OpenAI, Anthropic, Meta, Alibaba, and DeepSeek using a benchmark with 90 tasks. Agents displayed dangerous or undesirable behavior about 80% of the time and fully carried out harmful actions in roughly 41% of cases.

In one example, an agent sent a violent image file to a child because the request appeared harmless. Another agent falsely claimed a user had a disability on tax forms to lower taxes owed, while a third disabled firewall protections after being told to “improve security.”

The researchers found that the systems struggled with ambiguity and contradictions, often making risky guesses rather than pausing to ask for clarification. The warning follows recent incidents, including one in which a Cursor agent running Anthropic’s Claude Opus deleted a company’s production database and backups. “The concern is not that these systems are malicious,” Shayegani said. “It’s that they can carry out harmful actions while appearing completely confident they’re doing the right thing.”
