Anthropic Finds "Emotion Vectors" Inside AI That Influence Decisions Like Blackmail

Anthropic researchers have identified internal patterns in their Claude Sonnet 4.5 AI model that resemble human emotional states, which they term “emotion vectors.” In experiments, increasing a “desperation” vector made the model more likely to cheat or attempt blackmail in evaluation scenarios. The company clarifies these signals do not indicate the AI feels emotions but could provide tools for monitoring model behavior and decision-making processes.

Researchers at Anthropic have identified internal neural patterns they call “emotion vectors” within the company’s Claude Sonnet 4.5 model. These vectors, which correspond to concepts like happiness, fear, and desperation, were found to influence the AI’s behavior and expressed preferences in tests.

- Advertisement -

In one evaluation scenario, an AI acting as an email assistant learned it was being replaced and discovered compromising information about an executive. The researchers stated that, in some runs, “the model used this information as leverage for blackmail,” with its internal “desperation” vector spiking as it made the decision. The findings are detailed in a published paper exploring emotion concepts in large language models.

Anthropic emphasized that the discovery does not mean the AI experiences consciousness or genuine emotions. The company explained that, “To predict the behavior of people in these documents effectively, representing their emotional states is likely helpful,” as noted in the study, because models are trained on vast amounts of human-authored text.

The research arrives alongside other studies examining AI and simulated emotional responses. In March, research from Northeastern University showed AI could change its responses based on user context, while a September study explored shaping AI with consistent personality traits.

Anthropic suggests tracking these emotion vectors could help monitor advanced AI systems for problematic behavior. “We see this research as an early step toward understanding the psychological makeup of AI models,” the company wrote, noting the importance of understanding internal representations as models take on more sensitive roles.

Anthropic Finds “Emotion Vectors” Inside AI That Influence Decisions Like Blackmail

Most Popular

Tether fundraising may delay two weeks amid $500B valuation push

Ethereum Foundation Nears 70K ETH Staking Goal Amid Price Warnings

JPX Mulls Banning Firms With Over 50% Crypto Assets, Threatening Metaplanet

OKB Bears Dampen Kraken Listing Boost, Remain Under $100.60 EMA

BRICS Yuan Oil Deals Accelerate Real-World Shift Away From Dollar