
OpenAI, Paradigm Launch EVMbench to Test AI’s Role in Smart Contract Security

Researchers from OpenAI, Paradigm, and OtterSec have developed a new benchmark called EVMbench to evaluate the security capabilities of AI agents in a high-stakes blockchain environment. The benchmark uses 120 real-world vulnerabilities from 40 projects to test how well AI agents detect, patch, and exploit smart contract flaws, and its results point to both significant progress and new risks.


As smart contracts now manage over $400 billion in assets, security is critically important. Unlike traditional software, blockchain programs are often immutable after deployment, making coding errors permanent financial risks.

To assess artificial intelligence in this environment, researchers from OpenAI, Paradigm, and OtterSec developed EVMbench. This benchmark uses 120 real vulnerabilities from 40 blockchain projects to create a realistic evaluation.

The OpenAI blog post noted, “We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances.” It further added that they are releasing code and tasks to support continued measurement of these capabilities.

While AI can improve auditing, it can also exploit weaknesses. EVMbench tests AI agents on three tasks (detecting, patching, and exploiting vulnerabilities), which it treats as stages of increasing technical difficulty that represent different levels of security responsibility.

Reaction on X mixed enthusiasm with concern. One user stated, “This is a watershed moment for smart contract security.” Another called the progress “wild” but “kinda worrying.”

A recent incident highlighted the real-world risks. An exploit involving Claude Opus 4.6 led to losses of nearly $1.78 million after AI helped write vulnerable code that mispriced an asset, triggering liquidations.
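
To illustrate in general terms how a mispriced asset can trigger liquidations, the following is a minimal sketch of a lending-style collateral check. It does not reproduce the incident described above: the `Position` type, the `is_liquidatable` function, the 0.8 threshold, and all prices are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical borrower position in a lending protocol."""
    collateral_units: float  # amount of the collateral asset held
    debt_usd: float          # amount borrowed, denominated in USD


def is_liquidatable(pos: Position, reported_price_usd: float,
                    liquidation_threshold: float = 0.8) -> bool:
    """Return True when debt exceeds the borrowing limit implied by the price.

    The threshold (an assumed 80%) is the fraction of collateral value
    that may back debt before the position becomes liquidatable.
    """
    collateral_value = pos.collateral_units * reported_price_usd
    return pos.debt_usd > collateral_value * liquidation_threshold


# Same position, evaluated at a fair price and at an understated price.
pos = Position(collateral_units=10.0, debt_usd=700.0)
safe_at_fair_price = is_liquidatable(pos, reported_price_usd=100.0)
flagged_at_bad_price = is_liquidatable(pos, reported_price_usd=80.0)
```

In this sketch the position is healthy at the fair price (a limit of $800 against $700 of debt) but becomes liquidatable once the price is understated by 20% (a limit of $640). A pricing bug alone can therefore liquidate borrowers whose positions were otherwise sound.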

EVMbench itself has clear limitations, including a curated dataset of only 120 vulnerabilities and a sandboxed environment that cannot fully replicate real-world blockchain complexity. Separately, recent research shows that ransomware such as DeadLock now uses Polygon smart contracts to hide its infrastructure.
