
OpenAI’s EVMbench reveals AI agents excel at exploiting smart contracts, struggle to patch


OpenAI and crypto venture firm Paradigm released EVMbench on Wednesday to test AI agents’ ability to detect, patch, and exploit smart contract flaws. The benchmark draws on 120 past vulnerabilities plus scenarios from audits of Paradigm’s Tempo blockchain, and aims to improve automated security evaluation.

EVMbench found agent performance strongest when the goal is explicit exploitation, with the newest model excelling at draining funds. “Agents perform best in the exploit setting, where the objective is explicit: continue iterating until funds are drained,” the release states.


The report shows GPT-5.3-Codex more than doubled GPT-5’s exploit effectiveness, while detection and patching still lag behind full coverage. Anthropic’s Claude Opus 4.6 scored highest on detection, and GPT-5.3-Codex led in patching and exploiting results.

OpenAI warned EVMbench covers a limited vulnerability sample and cannot reliably flag false positives. The tool therefore does not capture the full difficulty of securing production smart contracts, the company added (Ed. note: security teams should not rely solely on benchmark outputs).

The release follows a recent incident in which AI-generated code cost users of Moonwell nearly $2.7 million; discussion and a recovery plan appear in the project forum and protocol pages. A Moonwell engineer said the code had passed an audit from Halborn.

Crypto protocols have faced extensive thefts this year, with more than $108 million lost in 2026 exploits, according to DefiLlama data.
