OpenAI’s EVMbench reveals AI agents excel at exploiting smart contracts, struggle to patch.

OpenAI and crypto venture firm Paradigm released EVMbench on Wednesday, a benchmark that tests AI agents’ ability to detect, patch, and exploit smart contract flaws. It draws on 120 past vulnerabilities plus scenarios from audits of Paradigm’s Tempo blockchain, and aims to improve automated security evaluation.

EVMbench found agent performance strongest when the goal is explicit exploitation, with the newest model excelling at draining funds. “Agents perform best in the exploit setting, where the objective is explicit: continue iterating until funds are drained,” the release states.

The report shows GPT-5.3-Codex more than doubled GPT-5’s exploit effectiveness, while detection and patching still fall short of full coverage. Anthropic’s Claude Opus 4.6 scored highest on detection, and GPT-5.3-Codex led in both patching and exploitation.

OpenAI warned EVMbench covers a limited vulnerability sample and cannot reliably flag false positives. The tool therefore does not capture the full difficulty of securing production smart contracts, the company added (Ed. note: security teams should not rely solely on benchmark outputs).

The release follows a recent incident in which AI-generated code cost users of Moonwell nearly $2.7 million; discussion and a recovery plan appear in the project forum and protocol pages. A Moonwell engineer said the code had passed an audit from Halborn.

Crypto protocols have faced extensive thefts this year, with more than $108 million lost to exploits in 2026, according to DefiLlama data.
