OpenAI and crypto venture firm Paradigm released EVMbench on Wednesday to test AI agents’ ability to detect, patch, and exploit smart contract flaws. The benchmark uses 120 past vulnerabilities plus scenarios from audits of Paradigm’s Tempo blockchain, and aims to improve automated security evaluation.
EVMbench found agent performance strongest when the goal is explicit exploitation, with the newest model excelling at draining funds. “Agents perform best in the exploit setting, where the objective is explicit: continue iterating until funds are drained,” the release states.
The report shows GPT-5.3-Codex more than doubled GPT-5’s exploit effectiveness, while detection and patching still fall short of full coverage. Anthropic’s Claude Opus 4.6 scored highest on detection, while GPT-5.3-Codex led on patching and exploitation.
OpenAI warned that EVMbench covers only a limited sample of vulnerabilities and cannot reliably flag false positives. The tool therefore does not capture the full difficulty of securing production smart contracts, the company added. (Ed. note: security teams should not rely solely on benchmark outputs.)
The release follows a recent incident in which AI-generated code cost users of the Moonwell protocol nearly $2.7 million; discussion and a recovery plan appear in the project’s forum and protocol pages. A Moonwell engineer said the code had passed an audit from Halborn.
Crypto protocols have faced extensive thefts this year, with more than $108 million lost to exploits in 2026, according to DefiLlama.

