OpenAI and Paradigm have released EVMbench, a framework to benchmark AI agents on detecting, patching, and exploiting Ethereum smart contract vulnerabilities. The tool draws on 120 real-world vulnerabilities, and top models now exploit over 70% of critical bugs, up from under 20% when the project began. Alongside the release, OpenAI expanded a private beta of its Aardvark security agent and committed $10 million in API credits to defensive crypto research.
A new benchmark from OpenAI and Paradigm evaluates AI agents on smart contract security, aiming to curb billions lost to exploits. The EVMbench framework tests agents across three modes: detecting vulnerabilities, patching them, and actively exploiting them.
The benchmark uses 120 curated high-severity vulnerabilities sourced from real-world audits, including Code4rena contests and the audit of Stripe’s Tempo payments blockchain. When the project began, top models could exploit fewer than 20% of critical bugs, but that figure is now above 70%.
In a joint statement, Paradigm said, “It’s now clear to us that a growing portion of audits in the future will be done by agents.” OpenAI noted that measuring performance in “economically relevant environments is critical as models become powerful tools for both attackers and defenders.”
The release marks a closer tie between AI and crypto, with OpenAI formally dedicating resources to Ethereum security. The same capability that makes the benchmark a powerful defensive tool also poses a threat if misused, since an agent that can exploit bugs in a test can do so in the wild.
In broader market news, major cryptocurrencies fell 1-2%, with Bitcoin trading near $66,600. Coinbase announced its Base network is dropping the OP Stack for its own unified client, causing a 23% drop in the OP token. Separately, Hyperliquid launched a policy center with $29 million for advocacy, and MegaETH introduced a decentralized exchange unifying spot trading, lending, and perpetual contracts.

