OpenAI, Paradigm Launch EVMbench to Test AI’s Role in Smart Contract Security

Researchers from OpenAI, Paradigm, and OtterSec have developed a new benchmark called EVMbench to evaluate the security capabilities of AI agents in a high-stakes blockchain environment. The tool uses 120 real-world vulnerabilities from 40 projects to test AI in detecting, patching, and exploiting smart contract flaws, revealing significant progress and associated risks.


As smart contracts now manage over $400 billion in assets, security is critically important. Unlike traditional software, blockchain programs are often immutable after deployment, making coding errors permanent financial risks.

To assess how AI agents perform in this environment, researchers from OpenAI, Paradigm, and OtterSec developed EVMbench. The benchmark draws on 120 real vulnerabilities from 40 blockchain projects to create a realistic evaluation.

The OpenAI blog post noted, “We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances.” The post added that the team is releasing the code and tasks to support continued measurement of these capabilities.

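AI can improve auditing, but the same capabilities can be used to exploit weaknesses. EVMbench therefore tests AI agents in three stages of increasing technical difficulty: detecting vulnerabilities, patching them, and exploiting them. Each stage represents a different level of security responsibility.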

The release drew mixed reactions on X. One user wrote, “This is a watershed moment for smart contract security.” Another called the progress “wild” but “kinda worrying.”

A recent incident highlighted the real-world risks. In an exploit involving Claude Opus 4.6, AI-assisted code mispriced an asset and triggered liquidations, leading to losses of nearly $1.78 million.

EVMbench itself has clear limitations, including a curated dataset of only 120 vulnerabilities and a sandboxed environment that cannot fully replicate real-world blockchain complexity. Separately, recent research shows that ransomware such as DeadLock is now using Polygon smart contracts to hide its infrastructure.
