35% of New Websites Now AI-Generated; Study Highlights Semantic Narrowing

A new study from Stanford University, Imperial College London, and the Internet Archive reveals that 35% of newly published websites were AI-generated or AI-assisted by mid-2025, up from near zero before ChatGPT’s 2022 launch. The research confirms two measurable impacts: reduced semantic diversity and artificially positive sentiment. It found no evidence for common fears about increased misinformation or stylistic homogeneity, but warns that the high prevalence of AI content makes model collapse a practical risk for future AI training.


Research from Stanford University, Imperial College London, and the Internet Archive quantifies how quickly AI-produced content has spread across the web. The study found that 35% of new websites were AI-generated or AI-assisted by mid-2025, a figure that was essentially zero before November 2022.

Jonáš Doležal, a researcher at Imperial College London and co-author, told 404 Media that the speed is staggering. “After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years,” he stated.

The study tested six common hypotheses about AI’s impact on the internet. Only two were substantiated by the data, which showed AI content is less semantically diverse and more artificially positive.

AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones. The paper suggests language models may be narrowing the online Overton window by optimizing for outputs close to their training distribution.
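To make the metric concrete, here is a minimal sketch of one way a pairwise semantic similarity score and a relative difference between two groups could be computed. The article does not say which embedding model or similarity measure the study used, so cosine similarity over embedding vectors is an assumption here, and the function names and toy vectors are hypothetical illustrations, not data from the study.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def relative_increase(value, baseline):
    """Fractional increase of value over baseline (0.33 means 33% higher)."""
    return (value - baseline) / baseline


# Toy embeddings, invented for illustration only (not from the study).
human_pages = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0], [0.0, 0.3, 1.0]]
ai_pages = [[1.0, 0.4, 0.3], [0.8, 0.6, 0.2], [0.9, 0.5, 0.4]]

human_score = mean_pairwise_similarity(human_pages)
ai_score = mean_pairwise_similarity(ai_pages)
print(f"human: {human_score:.3f}  ai: {ai_score:.3f}")
print(f"relative increase: {relative_increase(ai_score, human_score):.0%}")
```

A higher mean pairwise score means the pages in a group sit closer together in embedding space, which is the sense in which the content is described as less semantically diverse.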

AI content also exhibited positive sentiment scores more than 107% higher than those of human content, meaning more than twice as high. Researchers linked this to the sycophantic tendencies of large language models, which are trained on human approval signals and so tend to produce sanitized and relentlessly upbeat text.

Contrary to widespread public belief, the data showed no meaningful correlation between AI prevalence and factual error rates. The study also found no statistically significant increase in stylistic homogeneity tied to AI content.

The 35% prevalence turns model collapse from a theoretical, academic concern into an empirical one. Future foundation models trained on contemporary web data will ingest content that is substantially AI-generated and less diverse.

The team is now working with the Internet Archive to develop a continuous monitoring tool. This would track AI’s share of the web in real time rather than as a single snapshot.
