35% of New Websites Now AI-Generated; Study Highlights Semantic Narrowing

A new study from Stanford University, Imperial College London, and the Internet Archive reveals 35% of newly published websites were AI-generated or AI-assisted by mid-2025, up from near zero before ChatGPT’s 2022 launch. The research confirms two measurable impacts: reduced semantic diversity and artificially positive sentiment. It found no data supporting common fears about increased misinformation or stylistic homogeneity, but warns that the high prevalence of AI content turns model collapse from a theoretical risk into an empirical one for future AI training.

Research from Stanford University, Imperial College London, and the Internet Archive quantifies the rapid AI takeover of the web. The study found 35% of new websites were AI-generated or assisted by mid-2025, a figure that was essentially zero before November 2022.

Jonáš Doležal, a researcher at Imperial College London and co-author, told 404 Media that the speed is staggering. “After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years,” he stated.

The study tested six common hypotheses about AI’s impact on the internet. Only two were substantiated by the data, which showed AI content is less semantically diverse and more artificially positive.

AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones. The paper suggests language models may be narrowing the online Overton window by optimizing for outputs close to their training distribution.
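To make the metric concrete, here is a minimal sketch of how a mean pairwise similarity score could be computed. It assumes each page has already been turned into an embedding vector and that similarity is measured with cosine similarity; the article does not specify the study's actual embedding model or similarity measure, and the function names and toy vectors below are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def percent_higher(value, baseline):
    """How much higher `value` is than `baseline`, as a percentage."""
    return (value - baseline) / baseline * 100.0


# Toy embeddings (hypothetical, not study data): the "clustered" pages all
# point the same way, while the "spread" pages cover different directions.
clustered_pages = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
spread_pages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

clustered_score = mean_pairwise_similarity(clustered_pages)
spread_score = mean_pairwise_similarity(spread_pages)
gap = percent_higher(clustered_score, spread_score)
```

A higher mean score means the pages in a group sit closer together in meaning, which is the sense in which a group of pages can be called less semantically diverse. A "33% higher" figure would correspond to `percent_higher` returning about 33 when comparing the two groups.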

AI content also exhibited positive sentiment scores more than 107% higher than human content. Researchers linked this to the sycophantic tendencies of large language models, which are trained on human approval signals and tend to produce sanitized, relentlessly upbeat text.

Contrary to widespread public belief, the data showed no meaningful correlation between AI prevalence and factual error rates. The study also found no statistically significant increase in stylistic homogeneity tied to AI content.

The 35% prevalence shifts the theoretical risk of model collapse from an academic concern to an empirical one. Future foundation models trained on contemporary web data will ingest content that is substantially AI-generated and less diverse.

The team is now working with the Internet Archive to develop a continuous monitoring tool. This would track AI’s share of the web in real time rather than as a single snapshot.
