A recent study from King's College London indicates that Wikipedia's user engagement remains strong despite the widespread adoption of artificial intelligence tools like ChatGPT. However, the research also highlights significant long-term challenges for the online encyclopedia, particularly from large-scale data scraping by AI developers.
Key Takeaways
- A study by King's College London found no decline in Wikipedia's user activity from January 2021 to January 2024.
- Page views and visitor numbers increased across 12 language editions, though growth was slower where ChatGPT was available.
- Researchers identified intensive data scraping by AI companies as a major threat to Wikipedia's infrastructure.
- Experts are calling for a new framework to govern how AI developers use Wikipedia's content for training models.
Study Shows Continued Wikipedia Relevance
Contrary to predictions that AI chatbots would make Wikipedia obsolete, a new analysis published in the ACM Collective Intelligence journal shows the platform is maintaining its user base. The study, conducted by researchers at King's College London, offers a data-driven look at how the world's largest online encyclopedia is faring as generative AI tools spread.
The research team examined activity across 12 language editions of Wikipedia from January 2021 to January 2024. This 36-month window spans the period before and after the public launch of popular generative AI tools, including ChatGPT in November 2022.
Methodology of the Study
To assess the impact of AI, researchers divided the 12 language editions into two groups: six where ChatGPT was officially accessible and six where it was not. This comparative approach allowed them to observe potential differences in user behavior based on the availability of a major AI tool.
User Engagement and Growth Persist
The study's primary finding was that Wikipedia did not experience a drop in activity. In fact, the data revealed a consistent increase in both page views and the number of unique visitors across all language versions analyzed. This suggests that users continue to turn to the platform for information.
However, the researchers noted a difference in the growth pattern: traffic grew more slowly in the language editions where ChatGPT was readily available. This could indicate that while AI is not replacing Wikipedia, it may be absorbing a portion of the information-seeking behavior that would otherwise have gone to the encyclopedia.
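The two-group comparison can be illustrated with a small sketch. It is not the study's actual analysis: the edition names and monthly figures below are invented for illustration, and the calculation is a plain before-and-after growth comparison rather than the statistical model the authors used.

```python
from statistics import mean

def relative_growth(before, after):
    """Fractional change in mean monthly views between two windows."""
    return mean(after) / mean(before) - 1.0

def group_growth(editions):
    """Average relative growth across the editions in one group."""
    return mean(relative_growth(b, a) for b, a in editions.values())

# Hypothetical monthly page views (in millions): (before launch, after launch).
available = {
    "edition_a": ([100, 110], [115, 116]),
    "edition_b": ([200, 200], [210, 230]),
}
unavailable = {
    "edition_c": ([50, 50], [60, 65]),
    "edition_d": ([80, 80], [96, 104]),
}

growth_available = group_growth(available)
growth_unavailable = group_growth(unavailable)
# A positive gap means traffic grew more slowly where the tool was available.
gap = growth_unavailable - growth_available

print(f"available: {growth_available:.1%}, unavailable: {growth_unavailable:.1%}")
```

In this made-up example both groups grow, but the group with access to the tool grows less, which is the pattern the study describes.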
No Decline in Contributions: The study found no evidence that the rise of ChatGPT has reduced the number of volunteer editors or the volume of edits being made on Wikipedia, a key metric for the platform's health and sustainability.
The authors acknowledged certain limitations in their research. For example, the use of Virtual Private Networks (VPNs) could allow users to bypass geographic restrictions on AI tools, potentially affecting the data. Additionally, the study did not measure the specific popularity of ChatGPT within each country, which could also influence user habits.
The Hidden Threat of AI Data Scraping
While the study's findings on user engagement are positive, the researchers raise serious concerns about a different kind of threat posed by the AI industry: data scraping. This practice involves automated programs systematically collecting vast amounts of information from websites.
AI developers frequently use Wikipedia's high-quality, human-verified content to train their large language models. According to the researchers, this activity is placing an unprecedented strain on Wikipedia's technical infrastructure.
“AI developers are letting their scrapers loose on Wikipedia to train them on high-quality data, pushing up traffic to levels where Wikipedia’s servers are struggling to keep up,” stated Elena Simperl, a professor of computer science at King’s College London and co-director of the King’s Institute for Artificial Intelligence.
This intensive scraping creates a one-sided relationship. AI companies benefit from the free, structured knowledge built by a global community of volunteers, while the non-profit foundation that runs Wikipedia bears the increasing operational costs.
A Call for a New Social Contract
The issue extends beyond server load. Professor Simperl also pointed out that AI-generated content often uses information from Wikipedia without providing proper credit. This practice diverts web traffic and potential donors away from the encyclopedia, threatening its long-term financial model, which relies on public support.
Neal Reeves, the study's first author, argued for the establishment of a “new social contract” between AI companies and Wikipedia. Such an agreement would create a more balanced and sustainable relationship. It would allow AI developers to continue using Wikipedia's valuable data while ensuring the platform maintains control over its content and is fairly compensated for its use.
A structured partnership could involve:
- Providing data through controlled APIs instead of resource-intensive scraping.
- Implementing clear attribution requirements for AI-generated answers.
- Financial contributions from AI companies to support Wikipedia's operations.
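As a rough illustration of the first two points, the sketch below prepares a request to the public MediaWiki Action API and builds an attribution line for the result. The article does not say which API such a partnership would use, so the endpoint choice is an assumption; the bot name and contact address are hypothetical. The code only constructs the request and does not send it.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical client identity; a descriptive User-Agent lets operators
# contact the client instead of blocking anonymous bulk traffic.
USER_AGENT = "ExampleResearchBot/0.1 (contact@example.org)"

def build_extract_request(title):
    """Prepare (but do not send) a request for a page's plain-text extract."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }
    url = f"{API_ENDPOINT}?{urlencode(params)}"
    return Request(url, headers={"User-Agent": USER_AGENT})

def attribution_line(title):
    """Build a credit line that could accompany an AI-generated answer."""
    page_url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
    return f'Source: "{title}", Wikipedia, {page_url} (CC BY-SA 4.0)'

request = build_extract_request("Alan Turing")
credit = attribution_line("Alan Turing")
print(request.full_url)
print(credit)
```

Requesting one page at a time through an API, with an identifiable client and a visible credit line, is the kind of arrangement the proposals above point toward; the exact terms would be for Wikipedia and AI developers to agree.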
Wikimedia's Proactive Steps
Organizations within the Wikimedia movement, which supports Wikipedia, are already taking steps to address this challenge. Coinciding with the study's publication, Wikimedia Deutschland, the movement's German chapter, announced a new initiative called the Wikidata Embedding Project.
This project aims to create a new database specifically designed to make it easier for external users, including AI models, to access Wikipedia's content in a structured and efficient manner. By providing a formal channel for data access, Wikimedia hopes to reduce reliance on disruptive scraping.
This system would allow AI developers to train their models on knowledge that has been verified by Wikipedia's human editors, potentially improving the reliability of AI-generated information while protecting the platform's infrastructure.
The ongoing interaction between Wikipedia and AI represents a critical moment for the future of open information. While users continue to value the encyclopedia, its survival may depend on establishing a more symbiotic relationship with the very technologies it helps to train.





