Wikipedia fights against voracious AI bots

Since January 2024, the bandwidth used to download multimedia content from Wikimedia projects has grown by 50 percent, the foundation says in a recent update. But it's not because human readers have suddenly developed a voracious appetite for Wikipedia articles, videos, or files on Commons. The surge comes from scraper bots: automated crawlers that harvest images, videos, articles, and other openly licensed Wikimedia files to train generative artificial intelligence models.

This flood of bot traffic can slow access to Wikimedia pages and resources, especially during high-profile events. When Jimmy Carter died in December, for example, heightened interest in the video of his presidential debate with Ronald Reagan caused pages to load slowly for some users. Wikimedia's infrastructure is built to absorb surges in human traffic during such events, and the readers watching Carter's video alone should not have caused problems. However, "the amount of traffic generated by scraper bots is unprecedented and creates growing risks and costs," Wikimedia said.

The foundation explained that human readers tend to look up specific, and often the same, topics: when something is trending, many people request it at once. Wikimedia caches frequently requested content in the data center closest to the user, which lets it serve that content quickly and cheaply. Articles and files that have not been requested in a long time, however, must be fetched from the core data center, which consumes more resources and costs Wikimedia more. Because AI crawlers read pages in bulk, they also hit the obscure pages that can only be served from the core data center.
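To make the cost difference concrete, here is a minimal sketch of that two-tier setup. It is not Wikimedia's actual code; the cache size, page counts, and LRU eviction policy are illustrative assumptions. It shows how repeated reader traffic is absorbed by an edge cache while a bulk crawler forces almost every request back to the core data center.

# Illustrative sketch only: an LRU edge cache in front of a "core data center".
from collections import OrderedDict
import random

CACHE_SIZE = 100            # assumed edge-cache capacity, in pages
edge_cache = OrderedDict()  # LRU cache: page id -> content
core_fetches = 0            # expensive fetches served by the core data center

def fetch(page_id):
    """Serve a page from the edge cache if present, else from the core."""
    global core_fetches
    if page_id in edge_cache:
        edge_cache.move_to_end(page_id)     # refresh LRU position
        return edge_cache[page_id]
    core_fetches += 1                       # cache miss: expensive path
    edge_cache[page_id] = f"page-{page_id}"
    if len(edge_cache) > CACHE_SIZE:
        edge_cache.popitem(last=False)      # evict least recently used page
    return edge_cache[page_id]

# Human readers: 10,000 requests concentrated on ~50 trending pages.
for _ in range(10_000):
    fetch(random.randint(0, 49))
print("core fetches for readers:", core_fetches)   # roughly 50

# A scraper bot: 10,000 requests sweeping the long tail of obscure pages.
core_fetches = 0
for page_id in range(1_000, 11_000):
    fetch(page_id)
print("core fetches for the bot:", core_fetches)   # roughly 10,000

The same number of requests produces wildly different costs: reader traffic concentrates on pages the cache already holds, while a crawler sweeping the long tail misses the cache on nearly every request.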

Wikimedia says that 65 percent of its most resource-intensive traffic comes from bots. That is a constant burden on the foundation's Site Reliability team, which has to keep blocking crawlers before they noticeably slow page loads for real readers. The deeper problem, according to Wikimedia, is that "the expansion has occurred largely without sufficient attribution, which is a key factor in attracting new users to participate in the movement." A foundation that relies on donations to keep working needs to attract new users and make them care about its cause. "Our content is free, our infrastructure is not," the foundation said. Wikimedia now plans to establish sustainable ways for developers and reusers to access its content in the next fiscal year, because it sees no sign that AI-related traffic will slow down anytime soon.
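Wikimedia does not describe its blocking tools in the update, but the kind of triage a site-reliability team does can be pictured as a simple per-client rate check. The sketch below is a hypothetical illustration only; the threshold, time window, and client key are assumptions, not Wikimedia's actual limits.

# Hypothetical illustration: throttle clients whose request rate far exceeds
# a human reading pace.
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 120    # assumed threshold; real limits would differ

request_log = defaultdict(list)  # client id (e.g. user agent + IP) -> timestamps

def allow_request(client_id):
    """Return True if the client is under the rate limit, False to block."""
    now = time.time()
    recent = [t for t in request_log[client_id] if now - t < WINDOW_SECONDS]
    request_log[client_id] = recent
    if len(recent) >= MAX_REQUESTS_PER_WINDOW:
        return False             # likely a bulk scraper: block or throttle
    request_log[client_id].append(now)
    return True

# Example: a client firing 200 rapid requests gets cut off after 120.
allowed = sum(allow_request("ExampleScraperBot/1.0") for _ in range(200))
print("requests allowed:", allowed)

Checks like this catch individual heavy clients, but they do not solve the distributed, unattributed crawling Wikimedia describes, which is why the foundation is looking for dedicated, sustainable access channels instead.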
