The social network Bluesky recently published a proposal on GitHub describing new options that could give users a say in whether their posts and data are used for purposes such as generative AI training and public archiving.
CEO Jay Graber discussed this proposal earlier in the week on stage at South by Southwest, but it gained new attention on Friday night after she posted about it on Bluesky. Some users reacted with alarm to the company’s plans, which they saw as a retreat from Bluesky’s previous assurances that it would not sell user data to advertisers or train artificial intelligence on user posts.
“Oh, hell no!” wrote a user named Sketchette. “The beauty of this platform was that it did NOT share information. Especially not with gen AI. Don’t give up now.”
Graber replied that companies engaged in generative AI are “already scraping public data from all over the Internet,” including Bluesky, because “everything on Bluesky is public, like a public website.” Therefore, she said, Bluesky is trying to create a “new standard” to regulate this scraping, similar to the robots.txt file that websites use to communicate their permissions to web crawlers.
The debate over artificial intelligence training and copyright has, among other things, drawn attention to robots.txt and highlighted the fact that it is not legally binding. Bluesky frames its proposed standard as having a similar “mechanism and expectations” by providing “a machine-readable format that good actors are expected to follow and that carries ethical weight but is not legally binding.”
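For readers unfamiliar with the robots.txt mechanism the proposal is compared to, the sketch below shows how a crawler is expected to honor it. It uses Python’s standard-library `urllib.robotparser`; the crawler name and URLs are hypothetical, and nothing here is taken from Bluesky’s actual proposal.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: one hypothetical AI crawler is disallowed entirely,
# while all other crawlers are allowed. Honoring this file is voluntary --
# it carries no legal force, exactly the point made about Bluesky's proposal.
robots_txt = """\
User-agent: HypotheticalAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks before fetching:
print(parser.can_fetch("HypotheticalAIBot", "https://example.com/posts/1"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/posts/1"))       # True
```

Like robots.txt, the declared preference only constrains “good actors” that choose to consult it, which is why critics note it cannot stop a scraper that simply ignores the file.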
According to the proposal, users of the Bluesky app or other apps built on the underlying AT Protocol can go into their settings and allow or disallow the use of their Bluesky data in four categories: generative AI, protocol bridging (i.e., connecting different social ecosystems), bulk data sets, and web archiving (e.g., the Internet Archive’s Wayback Machine).
If a user indicates that they do not want their data to be used to train generative AI, the proposal states: “Companies and research groups that create AI training sets are expected to respect this intent when they see it, either when crawling websites or when bulk data is transferred using the protocol itself.”