More than 1,000 child abuse images found in the dataset that Stable Diffusion was trained on

The LAION-5B dataset contains more than 5 billion images and serves as a training base for many neural networks such as Stable Diffusion.

According to a recent study by the Stanford Internet Observatory, the dataset also contains thousands of suspected child abuse images, which could help image generators produce dangerously realistic abusive content.

A representative of the organization behind LAION-5B said it has a “zero tolerance policy” for illegal content and is temporarily taking the dataset offline to ensure it is safe before republishing it.

“This report focuses on the LAION-5B data set as a whole. Stability AI models were trained on a filtered subset of it,” said Stability AI, the UK-based AI startup that funded and popularized Stable Diffusion.

LAION-5B, or a subset of it, was used to create several versions of Stable Diffusion. The newer Stable Diffusion 2.0 was trained on data with “dangerous” material heavily filtered out, making it much harder for users to create explicit images. Stable Diffusion 1.5, however, does generate sexual content and is still used online.

The company’s spokesperson also said that Stable Diffusion 1.5 wasn’t released by Stability AI at all, but by Runway, the AI video startup that helped create the original version of Stable Diffusion (which is somewhat ironic, because when that version was released, Stability AI didn’t mention Runway at all and took all the credit for itself).

“We’ve added filters to catch dangerous queries or dangerous results, and we’ve also invested in content labeling features to help identify images created on our platform. These layers of mitigation make it more difficult for bad actors to misuse artificial intelligence,” the company added.

LAION-5B was released in 2022 and was built from raw HTML code collected by a California non-profit organization to find images around the web and associate them with descriptions. For months, rumors circulated on discussion forums and social media that the dataset contained illegal images.

“As far as we know, this is the first attempt to actually quantify and validate the concerns,” said David Thiel, chief technologist at the Stanford Internet Observatory.

Researchers had previously found that generative AI image models can create CSAM, but only by combining two “concepts,” such as children and sexual activity. Thiel said the new study shows that these models can generate such illegal images because of the underlying data itself.

Source: Engadget, Bloomberg
