(Ken Silva, Headline USA) A massive dataset used to power artificial intelligence models has been removed by the organization that created it due to the discovery of child pornography, according to a Thursday report from 404 Media.
Citing a study from the Stanford Internet Observatory, 404 Media reported that the LAION-5B dataset includes thousands of illegal images, a figure that does not count the intimate imagery published and gathered non-consensually.
“If you have downloaded that full dataset for whatever purpose, for training a model for research purposes, then yes, you absolutely have [child porn], unless you took some extraordinary measures to stop it,” David Thiel, lead author of the study and chief technologist at the Stanford Internet Observatory, told 404 Media.
The LAION-5B machine learning dataset, used by Stable Diffusion and other major AI products, was taken down out of “an abundance of caution,” a LAION spokesperson told 404 Media.
The LAION-5B dataset is used to train the most popular AI generation models currently on the market. It is reportedly made up of more than five billion links to images scraped from the open web, including from user-generated content on social media platforms.
Researchers reportedly believe the dataset contained child porn because it indiscriminately collected data from across the open internet.
“Child abuse material likely got into LAION because the organization compiled the dataset using tools that scrape the web, and CSAM isn’t relegated to the realm of the ‘dark web,’ but proliferates on the open web and on many mainstream platforms,” 404 Media reported.
“In 2022, Facebook made more than 21 million reports of CSAM to the National Center for Missing and Exploited Children (NCMEC) tipline, while Instagram made 5 million reports, and Twitter made 98,050.”
Responding to the misguided notion that a few thousand child porn images won’t affect a dataset of billions, 404 Media said it’s the real-life victims who are affected most.
“[Victims] knowing that their content is in a dataset that’s allowing a machine to create other images—which have learned from their abuse—that’s not something I think anyone would have expected to happen, but it’s clearly not a welcome development,” Dan Sexton, chief technology officer at the UK-based Internet Watch Foundation, told the outlet.
“For any child that’s been abused and their imagery circulated, excluding it anywhere on the internet, including datasets, is massive.”
Ken Silva is a staff writer at Headline USA. Follow him at twitter.com/jd_cashless.