
Report: World’s Largest AI Dataset Littered w/ Child Porn

'[Victims] knowing that their content is in a dataset that's allowing a machine to create other images—which have learned from their abuse—that's not something I think anyone would have expected to happen...'

(Ken Silva, Headline USA) A massive dataset used to power artificial intelligence models has been removed by the organization that created it due to the discovery of child pornography, according to a Thursday report from 404 Media.

Citing a study from the Stanford Internet Observatory, 404 Media reported that the LAION-5B dataset includes thousands of illegal images, a count that does not include the intimate imagery published and gathered non-consensually.

“If you have downloaded that full dataset for whatever purpose, for training a model for research purposes, then yes, you absolutely have [child porn], unless you took some extraordinary measures to stop it,” David Thiel, lead author of the study and chief technologist at the Stanford Internet Observatory, told 404 Media.

The LAION-5B machine learning dataset, used by Stable Diffusion and other major AI products, was removed out of “an abundance of caution,” a LAION spokesperson told 404 Media.

The LAION-5B dataset is used to train the most popular AI image-generation models currently on the market. It reportedly comprises more than five billion links to images scraped from the open web, including from user-generated social media platforms.

Researchers reportedly believe the dataset contained child porn because it indiscriminately collected data from across the open internet.

“Child abuse material likely got into LAION because the organization compiled the dataset using tools that scrape the web, and CSAM isn’t relegated to the realm of the ‘dark web,’ but proliferates on the open web and on many mainstream platforms,” 404 Media reported.

“In 2022, Facebook made more than 21 million reports of CSAM to the National Center for Missing and Exploited Children (NCMEC) tipline, while Instagram made 5 million reports, and Twitter made 98,050.”

Responding to the misguided notion that a few thousand child porn images are negligible in a dataset of billions, 404 Media said it's the real-life victims who are affected most.

“[Victims] knowing that their content is in a dataset that’s allowing a machine to create other images—which have learned from their abuse—that’s not something I think anyone would have expected to happen, but it’s clearly not a welcome development,” Dan Sexton, chief technology officer at the UK-based Internet Watch Foundation, told the outlet.

“For any child that’s been abused and their imagery circulated, excluding it anywhere on the internet, including datasets, is massive.”

Ken Silva is a staff writer at Headline USA. Follow him at twitter.com/jd_cashless.
