Creeps: AI Giants Are Training Systems on Pictures of Children Without Consent

Technology July 03, 2024


A recent investigation by Human Rights Watch (HRW) has uncovered a disturbing trend in AI development, where images of children are being used to train artificial intelligence models without consent, potentially exposing them to significant privacy and safety risks.

Ars Technica reports that Human Rights Watch researcher Hye Jung Han has discovered that popular AI datasets, such as LAION-5B, contain links to hundreds of photos of Australian children. These images, scraped from various online sources, are being used to train AI models without the knowledge or consent of the children or their families. The implications of this discovery are far-reaching and raise serious concerns about the privacy and safety of minors in the digital age.

Han’s investigation, which examined less than 0.0001 percent of the 5.85 billion images in the LAION-5B dataset, identified 190 photos of children from all of Australia’s states and territories. Because such a small fraction of the dataset was examined, the actual number of affected children is likely to be significantly higher. The dataset includes images spanning the entirety of childhood, making it possible for AI image generators to create realistic deepfakes of real Australian children.
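To put those percentages in concrete terms, the short Python sketch below works through the arithmetic using only the figures quoted above. The variable names are illustrative, and the calculation assumes that "less than 0.0001 percent" can be treated as an upper bound of one part per million. The article does not say how the sample was chosen, so the result describes the examined images only and should not be scaled up to the whole dataset.

```python
# Figures quoted in the article.
TOTAL_IMAGES = 5_850_000_000   # images in LAION-5B
SAMPLE_SHARE_PPM = 1           # 0.0001 percent equals 1 part per million
PHOTOS_FOUND = 190             # photos of Australian children identified

# Upper bound on how many images were examined ("less than 0.0001 percent").
# Integer arithmetic avoids floating-point rounding on the large total.
max_examined = TOTAL_IMAGES * SAMPLE_SHARE_PPM // 1_000_000

# Fewer images than this were examined, so the share of the examined
# images that showed Australian children is at least this large.
min_share_in_sample = PHOTOS_FOUND / max_examined

print(f"Images examined: fewer than {max_examined:,}")
print(f"Share of sample: at least {min_share_in_sample:.1%}")
```

Under these assumptions, the review covered fewer than 5,850 images, and the 190 identified photos made up at least roughly 3 percent of what was examined.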

More alarming still, some of the URLs in the dataset reveal identifying information about the children, including their names and locations. In one instance, Han was able to trace “both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia” from a single photo link. This level of detail puts children at risk of privacy violations and potential safety threats.

The investigation also revealed that even photos protected by stricter privacy settings were not immune to scraping. Han found examples of images from “unlisted” YouTube videos, which should only be accessible to those with a direct link, included in the dataset. This raises questions about the effectiveness of current privacy measures and the responsibility of tech companies in protecting user data.

The use of these images in AI training sets poses unique risks to Australian children, particularly Indigenous children, who may be more vulnerable to harm. Han’s report highlights that for First Nations peoples, who “restrict the reproduction of photos of deceased people during periods of mourning,” the inclusion of these images in AI datasets could perpetuate cultural harms.

The potential for misuse of this data is significant. Recent incidents in Australia have already demonstrated the dangers, with approximately 50 girls from Melbourne reporting that their social media photos were manipulated using AI to create sexually explicit deepfakes. This underscores the urgent need for stronger protections and regulations surrounding the use of personal data in AI development.

While LAION, the organization behind the dataset, has stated its commitment to removing flagged images, the process appears to be slow. Moreover, removing links from the dataset does not address the fact that AI models have already been trained on these images, nor does it prevent the photos from being used in other AI datasets.