Children’s personal photos used to train AI models without consent: Report

The report also shared that “information about these children does not appear to exist anywhere else on the Internet,” showing that their families had been particularly cautious to protect the children’s identity online.  

Published - July 03, 2024 04:15 pm IST

FILE PHOTO: Personal images of Australian children were used to train AI models without their consent. | Photo Credit: The Hindu

Personal images of Australian children were used to train AI models without their consent, despite platforms prohibiting web scraping and enforcing strict privacy settings, according to a report by Human Rights Watch (HRW). The weblinks in the dataset even revealed details about the children, including their names and the locations where the pictures were taken.

HRW found about 190 photos of children from across Australia, including Indigenous children who may be especially vulnerable, that were used to train AI models. This follows an earlier report by HRW that said 170 photos of Brazilian children were included in a popular AI training dataset called LAION-5B, which was built from Common Crawl snapshots of the public web.

Researcher Hye Jung Han noted that the 190 photos were found after reviewing fewer than 0.0001 percent of the 5.85 billion images and captions in the dataset. She added that these photos had been scraped “without the knowledge or consent of the children or their families,” and spanned their whole childhood.

The report also shared that “information about these children does not appear to exist anywhere else on the Internet,” showing that their families had been particularly cautious to protect the children’s identity online.  

In one case, Han found that a YouTube video of two boys had been set to unlisted so that it would not appear in searches, yet it was still part of the dataset.

A YouTube representative told Ars Technica that the company has been “clear that the unauthorised scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse.” But given that the content is already in the dataset, it is likely that tools have already been trained on it.

LAION, a nonprofit that builds AI datasets, has been working with HRW to remove flagged images, but the process has been slow.
