OpenAI CTO dodges questions around training data for text-to-video generator Sora

OpenAI’s Mira Murati concluded her answer by just saying, “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data” 

Updated - March 16, 2024 11:35 am IST

Published - March 16, 2024 10:16 am IST

Murati’s reaction has drawn flak on X for her apparent confusion around what publicly available data actually meant [File] | Photo Credit: Reuters

A video clip from a Wall Street Journal (WSJ) interview with OpenAI CTO Mira Murati has gone viral on social media for the wrong reasons. Murati, who sat down earlier in the week with the publication’s Joanna Stern to discuss OpenAI’s new text-to-video tool, Sora, appeared unable to answer clearly when asked about the datasets the tool had been trained on.

When asked what kind of data the company had used in Sora, Murati responded by saying they stuck to “publicly available data and licensed data.”

Stern then asked specifically where that data had come from: “So, videos on YouTube?”

Murati looked confused at the question and said she didn’t know.

Stern persisted with the same line of questioning, asking, “Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them.”

Murati replied that she wasn’t “actually sure about that,” adding that if the videos were publicly available they might have been used, but that she wasn’t “confident about it.”

She concluded her answer by saying, “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data.”

Murati’s reaction has drawn flak on X for her apparent confusion about what publicly available data actually means, her refusal to answer the questions clearly, and her possible ignorance of the data’s sources.

The sourcing of training datasets for AI tools has become a hotbed of legal disputes. Several authors and media publishers have already filed lawsuits against OpenAI for using their writings without permission to train its AI chatbot, ChatGPT.
