OpenAI and Google trained AI models on YouTube videos

The two tech giants transcribed YouTube videos, which may violate creator copyrights.
By Elena Cavender
A phone screen displaying the YouTube logo mirrored.
Tech companies are desperate to harvest as much data as possible to train their AI models. Credit: SOPA Images / Contributor / Lightrocket via Getty Images

Both OpenAI and Google turned to transcribing YouTube videos to further train their AI models, which may violate creators' copyrights, the New York Times reports. The report details how the two tech giants, along with Meta, cut corners to access as much data as possible to train their AI models.

According to the report, OpenAI used Whisper, a speech recognition tool, to transcribe more than one million hours of YouTube videos. It then fed the transcripts into GPT-4, the powerful AI system behind the latest version of ChatGPT. Google, which owns YouTube, also transcribed YouTube videos to train its AI models.

The transcription of videos by both companies may infringe on creators' copyrights to their videos. Other uses of creator content to train AI have prompted copyright and licensing lawsuits.


OpenAI's use of YouTube videos may also violate Google's rules, which prohibit the use of its videos for "independent" applications and "automated means (such as robots, botnets or scrapers)" of accessing its videos.

Matt Bryant, a spokesperson for Google, told the New York Times that the company was unaware of any such use by OpenAI. But the report alleges that people at Google knew about OpenAI's unauthorized use of YouTube videos and neglected to take action because Google was doing the same thing. Google also told the paper that it only trains its AI on videos from creators who have agreed to let their content be used in this manner.

In July 2023, Google changed its terms of service to allow the use of public online material like Google Docs and Google Maps restaurant reviews to further train its AI models.

Elena Cavender

Elena is a tech reporter and the resident Gen Z expert at Mashable. She covers TikTok and digital trends. She recently graduated from UC Berkeley with a BA in American History. Follow her @ecaviar_.

