
OpenAI transcribed Google's YouTube videos to train AI models: Report

Google prohibits the use of videos posted on YouTube for applications that are independent of the video platform


Harsh Shivam New Delhi

2 min read Last Updated : Apr 08 2024 | 12:07 PM IST


OpenAI reportedly transcribed over one million hours of YouTube videos to collect training data for its advanced GPT-4 model, disregarding the Google-owned platform’s copyright rules. According to a report by The New York Times, Microsoft-backed OpenAI used Whisper, its in-house speech recognition tool, to transcribe audio from YouTube videos into conversational text, which was then used to train the AI model that powers ChatGPT.

According to the report, employees at the ChatGPT maker discussed internally whether using YouTube data for training might violate the platform’s policy. The company reportedly opted to use data from YouTube videos because it had exhausted its supply of publicly available data. The report stated that OpenAI’s president, Greg Brockman, personally assisted in selecting videos for transcription.

Google prohibits the use of videos posted on YouTube for applications that are “independent” of the video platform.

In a statement to The Verge, OpenAI spokesperson Lindsay Held said that the company uses “unique” datasets for each of its models to “help their understanding of the world”. She added that the company uses “numerous sources including publicly available data and partnerships for non-public data.”

Commenting on the topic, Google spokesperson Matt Bryant told The Verge that Google has “seen unconfirmed reports” related to OpenAI using YouTube videos for training AI models. He added that the streaming platform’s “Terms of Service and robots.txt files prohibit unauthorised scraping or downloading of YouTube content.”

Earlier this week, YouTube CEO Neal Mohan said in an interview with Bloomberg that he had seen reports of OpenAI using YouTube videos to train its text-to-video generator, Sora. He said that he had no information on whether OpenAI had done so, but that it would be a “clear violation” of the platform’s policies if it had.

According to the report by The New York Times, Google has also used transcribed text from YouTube videos to train its AI model Gemini. If true, this could violate the copyright in the videos, which belongs to the creators who post them to the platform. The report stated that Google broadened its terms of service to allow the company to use publicly available Google Docs files, restaurant reviews on Google Maps, and more for training AI models.


First Published: Apr 08 2024 | 12:07 PM IST
