Did ChatGPT train on data from YouTube?

OpenAI has allegedly trained its AI models on data from YouTube. The company has not commented.

YouTube's terms of service prohibit the use of content for anything other than "personal, non-commercial use." (Photo Credits: Reuters)

In another turn of events, it has been alleged that OpenAI, led by Sam Altman and backed by Microsoft, trained its artificial intelligence (AI) models on data from YouTube, which is owned by Google. According to a report by The Information, OpenAI “has secretly used data from the site (YouTube) to train some of its artificial intelligence models.” YouTube is the single biggest and richest source of imagery, audio, and text transcripts on the web, which may give Google an advantage.


Google is reportedly developing its next large language model, Gemini, and Google researchers feel that the “value of YouTube hasn’t been lost on OpenAI, either”. However, YouTube’s terms of service prohibit the use of content for anything other than “personal, non-commercial use.” At the same time, it is widely understood in the AI industry that scraping the web is common practice. This time, the allegation that OpenAI “scraped” YouTube to train its models has created a buzz.

OpenAI has not yet commented, as per reports. Separately, the company has launched new versions of its text-generating AI models, GPT-3.5 Turbo and GPT-4. These versions add a capability called function calling, which lets developers build chatbots that answer questions by calling external tools, much as ChatGPT plugins do.
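For readers curious about what function calling involves on the developer's side, here is a minimal Python sketch of the general pattern. It makes no API call. It assumes the model's reply names a tool and supplies its arguments as a JSON-encoded string, and the tool `get_weather`, the `TOOLS` registry, and the `dispatch` helper are all hypothetical names invented for illustration.

```python
import json


# Hypothetical tool the chatbot may call; a real one would query an external service.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city} is not implemented in this sketch."


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch(function_call: dict) -> str:
    """Run the tool named in a model reply and return its result as text."""
    name = function_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    # Assumption: the arguments arrive as a JSON-encoded string.
    args = json.loads(function_call["arguments"])
    return TOOLS[name](**args)


# Stand-in for what a model might return; hard-coded here, no API involved.
simulated_reply = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(simulated_reply)
```

In a real chatbot, the text returned by `dispatch` would typically be sent back to the model so it can phrase the final answer for the user.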

Reports also suggest that OpenAI’s strongest rival, Google, is well equipped to compete, especially after upgrading Bard with a new machine-learning model following I/O 2023. The tech giant also announced PaLM 2 in May, reportedly its state-of-the-art language model, with improved multilingual, reasoning, and coding capabilities. Since then, Google has been investing in new updates to Bard, especially with regard to problem-solving capabilities.

According to reports, Sundar Pichai, the CEO of Google, emphasised how Google brought DeepMind and Google Brain together to form Google DeepMind. The combined unit is reportedly using its computational resources to build more capable systems safely and responsibly. Pichai noted, “Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning.” He further added that it offers “impressive capabilities not seen in prior models.”

Google’s best-known AI chatbot, Bard, did not have a very impressive start. According to The Information’s report, Google allegedly trained Bard using ChatGPT data scraped from ShareGPT. As reported by The Verge, Google has denied these allegations. According to The Information, however, Jacob Devlin, a former Google AI engineer who left to join OpenAI, reportedly warned Google against using ChatGPT data since it would violate OpenAI’s terms.


Reports suggest that Google stopped using the data and perhaps discarded the training that relied on ChatGPT’s data. Still, with reports alleging that OpenAI trained its AI models on data from YouTube, it looks like OpenAI is teaching its models much the way the rest of us pick up skills from YouTube videos.

This article was first uploaded on June 16, 2023, at 4:34 PM.
