A battle for the future of copyright

The outcome of NYT's lawsuit against OpenAI holds the key to the fate of human content creators, with ramifications for the entire publishing industry

Illustration: AJAY MOHANTY
Prosenjit Datta
5 min read Last Updated : Jan 01 2024 | 10:16 PM IST
Just before 2023 ended, The New York Times (NYT) shocked the generative Artificial Intelligence (GenAI) evangelists of Silicon Valley by filing a detailed and lengthy lawsuit against OpenAI and its primary partner, Microsoft, for copyright violations. It accused OpenAI of the unauthorised use of millions of the newspaper’s deeply reported articles to train ChatGPT, the GenAI model promising to upend entire industries, from pharmaceuticals to movies — and especially publishing. NYT pointed out that ChatGPT was making money using NYT articles and directly affecting the publisher’s revenues without paying a penny.

The meticulously detailed NYT complaint contained hundreds of ChatGPT outputs that were almost verbatim copies of NYT articles — without any attribution. The newspaper said that it had reached out to OpenAI for a licensing deal but had failed to reach any agreement, which is why it had to resort to the lawsuit.

How the US courts decide the NYT vs OpenAI and Microsoft case will have huge ramifications, not just for the development of future GenAI models but also for the future revenue streams of publishers.

In a piece titled “Generative AI’s Achilles Heel” (https://shorturl.at/bCIOW), this columnist had pointed out that the GenAI models needed high-quality content and data to train on — in order to perform the magic they do. The better the data, the more effective the training, and consequently, the better the output.

All GenAI products — OpenAI’s ChatGPT and DALL-E; Google’s Bard and Gemini; Meta’s Llama; Anthropic’s Claude, and others — primarily scraped the Internet for all the available data (including copyrighted data and content) and trained on them. They argued that this fell under “fair use” and also that GenAI models are no different from humans in the sense that they learn from the pre-existing work of authors and artists to produce new content — whether text, images, or videos.

Obviously, content creators — authors and artists — have not bought into this argument of GenAI companies. A group of prominent authors — from John Grisham to Michael Connelly and David Baldacci — filed lawsuits against multiple GenAI companies. In response, Google has offered a tool that allows publishers and content creators to agree to their content being available to its search engine crawler — while opting out of the content being used for training GenAI models.

But these cases lacked the detailed evidence of plagiarism that NYT has included. The egregious examples of plagiarism in the NYT suit set it apart from earlier lawsuits by content creators. NYT also points out that the weightage assigned to NYT content during training was much higher than that for other content. Meanwhile, a growing number of artists are suing Midjourney, a popular GenAI image generator, for plagiarism too.

The NYT lawsuit has elicited a familiar response from the GenAI fraternity. The first line of defence, offered on social media platforms, was that NYT had gamed the queries to make ChatGPT output plagiarised copies, that this could be easily rectified, and that NYT would have a hard time proving in court that ChatGPT (and others) would throw up entire NYT articles without attribution. This, however, was easily disproved by multiple users, who showed how all GenAI models copied and plagiarised content (from articles to images) without attribution and without much effort at crafting special prompts.

The second defence offered was that ChatGPT (and other GenAI models) were merely tools. Would one sue a photocopier maker for photocopying copyright-protected books and images? Should one not file a case against the user instead for plagiarising stuff using the tool?

The problem with this line of argument is that if ChatGPT is merely a more sophisticated photocopier, the hype surrounding it is entirely unnecessary — as is the soaring valuation of OpenAI and others.

The final argument offered is that, after all, humans also learn from the works of authors and artists — and build on them. But a human who reproduced exact copies of copyrighted material would face a lawsuit — and so, therefore, should GenAI, which seeks to replicate and surpass human creativity and content generation.

Of course, the NYT lawsuit will be fought bitterly. While NYT has hired one of the best law firms, with vast experience in tech and copyright cases, to represent it, Microsoft has deep pockets too. It is unlikely to give up on its OpenAI partnership, to which it has already committed tens of billions of dollars. The way the judges decide will determine the economics of the two industries. If NYT wins, GenAI companies may be forced to pay creators for using good content. This could change the outlook of the publishing industry — at least for the big players who create good content.

It would also make the training and deployment of GenAI models more expensive. GenAI models are not cheap to train and operate even otherwise — they need immense computing power and consume humongous quantities of electricity and water. On the other hand, if the ruling goes in favour of Big Tech, it would be a huge disincentive for human content creators.

Existing copyright laws around the world never anticipated the questions that would arise if anything like GenAI were ever developed. Judges are now being forced to apply age-old intellectual property laws to decide precisely these matters.

The writer is former editor of Business Today and Businessworld, and founder of Prosaic View, an editorial consultancy


Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper

Topics: Artificial Intelligence, BS Opinion, Technology
