OpenAI’s news blues
Happy New Year! Today, let’s talk about one of the biggest pieces of news from the break — and explore why the days of freely training large language models on every public website are likely over for good.
The news, of course, is that the New York Times is suing OpenAI. Here are Michael M. Grynbaum and Ryan Mac:
The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.
First, a reminder / disclosure: I co-host the Hard Fork podcast for the Times. As such, I want to be clear that any opinions here are my own, and in fact I’m not going to speak to the merits of the case at all, other than to say that I think the complaint makes for compelling reading. (Particularly the portions alleging that ChatGPT regurgitated long sections of Times articles verbatim after being prompted.)
OpenAI, for its part, told the Times reporters that it had been in negotiations with the paper and was “surprised and disappointed” by the lawsuit.
I’m interested in the case because I think generative AI has the potential to reshape the economics of journalism and the web to favor the builders of AI models over digital publishers. Already I find myself regularly looking up non-critical information via an AI chatbot rather than Google, a habit that is generally faster than the alternatives but that also deprives publishers of the advertising revenue they might otherwise get from my visits to their websites.
The Times case is important because it tests the legality of these fast-growing services on copyright grounds. The question is whether the specific ways that LLMs process data will be found to be covered by fair use — an as-yet unsettled
