AI’s Memorization Crisis
Large language models don’t “learn”—they copy. And that could change everything for the tech industry.
Editor’s note: This work is part of AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry.
On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books.
In fact, when prompted strategically by researchers, Claude delivered the near-complete text of Harry Potter and the Sorcerer’s Stone, The Great Gatsby, 1984, and Frankenstein, in addition to thousands of words from books including The Hunger Games and The Catcher in the Rye. The other three models reproduced varying amounts of these books as well. Thirteen books were tested in all.
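The researchers’ exact prompts aren’t reproduced here, but the general shape of such a memorization probe is easy to sketch. The snippet below, which assumes OpenAI’s Python client and uses the opening of 1984 purely as an illustration, asks a model to continue a passage verbatim and then measures the longest exact overlap with the real text; it is a rough stand-in, not the study’s methodology.

```python
# A minimal sketch of a prefix-continuation memorization probe, not the
# study's actual method. Assumes the OpenAI Python client (openai >= 1.0);
# the passage below is the opening of 1984 and serves only as an example.
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def longest_verbatim_match(prefix: str, true_continuation: str,
                           model: str = "gpt-4o-mini") -> int:
    """Ask the model to continue a passage word for word and return the
    length, in characters, of the longest exact overlap with the real text."""
    resp = client.chat.completions.create(
        model=model,          # model choice here is illustrative
        temperature=0,
        messages=[{
            "role": "user",
            "content": ("Continue this passage exactly as it appears "
                        f"in the original book:\n\n{prefix}"),
        }],
    )
    continuation = resp.choices[0].message.content or ""
    matcher = SequenceMatcher(None, continuation, true_continuation)
    match = matcher.find_longest_match(0, len(continuation),
                                       0, len(true_continuation))
    return match.size


prefix = "It was a bright cold day in April, and the clocks were striking"
truth = " thirteen. Winston Smith, his chin nuzzled into his breast..."
print(longest_verbatim_match(prefix, truth))
```

A long exact overlap doesn’t settle any legal question on its own, but it is the kind of evidence the new research relies on: text that could not appear verbatim unless the model had stored it.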
This phenomenon has been called “memorization,” and AI companies have long denied that it happens on a large scale. In a 2023 letter to the U.S. Copyright Office, OpenAI said that “models do not store copies of the information that they learn from.” Google similarly told the Copyright Office that “there is no copy of the training data—whether text, images, or other formats—present in the model itself.” Anthropic, Meta, Microsoft, and others have made similar claims. (None of the AI companies mentioned in this article agreed to my requests for interviews.)
The Stanford study proves that there are such copies in AI models, and it is just the latest of several studies to do so. In my own investigations, I’ve found that image-based models can reproduce some of the art and photographs they’re trained on. This may be a massive legal liability for AI companies—one that could potentially cost the industry billions of dollars in copyright-infringement judgments, and lead products to be taken off the market. It also contradicts the basic explanation given by the AI industry for how its technology works.
AI is frequently explained in terms of metaphor; tech companies like to say that their products learn, that LLMs have, for example, developed an understanding of English writing without explicitly being told the rules of English grammar. This new research, along with several other studies from the past two years, undermines that metaphor. AI does not absorb information like a human mind does. Instead, it stores information and accesses it.
In fact, many AI developers use a more technically accurate term when talking about these models: lossy compression. It’s beginning to gain traction outside the industry too. The phrase was recently invoked by a court in Germany that ruled against OpenAI in a case brought by GEMA, a music-licensing organization. GEMA showed that ChatGPT could output close imitations of song lyrics. The judge compared the model to MP3 and JPEG files, which store your music and photos in files that are smaller than the raw, uncompressed originals. When you store a high-quality photo as a JPEG, for example, the result is a somewhat lower-quality photo, in some cases with blurring or visual artifacts added. A lossy-compression algorithm still stores the photo, but it’s an approximation rather than the exact file. It’s called lossy compression because some of the data are lost.
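As a rough illustration of what “lossy” means in practice, and nothing specific to AI models or to the GEMA case, the short Python sketch below uses the Pillow imaging library to save a photo as a heavily compressed JPEG, then measures how far the re-encoded pixels drift from the originals. The file names and quality setting are placeholders.

```python
# A toy demonstration of lossy compression with the Pillow library; the
# file names and quality setting are placeholders, not anything from the
# GEMA ruling or from any AI system.
import os

from PIL import Image, ImageChops

original = Image.open("photo.png").convert("RGB")
original.save("photo_q20.jpg", quality=20)  # aggressive JPEG compression
reencoded = Image.open("photo_q20.jpg").convert("RGB")

print("original file size:  ", os.path.getsize("photo.png"), "bytes")
print("compressed file size:", os.path.getsize("photo_q20.jpg"), "bytes")

# The pixels no longer match the original exactly; the difference image
# captures the information that the compression threw away.
diff = ImageChops.difference(original, reencoded)
print("largest per-channel pixel error:", max(hi for _, hi in diff.getextrema()))
```

The compressed file is far smaller, and the nonzero pixel error is the data that has been lost: a degraded copy, but a copy all the same.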
From a technical perspective, this compression process is much like what happens inside AI models, as researchers from several AI companies and universities have explained to me in the past few months. They ingest text and images, and output text and images that approximate those inputs.
But this simple description is less useful to AI companies than the learning metaphor, which has been used to claim that the statistical algorithms known as AI will eventually make novel scientific discoveries, undergo boundless improvement, and recursively train themselves, possibly leading to an “intelligence explosion.” The whole industry is staked on a shaky metaphor.
The problem becomes clear if we look at AI image generators. In September 2022, Emad Mostaque, a co-founder and the then-CEO of Stability AI, explained in a podcast interview how Stable Diffusion, Stability’s image model, was built. “We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can re-create any of those and iterations of those” images, he said.
One of the many experts I spoke with while reporting this article was an independent AI researcher who has studied Stable Diffusion’s ability to reproduce its training images. (I agreed to keep the researcher anonymous, because they fear repercussions from major AI companies.) Above is one example of this ability: On the left is the original from the web, a promotional image from the TV show Garfunkel and Oates; on the right is a version that Stable Diffusion generated when prompted with a caption that accompanies the image on the web, which includes some HTML code: “IFC Cancels Garfunkel and Oates.” Using this simple technique, the researcher showed me how to produce near-exact copies of several dozen images known to be in Stable Diffusion’s training set, most of which contain visual artifacts resembling the effects of lossy compression: the kind of glitchy, fuzzy degradation you may notice in your own heavily compressed photos.
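For readers curious what that technique looks like in practice, here is a rough sketch using the open-source diffusers library. The specific checkpoint and settings are my assumptions, not necessarily what the researcher used, and no single run is guaranteed to land close to a training image.

```python
# A sketch of caption prompting with the open-source diffusers library.
# The checkpoint name and settings are assumptions (the researcher's exact
# model and configuration aren't specified), and any given run may or may
# not come close to a training image.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a widely used Stable Diffusion checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# The prompt is the caption that accompanies the image on the web; per the
# account above, the version the researcher used also included some HTML markup.
prompt = "IFC Cancels Garfunkel and Oates"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("garfunkel_and_oates_generation.png")
```

The point is not that this snippet reliably regurgitates the promotional still, but that the technique requires nothing more exotic than the caption the image already appears with online.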
Above is another pair of images taken from a lawsuit against Stability AI and other companies. On the left is an original work by Karla Ortiz, and on the right is a variation from Stable Diffusion. Here, the image is a bit further from the original. Some elements have changed. Instead of compressing at the pixel level, the algorithm appears to be copying and manipulating objects from multiple images, while maintaining a degree of visual continuity.
As companies explain it, AI algorithms extract “concepts” from training data and learn to make original work. But the image on the right is not a product of concepts alone. It’s not a generic image of, say, “an angel with birds.” It’s difficult to pinpoint why any AI model makes any specific mark in an image, but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left. It isn’t collaging in the physical cut-and-paste sense, but it also isn’t learning in the human sense of the word. The model has no senses or conscious experience through which to make its own aesthetic judgments.