|AI|LLM|COGNITIVE DECLINE|SAFETY|

Unplug Your AI: Junk Internet Data Is Rotting LLM Brains

Your Model Is What It Eats: What Happens When LLMs Train on the Worst Parts of the Internet

9 min read · 1 day ago

This study introduces the “LLM Brain Rot” hypothesis, showing that continual pretraining on low-quality Twitter/X data causes measurable cognitive decline in large language models. Junk data degrades reasoning, long-context ability, and safety, a decline driven by thought-skipping and persistent representational drift. The results highlight data quality as a key factor in AI reliability and point to the need for routine model health checks. Image generated by the author using AI.

In internet culture, the term brain rot refers to the detrimental effect on human cognition of consuming large volumes of online content, especially from social media. Internet addiction appears to have a significant effect on human cognition, particularly on attention span (a reduced ability to maintain focus when reading and solving problems), memory processes (changes in how an individual stores, retrieves, and prioritizes knowledge), and social cognition (changes in self-concept and self-esteem).

Since LLMs learn from enormous datasets that include a large amount of internet content, can they also experience brain rot?

Because they are trained on enormous numbers of tokens, LLMs are exposed to a huge amount of junk data, just like humans. Although LLMs have no “neurons” or “cerebral cortex” as humans do, they have parameters and attention mechanisms that might analogously be “overfitted” or “distracted” by certain data patterns.

Good article

AI document hygiene is important. You can build hygiene-checker models by examining problem documents with another AI. Once you understand the wonky logic inserts, mythic framing tricks, or whatever else you find, that understanding can be turned into a functioning safeguard layer for an AI.
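To make the hygiene-checker idea concrete, here is a minimal Python sketch of a rule-based filter that flags junk-like documents before they reach a training set. Everything in it is an assumption for illustration: the names `check_document`, `filter_corpus`, and `HygieneReport`, the clickbait phrases, and the thresholds are hypothetical. It is not the method used in the study, and a real checker would derive its rules from reviewed problem documents.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases; replace with patterns found in real problem documents.
CLICKBAIT = re.compile(
    r"\b(you won't believe|shocking|must see|going viral)\b", re.IGNORECASE
)


@dataclass
class HygieneReport:
    flags: list[str]

    @property
    def is_clean(self) -> bool:
        # A document is clean when no rule fired.
        return not self.flags


def check_document(
    text: str, min_words: int = 20, max_caps_ratio: float = 0.3
) -> HygieneReport:
    """Apply simple heuristics (assumed thresholds) and collect the rules that fire."""
    flags = []

    # Very short fragments carry little context.
    if len(text.split()) < min_words:
        flags.append("too_short")

    # A high share of uppercase letters suggests shouting or spam.
    letters = [c for c in text if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > max_caps_ratio:
        flags.append("shouting")

    # Attention-grabbing phrasing.
    if CLICKBAIT.search(text):
        flags.append("clickbait")

    return HygieneReport(flags)


def filter_corpus(docs: list[str]) -> list[str]:
    """Keep only the documents that pass every check."""
    return [doc for doc in docs if check_document(doc).is_clean]
```

Returning the list of fired rules rather than a bare boolean makes it easier to audit why a document was dropped and to tune the thresholds later.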

Published in Level Up Coding

Written by Salvatore Raieli
