Physics Explains Why and When AI Hallucinations Happen

IMAGE CREDITS: SPAID

We use AI more than ever. Yet even the people who build it often can’t explain how it really works—or why it sometimes fails spectacularly. But a new study from a physicist may offer a surprising path forward, using the laws of nature to better understand artificial intelligence. At the heart of this effort is Neil Johnson, a Professor of Physics at George Washington University. In April 2025, alongside researcher Frank Yingjie Huo, he released a paper titled Capturing AI’s Attention: Physics of Repetition, AI Hallucination, Bias and Beyond. While the math in the paper is dense, the goal is simple: understand why AI models hallucinate and offer biased responses—and find a way to predict when that will happen.

Cracking Open AI’s Black Box with First Principles

Johnson applies first-principles physics to the transformer's attention mechanism, a core feature of modern large language models (LLMs). This mechanism helps the model decide which parts of a user's input are most important when crafting its response.
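
To make that concrete, here is a minimal sketch of standard scaled dot-product attention, the computation at the core of the mechanism, written in plain NumPy. The array names and toy sizes are illustrative and are not taken from Johnson's paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Standard transformer attention: each query token scores every key token,
    then mixes the corresponding value vectors according to those scores."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key interactions
    weights = softmax(scores, axis=-1)   # how much attention each token receives
    return weights @ V                   # context-weighted blend of the values

# Toy example: 4 tokens with 8-dimensional embeddings (illustrative only).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

The key point for Johnson's analysis is the `scores` line: every interaction is between exactly two tokens at a time.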

To build intuition, Johnson compares this process to a physics concept known as a “spin bath.” In this picture, a word (or token) is like a particle (a spin), and its surrounding context is the bath: the collection of other words it can interact with. The attention mechanism couples two of these “spin baths,” forming what physicists call a 2-body Hamiltonian, a representation of a system’s total energy built entirely from pairwise interactions.
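
For readers who know the physics, a generic pairwise spin Hamiltonian looks like the sketch below. This is textbook Ising-style notation meant only to show what “2-body” means; it is not the paper's exact formulation:

```latex
% Generic 2-body (pairwise) spin Hamiltonian. J_{ij} plays the role of the
% coupling (attention weight) between tokens i and j.
H_{2\text{-body}} = -\sum_{i<j} J_{ij}\, s_i\, s_j
```

Every term couples exactly two spins, which is the mathematical sense in which transformer attention is a 2-body interaction.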

Here’s where things get interesting: Johnson’s math suggests that this setup—used in every mainstream LLM today—makes AI inherently unstable. It’s prone to bias and hallucinations, especially if the training data is flawed or insufficient. Essentially, the way AI “pays attention” is too simple to guarantee reliability over time. And worse, you can’t always predict when it’ll break.

According to Johnson, the problem isn’t about the morality or correctness of training data. It’s about being able to foresee when that data might trigger unpredictable output. His formula connects bias in the training data directly to unexpected model behavior—like suddenly veering into nonsense or generating harmful responses.

Why We’re Stuck with Flawed AI Models for Now

You might ask: If this approach is flawed, why hasn’t it changed? The answer may lie not in technology, but in history.

Johnson draws a comparison to Britain’s famous “Gauge Wars.” George Stephenson popularized the narrower railway gauge that became the industry standard. Isambard Kingdom Brunel later introduced a broader gauge that was objectively better: smoother, safer, and faster. But Stephenson’s system was already widespread and cheaper to implement, so Brunel’s superior solution was eventually abandoned.

The same logic applies to generative AI today. Billions have been poured into 2-body transformer models. They work well enough to make money. So even if there’s a better approach—like moving to 3-body or even 4-body Hamiltonians—it’s unlikely anyone will rewrite the playbook from scratch.
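
To see what that upgrade would mean on paper, here is a schematic contrast in the same generic notation as above (again, not the paper's exact formulation): a 2-body model only sums pairwise couplings, while a 3-body model adds terms that tie three tokens together at once.

```latex
% Schematic contrast between pairwise and three-way couplings (generic notation).
H_2 = -\sum_{i<j} J_{ij}\, s_i\, s_j
\qquad
H_3 = H_2 \;-\; \sum_{i<j<k} K_{ijk}\, s_i\, s_j\, s_k
```

The extra terms let each token interact with richer combinations of context, at the cost of substantially more computation.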

Predicting AI Failure Points Like an Actuary

Still, Johnson isn’t proposing to throw everything away. Instead, he offers a framework for managing AI risk—much like insurers use actuarial tables to predict health or accident risks.

His theory maps out how often an LLM might “go off the rails,” depending on its training quality. A poorly trained model might hallucinate every 200 words. A better one might last 2,000 words before drifting. This gives developers and users a powerful new tool: the ability to forecast and reduce risk using AI-specific actuarial models.
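
The paper's actual risk formula isn't reproduced here, but the actuarial framing can be illustrated with a deliberately simple toy model: assume each generated word carries a small, fixed, independent chance of tipping the model off track, then read off survival odds the way an insurer would. The probabilities below are invented for illustration and are not Johnson's numbers:

```python
def survival_probability(p_fail_per_word: float, n_words: int) -> float:
    """Toy 'actuarial' model: chance the output stays on track for n_words,
    assuming an independent, fixed per-word failure probability."""
    return (1.0 - p_fail_per_word) ** n_words

def expected_words_to_failure(p_fail_per_word: float) -> float:
    """Mean number of words generated before the first failure in the same toy model."""
    return 1.0 / p_fail_per_word

# Illustrative rates echoing the article: drifting roughly every 200 words
# versus roughly every 2,000 words.
for label, p in [("poorly trained", 1 / 200), ("better trained", 1 / 2000)]:
    print(f"{label}: ~{expected_words_to_failure(p):.0f} words to failure on average, "
          f"{survival_probability(p, 500):.1%} chance of a clean 500-word answer")
```

A real actuarial table would replace the fixed per-word rate with a curve estimated from the model's training quality, which is the kind of mapping Johnson's theory aims to supply.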

It’s a step toward transparency and accountability in a world where AI tools increasingly shape decisions in medicine, finance, education, and national security. Johnson’s work may not immediately change how we build AI—but it could change how we use it safely.

In time, understanding how and when LLMs fail might become just as important as knowing what they get right.
