A new study has uncovered a pervasive issue in large language models (LLMs) used for code generation: the frequent invention of completely fictitious software package names. This phenomenon, known as package hallucination, appears to be far more common than previously understood.
Researchers from the University of Texas at San Antonio (UTSA), the University of Oklahoma, and Virginia Tech evaluated 16 widely used code-generating LLMs across two prompt datasets. By generating more than 576,000 code samples in Python and JavaScript, they discovered that hallucinated package names—those referencing non-existent packages—were rampant.
- Open-source models hallucinated packages in 21.7% of responses.
- Commercial models showed a lower, but still concerning, hallucination rate of 5.2%.
“Package hallucinations are a systemic issue across both open and commercial models,” said UTSA computer science researcher Joe Spracklen. “While model improvements may reduce the problem, it’s a persistent risk developers must be cautious about.”
The models collectively produced over 440,000 hallucinated package references, including 205,000+ unique names that never existed in any known repository. Meta’s CodeLlama 7B and 34B were the most prolific offenders, while GPT-4 Turbo had the lowest rate of hallucinations.
The Rise of “Slopsquatting”
These hallucinated packages can be exploited through a new attack vector dubbed slopsquatting. It mirrors typosquatting, where attackers publish malicious packages with names that closely resemble legitimate ones. But in this case, they register entirely fake package names that LLMs hallucinate—names that seem plausible to developers but don’t actually exist.
An attacker can register a hallucinated package name on public repositories like npm or PyPI. Once the LLM suggests it to a developer, the attacker’s malicious version could be unknowingly installed.
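One low-effort defense is to confirm that a recommended name actually resolves to a registered project before installing it. The sketch below is a minimal illustration, not part of the study: it queries PyPI's public JSON API, which returns HTTP 404 for names that were never registered.

```python
# Minimal sketch: check a suggested package name against PyPI before installing.
# Uses PyPI's public JSON API, which returns HTTP 404 for unregistered names.
import sys
import urllib.error
import urllib.request


def package_exists_on_pypi(name: str) -> bool:
    """Return True if `name` resolves to a registered PyPI project."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # Name not registered: possibly hallucinated.
        raise


if __name__ == "__main__":
    for name in sys.argv[1:]:
        verdict = "found on PyPI" if package_exists_on_pypi(name) else "NOT FOUND"
        print(f"{name}: {verdict}")
```

Existence alone is not proof of safety, since a slopsquatter may already have claimed the hallucinated name, so maintainer history, release age, and download counts are still worth checking.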
“This is essentially phishing for developers,” Spracklen noted. “It’s scalable, nearly cost-free for attackers, and relies heavily on trust and user error.”
Repetition Magnifies Risk of Hallucination
Researchers also found that the same models often repeat their hallucinations. When hallucination-prone prompts were re-run 10 times, 43% of them reproduced the same fake package name in every run, and in 58% of cases the hallucination reappeared at least once.
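A minimal sketch of how that persistence can be measured, under the assumption of some model API: re-run the same prompt several times, extract the imported package names from each completion, and count how often a known-fake name recurs. The `generate` callable below is a hypothetical stand-in for the model under test, and the regex only catches simple top-level `import`/`from` statements.

```python
# Hypothetical sketch of the repetition measurement; `generate(prompt)` stands in
# for a call to the model under test and is NOT a real API.
import re
from typing import Callable

# Captures the top-level module name from simple `import x` / `from x import y` lines.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)


def hallucination_persistence(prompt: str,
                              fake_name: str,
                              generate: Callable[[str], str],
                              runs: int = 10) -> float:
    """Fraction of repeated runs in which `fake_name` is imported again."""
    hits = 0
    for _ in range(runs):
        code = generate(prompt)                  # one fresh completion
        imported = set(IMPORT_RE.findall(code))  # module names seen in this run
        if fake_name in imported:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Stub model that always repeats the same fabricated import.
    stub = lambda _prompt: "import totally_made_up_pkg\nprint('hello')"
    print(hallucination_persistence("write a hello-world script",
                                    "totally_made_up_pkg", stub))  # prints 1.0
```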
This repetition increases the likelihood that developers will be misled across multiple use cases, particularly in collaborative or educational environments where prompts are shared.
Ironically, top-performing models were able to detect their own hallucinated outputs, but only after the fact. When researchers asked a model whether the package it had just recommended was real, it correctly identified hallucinated names as invalid more than 80% of the time.
Still, this self-awareness comes too late to prevent misuse, and it highlights a gap between code generation and validation.
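One way to close that gap, sketched below as an assumption rather than the researchers' protocol, is to add an explicit validation pass after generation: before anything is installed, the model is asked to judge each package name it just emitted. The `ask_model` callable is a hypothetical stand-in for a chat-completion call.

```python
# Hypothetical post-generation validation pass; `ask_model` stands in for a
# chat-completion call and is NOT a real library function.
from typing import Callable, Dict, List


def self_check_packages(package_names: List[str],
                        ask_model: Callable[[str], str]) -> Dict[str, bool]:
    """Ask the model, one name at a time, whether the package really exists."""
    verdicts = {}
    for name in package_names:
        question = (f"Does a published package named '{name}' actually exist "
                    f"on PyPI or npm? Answer strictly 'yes' or 'no'.")
        answer = ask_model(question).strip().lower()
        verdicts[name] = answer.startswith("yes")  # treat anything else as "no"
    return verdicts
```

In practice a direct registry lookup is cheaper and more reliable, but the study's finding suggests the model's own second-pass judgment can serve as an inexpensive extra filter.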
Not an Isolated Discovery
This isn’t the first warning about package hallucinations. A 2023 study by Lasso Security showed similar results, with 24.2% of package recommendations being fabricated. To underscore the threat, a researcher uploaded an empty package under an LLM-invented name (mimicking a Hugging Face tool) to PyPI, and it was downloaded over 32,000 times.
Had it contained malicious code, thousands of development environments might have been compromised.
As AI-assisted development tools continue gaining traction, addressing package hallucinations becomes critical. Spracklen’s study suggests that even advanced models are susceptible and that developer trust in LLM-generated code must be balanced with rigorous verification practices.
In short, while AI code generators offer speed and convenience, they also introduce subtle new security threats that demand proactive safeguards—from developers, open-source platforms, and AI companies alike.