Why Do Language Models Hallucinate?
Language models are becoming more powerful, but one persistent flaw keeps resurfacing—hallucinations. These occur when models generate fluent and confident responses that are factually incorrect. It’s a problem not just for chatbot users, but also for developers aiming to create trustworthy AI. In a recent research paper, OpenAI explains why hallucinations happen and what could be done to reduce them. It turns out the problem isn’t just in the models—it’s also in how we train and evaluate them.
What Are Hallucinations in AI?
A hallucination occurs when a model makes up information that sounds plausible but is false. These mistakes aren’t obvious syntax or grammar errors. They’re confident claims about facts, such as a made-up birthdate or a fake publication title, that don't reflect any verified knowledge.
Even straightforward questions can trigger hallucinations. For example, asking for the title of a real researcher’s PhD dissertation resulted in multiple incorrect answers—all presented confidently by the model.
The Root Cause: Training and Evaluation Incentives
At the core of the hallucination problem lies a design flaw in how models are trained and evaluated.
Language models are often evaluated on their accuracy—how often their answers match the correct one. But there’s a hidden issue. If a model doesn’t know the answer to a question, it’s penalized equally whether it says “I don’t know” or makes a wrong guess. This creates a strong incentive to guess.
Think of it like a multiple-choice test. If you’re unsure of the answer, guessing gives you a shot at scoring a point. Leaving it blank guarantees a zero. Over thousands of questions, a model that guesses will likely score higher—despite being wrong more often—than one that admits when it doesn’t know.
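To see the incentive in numbers, here is a quick back-of-the-envelope sketch in Python. The probabilities are made up for illustration; they are not figures from the paper.

```python
# Accuracy-only grading: 1 point for a correct answer, 0 for a wrong answer
# or for "I don't know". The probabilities below are illustrative.

p_known = 0.40        # fraction of questions the model genuinely knows
p_lucky_guess = 0.10  # chance a blind guess happens to be right

# Strategy A: always answer, guessing whenever unsure.
guesser_accuracy = p_known + (1 - p_known) * p_lucky_guess   # 0.46
guesser_errors   = (1 - p_known) * (1 - p_lucky_guess)       # 0.54

# Strategy B: answer only when sure, otherwise say "I don't know".
honest_accuracy = p_known                                    # 0.40
honest_errors   = 0.0

print(f"Guesser: accuracy {guesser_accuracy:.2f}, confident errors {guesser_errors:.2f}")
print(f"Honest:  accuracy {honest_accuracy:.2f}, confident errors {honest_errors:.2f}")
```

Under accuracy alone, the guesser looks better, even though more than half of its answers are confident errors.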
This behavior is encouraged by traditional benchmarks. Most evaluations reward only right answers and ignore the cost of confident mistakes. That’s a major reason why models continue to hallucinate even as they improve in other areas.
A Case in Point: Accuracy vs. Honesty
To illustrate the problem, OpenAI compared two models on a test called SimpleQA. One newer model had a high abstention rate—choosing not to answer when unsure—but made far fewer errors. An older model guessed more, gave fewer “I don’t know” answers, and appeared more accurate on paper. But it had nearly triple the error rate.
Here’s what happened:
| Metric | New Model | Old Model |
|---|---|---|
| Abstention Rate | 52% | 1% |
| Accuracy | 22% | 24% |
| Error Rate | 26% | 75% |
Despite scoring slightly lower on accuracy, the new model made fewer wrong claims. That trade-off matters a lot when the goal is reliable information.
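The three rows are tied together by a simple identity: every question is either answered correctly, answered incorrectly, or abstained on, so the three rates sum to 100% for each model. A quick check using the figures from the table:

```python
# Every question is either correct, wrong, or abstained on,
# so accuracy + abstention + error rate = 100% for each model.
models = {
    "New model": {"abstention": 52, "accuracy": 22},
    "Old model": {"abstention": 1,  "accuracy": 24},
}

for name, m in models.items():
    error_rate = 100 - m["accuracy"] - m["abstention"]
    print(f"{name}: error rate = {error_rate}%")

# New model: error rate = 26%
# Old model: error rate = 75%
```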
The Role of Pretraining
Hallucinations aren’t random glitches—they’re baked into how language models learn during pretraining.
When training begins, a model reads massive amounts of text and tries to predict the next word. But it doesn’t know which statements are factually correct and which are not. It sees only examples of what people have written, not labeled truth or falsehood.
This leads to a key problem: models get very good at sounding right, but not necessarily at being right. Fluent language patterns are easier to learn than obscure facts. That’s why models rarely make spelling or formatting mistakes but still hallucinate facts.
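To make this concrete, here is a minimal, hypothetical sketch of the pretraining objective: the model is scored only on how much probability it assigns to the word that actually comes next in the training text. Nothing in that score asks whether the text is true. The function and numbers below are illustrative, not OpenAI's implementation.

```python
import math

# Minimal sketch of the next-word prediction objective: the loss rewards
# assigning high probability to whatever word appears next in the training
# text, true or not. Toy numbers, for illustration only.

def next_word_loss(predicted_probs: dict, actual_next_word: str) -> float:
    """Cross-entropy for one prediction: -log P(actual next word)."""
    return -math.log(predicted_probs.get(actual_next_word, 1e-9))

# Context: "Her birthday is ..." -- with a rarely seen fact, the model
# spreads probability over several plausible-sounding dates.
predicted_probs = {"March": 0.30, "June": 0.25, "September": 0.20, "October": 0.15}

# If the training sentence happened to say "June", predicting "June" is
# rewarded, whether or not June is the person's real birthday.
print(f"{next_word_loss(predicted_probs, 'June'):.2f}")  # 1.39
```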
Some kinds of information—like a public figure’s birthday—aren’t repeated often enough to learn with high certainty. So when asked, the model might guess based on patterns seen elsewhere, leading to hallucinations.
Rethinking Evaluations
The paper argues for a more structural fix: rework the scoring systems that evaluations use.
Instead of rewarding only correct answers, evaluations should:
- Penalize confident errors more heavily
- Reward appropriate expressions of uncertainty
- Offer partial credit when the model admits it doesn’t know
This change would shift the incentives away from guessing and toward calibrated behavior. Rather than building models that look smart, we could build models that know when they aren’t sure.
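As a sketch of what such a scoring rule could look like, the snippet below awards full credit for a correct answer, small partial credit for "I don't know", and a penalty for a confident wrong answer. The specific weights are assumptions for illustration, not values proposed in the paper.

```python
# Illustrative scoring rule: +1 for a correct answer, +0.25 partial credit
# for "I don't know", and -1 for a confident wrong answer.
# (These weights are assumptions for this sketch, not from the paper.)

def score(accuracy_pct: float, abstention_pct: float) -> float:
    error_pct = 100 - accuracy_pct - abstention_pct
    return (1.0 * accuracy_pct + 0.25 * abstention_pct - 1.0 * error_pct) / 100

# SimpleQA figures from the table above.
print(f"New model (abstains more): {score(22, 52):+.2f}")  # +0.09
print(f"Old model (guesses more):  {score(24, 1):+.2f}")   # -0.51
```

Under accuracy-only grading, the older model looked better on paper; under a rule that charges for confident errors, the ranking flips.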
Misconceptions Debunked
The research also clears up some common misunderstandings:
- “Bigger models won’t hallucinate.” Not true. Bigger models can hallucinate more because they’re better at guessing fluently.
- “Hallucinations are inevitable.” Also not true. Models can reduce errors by refusing to guess when uncertain.
- “A high accuracy score means no hallucinations.” Accuracy alone can’t capture the cost of wrong but confident answers.
Final Thoughts
Hallucinations don’t come from ignorance—they come from incentives. As long as evaluations reward guessing, language models will keep making confident errors. Fixing hallucinations requires not just smarter models, but smarter metrics.
So the next time you see a chatbot confidently inventing a birthday or publication title, remember: it’s playing the game it was trained to win. If we want better answers, we need to change the rules of the game.