LLMs are not databases, so stop treating them like one
"GPT-4 told me the wrong GDP of France!" "Claude hallucinated a function that doesn't exist!" "The AI made up a quote from a research paper!"
Yes. That's what they do. That's what they've always done. That's what they'll always do. Because LLMs are not databases—they're dream machines that happen to dream mostly accurate things.
The fundamental misunderstanding
People think LLMs work like this:
- Training data goes in
- Data gets stored somewhere
- When you ask a question, the model retrieves the relevant data
- You get your answer
What actually happens:
- Training data goes in
- The model learns patterns about how language works
- When you ask a question, the model generates text that pattern-matches to its training
- You get something that looks like an answer
The difference is everything. LLMs don't "know" facts—they know patterns. They don't retrieve information—they generate plausible text. They're not Wikipedia—they're improvisational actors who've read Wikipedia.
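If the distinction feels abstract, here's a toy contrast in Python. The lookup table and the token probabilities are invented for illustration (a real model scores tens of thousands of tokens with billions of parameters), but the shape of the difference is the point:

```python
import random

# Mental model 1: a database. The fact is stored; retrieval returns it verbatim.
facts = {"gdp_of_france_2023_usd": 3.03e12}  # approximate figure, for illustration
print(facts["gdp_of_france_2023_usd"])  # the stored value, every time (or an honest KeyError)

# Mental model 2: a language model. Nothing is stored; the model assigns
# probabilities to candidate continuations and samples one. (Toy numbers.)
TOKEN_PROBS = {
    "$3.0 trillion": 0.45,
    "$2.9 trillion": 0.30,
    "$3.1 trillion": 0.20,
    "$2.5 trillion": 0.05,
}

def sample_next_token(probs: dict) -> str:
    tokens = list(probs)
    return random.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

print(sample_next_token(TOKEN_PROBS))  # plausible, usually close, never guaranteed
```

The first one can only be wrong if the data is wrong. The second one can be wrong any time the dice land badly.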
Why this matters more than you think
I consulted for a startup that built their entire product on the assumption that GPT-4 could be their database. "It knows everything about companies," they said. "We'll just prompt it for data."
Three months and $50K in API costs later, they discovered:
- The same prompt returned different revenue numbers for the same company
- It confidently invented subsidiaries that didn't exist
- It mixed up data between similar companies
- When pressed for sources, it made those up too
They rebuilt with an actual database. Should have started there.
Hallucination is a feature, not a bug
Hallucination isn't a temporary problem that will be "fixed" in the next version. It's fundamental to how these models work. They're probabilistic text generators, not deterministic retrieval systems.
When an LLM tells you that the Boeing 747 has a cruise speed of 570 mph, it's not because it looked that up in a table. It's because, given all the text it's seen, 570 mph is a statistically plausible cruise speed for a large commercial aircraft. Sometimes it's right. Sometimes it's 550 mph. Sometimes it's 600 mph. The model doesn't know the difference—they're all plausible outputs given the patterns it learned.
This is why temperature settings exist. We're literally controlling how much the model is allowed to dream.
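Here's roughly what that knob does, as a sketch. The logits are made up, and a real model produces one per vocabulary token, but the mechanics of temperature scaling are the same:

```python
import math
import random

# Hypothetical next-token logits after "the 747's cruise speed is ..."
logits = {"570 mph": 2.1, "550 mph": 1.6, "600 mph": 1.2, "Mach 2": -1.5}

def sample_with_temperature(logits: dict, temperature: float = 1.0) -> str:
    # Lower temperature sharpens the distribution toward the top token;
    # higher temperature flattens it, letting less likely tokens through.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    weights = [math.exp(v) / total for v in scaled.values()]
    return random.choices(list(scaled), weights=weights, k=1)[0]

print(sample_with_temperature(logits, temperature=0.2))  # almost always "570 mph"
print(sample_with_temperature(logits, temperature=1.5))  # occasionally something else entirely
```

And even at temperature 0 you only get the single most likely continuation, which is still a guess, just the model's favorite one.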
What people keep getting wrong
"Just fine-tune it on your data." Fine-tuning teaches style and patterns, not facts. Fine-tune on your company's documentation and the model will sound like your documentation while making up equally convincing nonsense.
"Use retrieval-augmented generation." RAG helps, but now you're not using the LLM as a database—you're using an actual database and having the LLM format the results. Which is what you should have done from the start.
"Prompt engineering will fix it." You can prompt engineer better hallucinations, not eliminate them. "Be factual" is like telling a method actor to "be yourself"—they'll give you their interpretation of factual.
"Larger models are more accurate." Larger models are more convincing when they're wrong. GPT-4 hallucinates less than GPT-3, but when it does, it's much harder to detect because the hallucinations are more plausible.
What LLMs are actually good at
LLMs excel when exactness doesn't matter:
- Transforming formats (JSON to SQL, markdown to HTML)
- Generating examples and templates
- Explaining concepts in different ways
- Finding patterns and similarities
- Creative tasks where "wrong" isn't really wrong
They're terrible when exactness is critical:
- Specific numbers, dates, or quantities
- Legal or medical facts
- Real-time information
- Anything requiring citation
- Mathematical computation (yes, even GPT-4)
The path forward
Stop trying to make LLMs into databases. Instead:
Use real databases for facts. Store your ground truth somewhere deterministic. Use LLMs to query and format, not to store.
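A sketch of that split, with a hypothetical `llm` callable and an invented schema. The model writes the query; the rows are the ground truth:

```python
import sqlite3

SCHEMA = "CREATE TABLE revenue (company TEXT, fiscal_year INT, revenue_usd REAL)"

def ask_the_database(question: str, llm) -> list:
    # The LLM translates English into SQL; it never answers from memory.
    sql = llm(f"Schema: {SCHEMA}\nWrite one SQLite SELECT statement that answers: {question}")
    db = sqlite3.connect("companies.db")
    return db.execute(sql).fetchall()  # facts come from the rows, not the model
```

Running model-written SQL unchecked is its own hazard, which is what the verification layer below is for.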
Embrace the probabilistic nature. LLMs are great for "what might this be?" and terrible for "what exactly is this?"
Build verification layers. If an LLM generates a fact, verify it against a source of truth. If it generates code, run tests. If it generates SQL, validate the syntax.
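Picking up the text-to-SQL sketch above, a verification layer can be as small as refusing anything that isn't a SELECT and letting SQLite compile the statement before it runs:

```python
import sqlite3

def run_generated_sql(db: sqlite3.Connection, sql: str) -> list:
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    try:
        db.execute(f"EXPLAIN {sql}")  # compiles the statement without running the query
    except sqlite3.Error as exc:
        raise ValueError(f"model produced invalid SQL: {exc}") from exc
    return db.execute(sql).fetchall()
```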
Design for uncertainty. Show confidence scores. Provide multiple options. Make it easy for users to verify and correct.
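One cheap way to get a confidence signal, assuming nothing about the model beyond a hypothetical `llm` callable: ask the same question several times and report how strongly the answers agree.

```python
from collections import Counter

def answer_with_confidence(question: str, llm, n: int = 5) -> tuple[str, float]:
    # Self-consistency as a rough proxy: agreement across samples is not truth,
    # but disagreement is a strong hint that the model is dreaming.
    answers = [llm(question).strip() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n  # e.g. ("570 mph", 0.8): surface this to the user
```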
The mental model that works
Think of LLMs like you'd think of a brilliant intern who read everything on the internet but didn't take notes. They can:
- Give you remarkably good first drafts
- Point you in the right direction
- Explain complex topics simply
- Generate creative solutions
But you wouldn't trust them to:
- Quote exact figures from memory
- Recall specific dates without checking
- Provide legal advice without supervision
- Be your only source of truth
The future isn't retrieval
The sooner we stop trying to turn LLMs into databases, the sooner we can build products that actually work. LLMs are reasoning engines, not storage systems. They're pattern matchers, not fact retrievers. They're creative partners, not sources of truth.
Use them for what they're good at. Use databases for what databases are good at. Stop being surprised when your dream machine dreams things that aren't real.
That's not a bug. That's the whole point.