LLMs are not databases, so stop treating them like one
"GPT-4 told me the wrong GDP of France!" "Claude hallucinated a function that doesn't exist!" "The AI made up a quote from a research paper!"
Yes. That's what they do. That's what they've always done. That's what they'll always do. Because LLMs are not databases—they're dream machines that happen to dream mostly accurate things.
The fundamental misunderstanding
People think LLMs work like this:
- Training data goes in
- Data gets stored somewhere
- When you ask a question, the model retrieves the relevant data
- You get your answer
What actually happens:
- Training data goes in
- The model learns patterns about how language works
- When you ask a question, the model generates text that pattern-matches to its training
- You get something that looks like an answer
The difference is everything. LLMs don't "know" facts—they know patterns. They don't retrieve information—they generate plausible text. They're not Wikipedia—they're improvisational actors who've read Wikipedia.
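If the distinction feels abstract, here's a toy contrast in Python. The lookup table and the token probabilities are invented for illustration (a real model scores tens of thousands of tokens with billions of parameters), but the shape of the difference is the point:

```python
import random

# Mental model 1: a database. The fact is stored; retrieval returns it verbatim.
facts = {"gdp_of_france_2023_usd": 3.03e12}  # approximate figure, for illustration
print(facts["gdp_of_france_2023_usd"])  # the stored value, every time (or an honest KeyError)

# Mental model 2: a language model. Nothing is stored; the model assigns
# probabilities to candidate continuations and samples one. (Toy numbers.)
TOKEN_PROBS = {
    "$3.0 trillion": 0.45,
    "$2.9 trillion": 0.30,
    "$3.1 trillion": 0.20,
    "$2.5 trillion": 0.05,
}

def sample_next_token(probs: dict) -> str:
    tokens = list(probs)
    return random.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

print(sample_next_token(TOKEN_PROBS))  # plausible, usually close, never guaranteed
```

The first one can only be wrong if the data is wrong. The second one can be wrong any time the dice land badly.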
Why this matters more than you think
I consulted for a startup that built their entire product on the assumption that GPT-4 could be their database. "It knows everything about companies," they said. "We'll just prompt it for data."
Three months and $50K in API costs later, they discovered:
- The same prompt returned different revenue numbers for the same company
- It confidently invented subsidiaries that didn't exist
- It mixed up data between similar companies
- When pressed for sources, it made those up too
They rebuilt with an actual database. Should have started there.
Hallucination is a feature, not a bug
Hallucination isn't a temporary problem that will be "fixed" in the next version. It's fundamental to how these models work. They're probabilistic text generators, not deterministic retrieval systems.
When an LLM tells you that the Boeing 747 has a cruise speed of 570 mph, it's not because it looked that up in a table. It's because, given all the text it's seen, 570 mph is a statistically plausible cruise speed for a large commercial aircraft. Sometimes it's right. Sometimes it's 550 mph. Sometimes it's 600 mph. The model doesn't know the difference—they're all plausible outputs given the patterns it learned.
This is why temperature settings exist. We're literally controlling how much the model is allowed to dream.
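Here's roughly what that knob does, as a sketch. The logits are made up, and a real model produces one per vocabulary token, but the mechanics of temperature scaling are the same:

```python
import math
import random

# Hypothetical next-token logits after "the 747's cruise speed is ..."
logits = {"570 mph": 2.1, "550 mph": 1.6, "600 mph": 1.2, "Mach 2": -1.5}

def sample_with_temperature(logits: dict, temperature: float = 1.0) -> str:
    # Lower temperature sharpens the distribution toward the top token;
    # higher temperature flattens it, letting less likely tokens through.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    weights = [math.exp(v) / total for v in scaled.values()]
    return random.choices(list(scaled), weights=weights, k=1)[0]

print(sample_with_temperature(logits, temperature=0.2))  # almost always "570 mph"
print(sample_with_temperature(logits, temperature=1.5))  # occasionally something else entirely
```

And even at temperature 0 you only get the single most likely continuation, which is still a guess, just the model's favorite one.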
What people keep getting wrong
"Just fine-tune it on your data." Fine-tuning teaches style and patterns, not facts. Fine-tune on your company's documentation and the model will sound like your documentation while making up equally convincing nonsense.
"Use retrieval-augmented generation." RAG helps, but now you're not using the LLM as a database—you're using an actual database and having the LLM format the results. Which is what you should have done from the start.
"Prompt engineering will fix it." You can prompt engineer better hallucinations, not eliminate them. "Be factual" is like telling a method actor to "be yourself"—they'll give you their interpretation of factual.
"Larger models are more accurate." Larger models are more convincing when they're wrong. GPT-4 hallucinates less than GPT-3, but when it does, it's much harder to detect because the hallucinations are more plausible.
What LLMs are actually good at
LLMs excel when exactness doesn't matter:
- Transforming formats (JSON to SQL, markdown to HTML)
- Generating examples and templates
- Explaining concepts in different ways
- Finding patterns and similarities
- Creative tasks where "wrong" isn't really wrong
They're terrible when exactness is critical:
- Specific numbers, dates, or quantities
- Legal or medical facts
- Real-time information
- Anything requiring citation
- Mathematical computation (yes, even GPT-4)
The path forward
Stop trying to make LLMs into databases. Instead:
Use real databases for facts. Store your ground truth somewhere deterministic. Use LLMs to query and format, not to store.
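A sketch of that split, with a hypothetical `llm` callable and an invented schema. The model writes the query; the rows are the ground truth:

```python
import sqlite3

SCHEMA = "CREATE TABLE revenue (company TEXT, fiscal_year INT, revenue_usd REAL)"

def ask_the_database(question: str, llm) -> list:
    # The LLM translates English into SQL; it never answers from memory.
    sql = llm(f"Schema: {SCHEMA}\nWrite one SQLite SELECT statement that answers: {question}")
    db = sqlite3.connect("companies.db")
    return db.execute(sql).fetchall()  # facts come from the rows, not the model
```

Running model-written SQL unchecked is its own hazard, which is what the verification layer below is for.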
Embrace the probabilistic nature. LLMs are great for "what might this be?" and terrible for "what exactly is this?"
Build verification layers. If an LLM generates a fact, verify it against a source of truth. If it generates code, run tests. If it generates SQL, validate the syntax.
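Picking up the text-to-SQL sketch above, a verification layer can be as small as refusing anything that isn't a SELECT and letting SQLite compile the statement before it runs:

```python
import sqlite3

def run_generated_sql(db: sqlite3.Connection, sql: str) -> list:
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    try:
        db.execute(f"EXPLAIN {sql}")  # compiles the statement without running the query
    except sqlite3.Error as exc:
        raise ValueError(f"model produced invalid SQL: {exc}") from exc
    return db.execute(sql).fetchall()
```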
Design for uncertainty. Show confidence scores. Provide multiple options. Make it easy for users to verify and correct.
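One cheap way to get a confidence signal, assuming nothing about the model beyond a hypothetical `llm` callable: ask the same question several times and report how strongly the answers agree.

```python
from collections import Counter

def answer_with_confidence(question: str, llm, n: int = 5) -> tuple[str, float]:
    # Self-consistency as a rough proxy: agreement across samples is not truth,
    # but disagreement is a strong hint that the model is dreaming.
    answers = [llm(question).strip() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n  # e.g. ("570 mph", 0.8): surface this to the user
```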
The mental model that works
Think of LLMs like you'd think of a brilliant intern who read everything on the internet but didn't take notes. They can:
- Give you remarkably good first drafts
- Point you in the right direction
- Explain complex topics simply
- Generate creative solutions
But you wouldn't trust them to:
- Quote exact figures from memory
- Recall specific dates without checking
- Provide legal advice without supervision
- Be your only source of truth
The future isn't retrieval
The sooner we stop trying to turn LLMs into databases, the sooner we can build products that actually work. LLMs are reasoning engines, not storage systems. They're pattern matchers, not fact retrievers. They're creative partners, not sources of truth.
Use them for what they're good at. Use databases for what databases are good at. Stop being surprised when your dream machine dreams things that aren't real.
That's not a bug. That's the whole point.