Hallucination Detection in Production Agents

Nov 20, 2025 · 8 min read

Hallucinations are not bugs — they are features of probabilistic systems. Here is how we validate agent outputs at scale using LLM-as-a-Judge pipelines.

Understanding Hallucinations

Hallucinations in AI agents fall into three categories:

  1. Factual hallucinations — stating incorrect facts with confidence
  2. Fabricated references — citing sources that don't exist
  3. Logical hallucinations — drawing conclusions that don't follow from the premises

Each requires a different detection strategy.

The LLM-as-a-Judge Pipeline

Architecture

Agent Output → Validation Prompt → Judge LLM → Score/Flag → Decision

How It Works

  1. The agent produces an output
  2. The output is sent to a separate "judge" LLM with a validation prompt
  3. The judge evaluates the output against the source context
  4. A confidence score is assigned
  5. Outputs below the threshold are flagged for human review
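The five steps above can be sketched end to end. This is a minimal illustration, not a specific vendor API: the judge is passed in as a plain callable returning a score in [0, 1], and the prompt template and threshold are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical validation prompt; a production rubric would be more detailed.
JUDGE_PROMPT = (
    "You are a validator. Given SOURCE CONTEXT and an AGENT OUTPUT, return a "
    "confidence score from 0.0 to 1.0 that the output is fully supported by "
    "the context.\n\nSOURCE CONTEXT:\n{context}\n\nAGENT OUTPUT:\n{output}\n"
)

@dataclass
class Verdict:
    score: float
    flagged: bool  # True means "route to human review"

def validate_output(output: str, context: str, judge, threshold: float = 0.7) -> Verdict:
    """Send one agent output through the judge; flag it for human review
    if the judge's confidence falls below the threshold."""
    prompt = JUDGE_PROMPT.format(context=context, output=output)
    score = judge(prompt)  # the judge callable returns a float in [0, 1]
    return Verdict(score=score, flagged=score < threshold)

# Usage with a stub judge standing in for a real second model:
verdict = validate_output("The sky is green.", "The sky is blue.", judge=lambda p: 0.1)
```

In production the stub lambda would be replaced by a call to a separate model, per the design decision below about avoiding correlated errors.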

Key Design Decisions

  • Use a different model for the judge than the agent — correlated errors are the enemy
  • Provide source context to the judge — it needs ground truth to evaluate against
  • Design specific rubrics for different output types (factual claims, recommendations, summaries)
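The third decision, per-type rubrics, can be as simple as a lookup table that swaps the judge's instructions by output type. The rubric wording here is illustrative only:

```python
# Hypothetical rubrics: each output type gets its own judge instruction.
RUBRICS = {
    "factual_claim": "Every stated fact must appear in the source context.",
    "recommendation": "The recommendation must follow from the context and state its assumptions.",
    "summary": "The summary must not introduce entities absent from the context.",
}

def build_judge_prompt(output_type: str, context: str, output: str) -> str:
    """Assemble a validation prompt using the rubric for this output type."""
    rubric = RUBRICS[output_type]
    return f"Rubric: {rubric}\nContext: {context}\nOutput: {output}\nScore 0-1:"
```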

Detection Strategies by Type

Factual Hallucinations

  • Cross-reference agent claims against knowledge base
  • Use entity extraction + fact verification pipelines
  • Track confidence scores across similar queries
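A toy version of the cross-referencing step: extract candidate entities from a claim and diff them against the knowledge base. A real pipeline would use a proper NER model; the capitalized-word regex here is just a stand-in.

```python
import re

def extract_entities(text: str) -> set[str]:
    """Crude entity extraction: capitalized words. A stand-in for real NER."""
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text))

def unverified_entities(claim: str, knowledge_base: set[str]) -> set[str]:
    """Entities the agent mentioned that the knowledge base cannot confirm."""
    return extract_entities(claim) - knowledge_base

kb = {"Paris", "France"}
missing = unverified_entities("Paris is the capital of Atlantis", kb)
```

Any non-empty `missing` set would lower the judge's score or flag the claim outright.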

Fabricated References

  • Validate all cited URLs, papers, and documents
  • Maintain an allowlist of verified sources
  • Flag any reference not in the knowledge base
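The allowlist check above is mechanical enough to sketch directly: a cited URL passes only if its host appears on the verified-sources list. The hosts shown are examples, not a recommended allowlist.

```python
from urllib.parse import urlparse

# Example allowlist of verified source hosts (hypothetical).
ALLOWED_HOSTS = {"arxiv.org", "docs.python.org"}

def flag_references(urls: list[str]) -> list[str]:
    """Return the cited URLs whose host is not on the allowlist."""
    return [u for u in urls if urlparse(u).netloc not in ALLOWED_HOSTS]

flagged = flag_references([
    "https://arxiv.org/abs/2303.08774",
    "https://totally-real-journal.example/paper-42",
])
```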

Logical Hallucinations

  • Chain-of-thought validation — verify each reasoning step
  • Consistency checks — same inputs should produce compatible outputs
  • Adversarial probing — rephrase the same question and compare answers
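The adversarial-probing idea can be sketched with a crude token-overlap (Jaccard) score across answers to paraphrased questions; a production system would compare embeddings or ask the judge, but the shape is the same. Thresholds here are illustrative.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def consistent(answers: list[str], min_overlap: float = 0.5) -> bool:
    """True if every pair of answers to rephrased questions agrees enough."""
    return all(
        jaccard(answers[i], answers[j]) >= min_overlap
        for i in range(len(answers))
        for j in range(i + 1, len(answers))
    )
```

Low pairwise agreement across rephrasings is a signal that the agent is improvising rather than grounding its answer.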

Production Monitoring

Metrics to Track

  • Hallucination rate — percentage of outputs flagged by the judge
  • False positive rate — judge flags that human reviewers override
  • Detection latency — time added by the validation pipeline
  • Category distribution — which types of hallucinations are most common
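The first two metrics fall out of joining judge verdicts with human review outcomes. In this sketch each record is a pair `(judge_flagged, human_confirmed)`, with `human_confirmed` set to `None` when the output was never reviewed; the record shape is an assumption for illustration.

```python
def hallucination_rate(records) -> float:
    """Fraction of all outputs the judge flagged."""
    flagged = sum(1 for judge_flagged, _ in records if judge_flagged)
    return flagged / len(records)

def false_positive_rate(records) -> float:
    """Among reviewed flags, the fraction that humans overrode."""
    reviewed = [(f, h) for f, h in records if f and h is not None]
    overridden = sum(1 for _, h in reviewed if not h)
    return overridden / len(reviewed) if reviewed else 0.0

records = [(True, True), (True, False), (False, None), (False, None)]
```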

Alerting

  • Set thresholds for hallucination rate increases
  • Alert on new categories of hallucinations
  • Track trends over time — increasing rates may indicate model drift
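A rolling-window alert covers the first and third bullets: compare the recent hallucination rate against a baseline and fire when the increase exceeds a tolerance. Window size and tolerance below are placeholder values.

```python
from collections import deque

class DriftAlert:
    """Fire when the rolling hallucination rate rises above baseline + tolerance."""

    def __init__(self, baseline_rate: float, window: int = 100, max_increase: float = 0.05):
        self.baseline = baseline_rate
        self.max_increase = max_increase
        self.recent = deque(maxlen=window)  # last N judge verdicts

    def record(self, flagged: bool) -> bool:
        """Record one verdict; return True if the alert should fire."""
        self.recent.append(flagged)
        rate = sum(self.recent) / len(self.recent)
        return rate - self.baseline > self.max_increase
```

A sustained firing alert, rather than a single spike, is the pattern that suggests model drift.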

The Cost-Benefit Calculation

Running a judge LLM on every output roughly doubles your inference costs. But the cost of a hallucinated medical recommendation, piece of financial advice, or legal guidance is orders of magnitude higher. For production agents handling sensitive domains, hallucination detection is not optional.
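The trade-off is easy to make concrete with back-of-envelope arithmetic. All figures below are hypothetical inputs, not measurements:

```python
def expected_savings(requests: int, judge_cost_per_call: float,
                     halluc_rate: float, catch_rate: float,
                     incident_cost: float) -> float:
    """Expected net benefit of validation: incidents prevented times their
    cost, minus the extra judge call on every request."""
    validation_cost = requests * judge_cost_per_call
    prevented_incidents = requests * halluc_rate * catch_rate
    return prevented_incidents * incident_cost - validation_cost
```

Even with a 1% hallucination rate and a judge that catches only half of them, a four-figure incident cost dominates a sub-cent judge call almost immediately.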
