Hallucination Detection in Production Agents
Hallucinations are not bugs; they are inherent to probabilistic text generation. Here is how we validate agent outputs at scale using LLM-as-a-Judge pipelines.
Understanding Hallucinations
Hallucinations in AI agents fall into three categories:
- Factual hallucinations — stating incorrect facts with confidence
- Fabricated references — citing sources that don't exist
- Logical hallucinations — drawing conclusions that don't follow from the premises
Each requires a different detection strategy.
The LLM-as-a-Judge Pipeline
Architecture
Agent Output → Validation Prompt → Judge LLM → Score/Flag → Decision
How It Works
- The agent produces an output
- The output is sent to a separate "judge" LLM with a validation prompt
- The judge evaluates the output against the source context
- A confidence score is assigned
- Outputs below the threshold are flagged for human review
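The steps above can be sketched in a few lines of Python. Here `call_judge` stands in for whatever client wraps your judge model, and the prompt wording and 0.7 threshold are illustrative assumptions, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    score: float   # judge confidence that the output is grounded, 0.0-1.0
    flagged: bool  # True when the output should go to human review

# Illustrative validation prompt; real rubrics are usually per output type.
JUDGE_PROMPT = (
    "You are a strict fact-checker. Given SOURCE CONTEXT and an AGENT OUTPUT, "
    "rate from 0.0 to 1.0 how well the output is supported by the context. "
    "Reply with only the number.\n\n"
    "SOURCE CONTEXT:\n{context}\n\nAGENT OUTPUT:\n{output}"
)

def judge_output(agent_output: str, source_context: str,
                 call_judge, threshold: float = 0.7) -> Verdict:
    """Send the agent's output to a separate judge model and flag low scores.

    `call_judge` is any callable mapping a prompt string to the judge model's
    text response (a wrapper around your LLM client of choice).
    """
    prompt = JUDGE_PROMPT.format(context=source_context, output=agent_output)
    raw = call_judge(prompt)
    try:
        score = max(0.0, min(1.0, float(raw.strip())))
    except ValueError:
        score = 0.0  # unparseable judge replies are treated as failures
    return Verdict(score=score, flagged=score < threshold)

# Stubbed judge for demonstration; in production this calls a different
# model than the one powering the agent.
verdict = judge_output("Paris is the capital of France.",
                       "France's capital is Paris.",
                       call_judge=lambda prompt: "0.95")
```

Treating an unparseable judge reply as a failure (score 0.0) is a deliberate fail-closed choice: a judge that cannot follow its own output format is not a judge you want silently approving outputs.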
Key Design Decisions
- Use a different model for the judge than the agent — correlated errors are the enemy
- Provide source context to the judge — it needs ground truth to evaluate against
- Design specific rubrics for different output types (factual claims, recommendations, summaries)
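One way to wire per-type rubrics into the validation prompt is a simple lookup. The rubric text below is illustrative, not a standard:

```python
# Hypothetical rubrics keyed by output type; criteria are examples only.
RUBRICS = {
    "factual_claim": (
        "Score 1.0 only if every stated fact appears in the source context; "
        "deduct for each unsupported or contradicted fact."
    ),
    "recommendation": (
        "Score on whether the recommendation follows from the context and "
        "states its assumptions; penalize unsupported certainty."
    ),
    "summary": (
        "Score on coverage and faithfulness: no added facts, no inverted "
        "conclusions, key points preserved."
    ),
}

def build_judge_prompt(output_type: str, context: str, output: str) -> str:
    """Assemble a validation prompt using the rubric for this output type."""
    rubric = RUBRICS.get(output_type)
    if rubric is None:
        raise ValueError(f"no rubric defined for output type: {output_type}")
    return (
        f"RUBRIC: {rubric}\n\nSOURCE CONTEXT:\n{context}\n\n"
        f"AGENT OUTPUT:\n{output}\n\nReply with a score from 0.0 to 1.0."
    )
```

Failing loudly on an unknown output type is intentional: an output type without a rubric means the judge has no criteria to apply, which should surface at development time, not degrade silently in production.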
Detection Strategies by Type
Factual Hallucinations
- Cross-reference agent claims against knowledge base
- Use entity extraction + fact verification pipelines
- Track confidence scores across similar queries
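A minimal sketch of the cross-referencing step, assuming claims have already been extracted as (subject, relation, object) triples. Real pipelines would use an entity-extraction model and a proper knowledge-base lookup rather than an in-memory set:

```python
# Toy in-memory knowledge base of verified facts (illustrative entries).
KNOWLEDGE_BASE = {
    ("python", "first_released", "1991"),
    ("python", "creator", "Guido van Rossum"),
}

def verify_claims(claims):
    """Partition extracted claims into supported and unsupported lists."""
    supported = [c for c in claims if c in KNOWLEDGE_BASE]
    unsupported = [c for c in claims if c not in KNOWLEDGE_BASE]
    return supported, unsupported

supported, unsupported = verify_claims([
    ("python", "creator", "Guido van Rossum"),
    ("python", "first_released", "1989"),  # a factual hallucination
])
```

Anything in the `unsupported` list is a candidate factual hallucination and feeds the judge's score or a direct flag.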
Fabricated References
- Validate all cited URLs, papers, and documents
- Maintain an allowlist of verified sources
- Flag any reference not in the knowledge base
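The three checks above can be combined into one validator. The allowlisted domains and document IDs below are placeholders for whatever your knowledge base actually contains:

```python
from urllib.parse import urlparse

# Hypothetical allowlist of verified source domains and known document IDs.
ALLOWED_DOMAINS = {"arxiv.org", "docs.python.org"}
KNOWN_DOCUMENTS = {"design-doc-42", "runbook-auth"}

def validate_references(references):
    """Flag any cited URL or document not on the allowlist or in the KB."""
    flagged = []
    for ref in references:
        if ref.startswith(("http://", "https://")):
            domain = urlparse(ref).netloc.lower().removeprefix("www.")
            if domain not in ALLOWED_DOMAINS:
                flagged.append(ref)
        elif ref not in KNOWN_DOCUMENTS:
            flagged.append(ref)
    return flagged
```

Note this is deliberately strict: a reference that is merely unknown is flagged the same as one that is provably fake, because fabricated references are indistinguishable from unverified ones until a human checks.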
Logical Hallucinations
- Chain-of-thought validation — verify each reasoning step
- Consistency checks — same inputs should produce compatible outputs
- Adversarial probing — rephrase the same question and compare answers
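Adversarial probing can be sketched with a crude token-overlap comparison. The Jaccard similarity and the 0.5 cutoff are stand-ins; production systems would use embedding similarity or a judge-model comparison:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two answers (crude but cheap)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency_check(ask, question, paraphrases, min_similarity=0.5):
    """Ask the same question several ways and compare the answers.

    Low pairwise similarity suggests the agent is confabulating rather
    than recalling. `ask` is any callable mapping a question string to
    the agent's answer.
    """
    answers = [ask(q) for q in [question, *paraphrases]]
    baseline = answers[0]
    scores = [jaccard(baseline, other) for other in answers[1:]]
    return min(scores) >= min_similarity, scores
```

A stable agent should give near-identical answers to "What is the rate limit?" and "State the rate limit."; large divergence on paraphrases is a cheap, model-free hallucination signal.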
Production Monitoring
Metrics to Track
- Hallucination rate — percentage of outputs flagged by the judge
- False positive rate — judge flags that human reviewers override
- Detection latency — time added by the validation pipeline
- Category distribution — which types of hallucinations are most common
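The four metrics above can be tracked with a small rolling-counter class. Field and method names here are illustrative, not a standard schema:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class JudgeMetrics:
    """Rolling counters for judge-pipeline monitoring (sketch)."""
    total: int = 0
    flagged: int = 0
    overridden: int = 0  # judge flags that human reviewers rejected
    latencies_ms: list = field(default_factory=list)
    categories: Counter = field(default_factory=Counter)

    def record(self, flagged: bool, latency_ms: float,
               category: str = "", human_override: bool = False):
        self.total += 1
        self.latencies_ms.append(latency_ms)
        if flagged:
            self.flagged += 1
            if category:
                self.categories[category] += 1
            if human_override:
                self.overridden += 1

    def hallucination_rate(self) -> float:
        return self.flagged / self.total if self.total else 0.0

    def false_positive_rate(self) -> float:
        return self.overridden / self.flagged if self.flagged else 0.0

    def mean_latency_ms(self) -> float:
        return (sum(self.latencies_ms) / len(self.latencies_ms)
                if self.latencies_ms else 0.0)
```

In practice these counters would be emitted to your metrics backend per time window rather than accumulated in-process, but the ratios are the same.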
Alerting
- Set thresholds for hallucination rate increases
- Alert on new categories of hallucinations
- Track trends over time — increasing rates may indicate model drift
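A minimal alert rule combining a relative jump against baseline with an absolute floor, so tiny baselines do not trigger noisy alerts. Both default values are illustrative, not recommendations:

```python
def should_alert(baseline_rate: float, current_rate: float,
                 ratio_threshold: float = 1.5,
                 absolute_floor: float = 0.01) -> bool:
    """Alert when the hallucination rate rises meaningfully above baseline.

    The ratio threshold catches relative jumps (a possible sign of model
    drift); the absolute floor suppresses alerts when the current rate is
    negligible regardless of the ratio.
    """
    if current_rate < absolute_floor:
        return False
    return current_rate >= baseline_rate * ratio_threshold
```

A sustained run of `True` results across windows, rather than a single spike, is usually what should page a human; single-window spikes are often judge noise.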
The Cost-Benefit Calculation
Running a judge LLM roughly doubles your inference cost, though a smaller judge model can cut that overhead substantially. But the cost of a hallucinated medical recommendation, financial advice, or legal guidance is orders of magnitude higher. For production agents handling sensitive domains, hallucination detection is not optional.