Hallucination Detection in Production Agents

Nov 20, 2025 · 8 min read

Hallucinations are not bugs — they are features of probabilistic systems. Here is how we validate agent outputs at scale using LLM-as-a-Judge pipelines.

Understanding Hallucinations

Hallucinations in AI agents fall into three categories:

  1. Factual hallucinations — stating incorrect facts with confidence
  2. Fabricated references — citing sources that don't exist
  3. Logical hallucinations — drawing conclusions that don't follow from the premises

Each requires a different detection strategy.

The LLM-as-a-Judge Pipeline

Architecture

Agent Output → Validation Prompt → Judge LLM → Score/Flag → Decision

How It Works

  1. The agent produces an output
  2. The output is sent to a separate "judge" LLM with a validation prompt
  3. The judge evaluates the output against the source context
  4. A confidence score is assigned
  5. Outputs below the threshold are flagged for human review
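The five steps above can be sketched end to end. This is a minimal illustration, not a specific vendor API: the judge is passed in as a plain callable returning a score in [0, 1], and the prompt template and threshold are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical validation prompt; a production rubric would be more detailed.
JUDGE_PROMPT = (
    "You are a validator. Given SOURCE CONTEXT and an AGENT OUTPUT, return a "
    "confidence score from 0.0 to 1.0 that the output is fully supported by "
    "the context.\n\nSOURCE CONTEXT:\n{context}\n\nAGENT OUTPUT:\n{output}\n"
)

@dataclass
class Verdict:
    score: float
    flagged: bool  # True means "route to human review"

def validate_output(output: str, context: str, judge, threshold: float = 0.7) -> Verdict:
    """Send one agent output through the judge; flag it for human review
    if the judge's confidence falls below the threshold."""
    prompt = JUDGE_PROMPT.format(context=context, output=output)
    score = judge(prompt)  # the judge callable returns a float in [0, 1]
    return Verdict(score=score, flagged=score < threshold)

# Usage with a stub judge standing in for a real second model:
verdict = validate_output("The sky is green.", "The sky is blue.", judge=lambda p: 0.1)
```

In production the stub lambda would be replaced by a call to a separate model, per the design decision below about avoiding correlated errors.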

Key Design Decisions

  • Use a different model for the judge than the agent — correlated errors are the enemy
  • Provide source context to the judge — it needs ground truth to evaluate against
  • Design specific rubrics for different output types (factual claims, recommendations, summaries)
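The third decision, per-type rubrics, can be as simple as a lookup table that swaps the judge's instructions by output type. The rubric wording here is illustrative only:

```python
# Hypothetical rubrics: each output type gets its own judge instruction.
RUBRICS = {
    "factual_claim": "Every stated fact must appear in the source context.",
    "recommendation": "The recommendation must follow from the context and state its assumptions.",
    "summary": "The summary must not introduce entities absent from the context.",
}

def build_judge_prompt(output_type: str, context: str, output: str) -> str:
    """Assemble a validation prompt using the rubric for this output type."""
    rubric = RUBRICS[output_type]
    return f"Rubric: {rubric}\nContext: {context}\nOutput: {output}\nScore 0-1:"
```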

Detection Strategies by Type

Factual Hallucinations

  • Cross-reference agent claims against knowledge base
  • Use entity extraction + fact verification pipelines
  • Track confidence scores across similar queries
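A toy version of the cross-referencing step: extract candidate entities from a claim and diff them against the knowledge base. A real pipeline would use a proper NER model; the capitalized-word regex here is just a stand-in.

```python
import re

def extract_entities(text: str) -> set[str]:
    """Crude entity extraction: capitalized words. A stand-in for real NER."""
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text))

def unverified_entities(claim: str, knowledge_base: set[str]) -> set[str]:
    """Entities the agent mentioned that the knowledge base cannot confirm."""
    return extract_entities(claim) - knowledge_base

kb = {"Paris", "France"}
missing = unverified_entities("Paris is the capital of Atlantis", kb)
```

Any non-empty `missing` set would lower the judge's score or flag the claim outright.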

Fabricated References

  • Validate all cited URLs, papers, and documents
  • Maintain an allowlist of verified sources
  • Flag any reference not in the knowledge base
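The allowlist check above is mechanical enough to sketch directly: a cited URL passes only if its host appears on the verified-sources list. The hosts shown are examples, not a recommended allowlist.

```python
from urllib.parse import urlparse

# Example allowlist of verified source hosts (hypothetical).
ALLOWED_HOSTS = {"arxiv.org", "docs.python.org"}

def flag_references(urls: list[str]) -> list[str]:
    """Return the cited URLs whose host is not on the allowlist."""
    return [u for u in urls if urlparse(u).netloc not in ALLOWED_HOSTS]

flagged = flag_references([
    "https://arxiv.org/abs/2303.08774",
    "https://totally-real-journal.example/paper-42",
])
```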

Logical Hallucinations

  • Chain-of-thought validation — verify each reasoning step
  • Consistency checks — same inputs should produce compatible outputs
  • Adversarial probing — rephrase the same question and compare answers
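The adversarial-probing idea can be sketched with a crude token-overlap (Jaccard) score across answers to paraphrased questions; a production system would compare embeddings or ask the judge, but the shape is the same. Thresholds here are illustrative.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def consistent(answers: list[str], min_overlap: float = 0.5) -> bool:
    """True if every pair of answers to rephrased questions agrees enough."""
    return all(
        jaccard(answers[i], answers[j]) >= min_overlap
        for i in range(len(answers))
        for j in range(i + 1, len(answers))
    )
```

Low pairwise agreement across rephrasings is a signal that the agent is improvising rather than grounding its answer.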

Production Monitoring

Metrics to Track

  • Hallucination rate — percentage of outputs flagged by the judge
  • False positive rate — judge flags that human reviewers override
  • Detection latency — time added by the validation pipeline
  • Category distribution — which types of hallucinations are most common
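The first two metrics fall out of joining judge verdicts with human review outcomes. In this sketch each record is a pair `(judge_flagged, human_confirmed)`, with `human_confirmed` set to `None` when the output was never reviewed; the record shape is an assumption for illustration.

```python
def hallucination_rate(records) -> float:
    """Fraction of all outputs the judge flagged."""
    flagged = sum(1 for judge_flagged, _ in records if judge_flagged)
    return flagged / len(records)

def false_positive_rate(records) -> float:
    """Among reviewed flags, the fraction that humans overrode."""
    reviewed = [(f, h) for f, h in records if f and h is not None]
    overridden = sum(1 for _, h in reviewed if not h)
    return overridden / len(reviewed) if reviewed else 0.0

records = [(True, True), (True, False), (False, None), (False, None)]
```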

Alerting

  • Set thresholds for hallucination rate increases
  • Alert on new categories of hallucinations
  • Track trends over time — increasing rates may indicate model drift
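A rolling-window alert covers the first and third bullets: compare the recent hallucination rate against a baseline and fire when the increase exceeds a tolerance. Window size and tolerance below are placeholder values.

```python
from collections import deque

class DriftAlert:
    """Fire when the rolling hallucination rate rises above baseline + tolerance."""

    def __init__(self, baseline_rate: float, window: int = 100, max_increase: float = 0.05):
        self.baseline = baseline_rate
        self.max_increase = max_increase
        self.recent = deque(maxlen=window)  # last N judge verdicts

    def record(self, flagged: bool) -> bool:
        """Record one verdict; return True if the alert should fire."""
        self.recent.append(flagged)
        rate = sum(self.recent) / len(self.recent)
        return rate - self.baseline > self.max_increase
```

A sustained firing alert, rather than a single spike, is the pattern that suggests model drift.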

The Cost-Benefit Calculation

Running a judge LLM on every output roughly doubles your inference costs. But the cost of a hallucinated medical recommendation, piece of financial advice, or legal guidance is orders of magnitude higher. For production agents handling sensitive domains, hallucination detection is not optional.
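The trade-off is easy to make concrete with back-of-envelope arithmetic. All figures below are hypothetical inputs, not measurements:

```python
def expected_savings(requests: int, judge_cost_per_call: float,
                     halluc_rate: float, catch_rate: float,
                     incident_cost: float) -> float:
    """Expected net benefit of validation: incidents prevented times their
    cost, minus the extra judge call on every request."""
    validation_cost = requests * judge_cost_per_call
    prevented_incidents = requests * halluc_rate * catch_rate
    return prevented_incidents * incident_cost - validation_cost
```

Even with a 1% hallucination rate and a judge that catches only half of them, a four-figure incident cost dominates a sub-cent judge call almost immediately.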
