Hallucination Detection

Hallucination detection in AIGuard combines three complementary techniques and resolves them into a single classification. You can enable any subset, or let AIGuard pick automatically based on the data available for each test case.
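As a rough illustration of automatic selection, the sketch below picks a mode from the fields a test case carries. Only the `ground_truth` mode name appears in this page's result schema. The function name `pick_mode`, the other two mode strings, and the priority order (ground truth first, then context, then self-consistency) are assumptions made for the example, not AIGuard's documented behavior.

```python
from typing import Optional


def pick_mode(expected_answer: Optional[str], context: Optional[str]) -> str:
    """Choose a detection mode from the data a test case carries.

    Hypothetical helper: the priority order and the names
    "contextual_grounding" and "self_consistency" are assumptions.
    """
    if expected_answer:
        # A curated reference answer is described as the strongest signal.
        return "ground_truth"
    if context:
        # A supporting passage is available (typical for RAG).
        return "contextual_grounding"
    # Nothing to compare against: fall back to repeated sampling.
    return "self_consistency"
```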

Modes

Ground-truth verification

When a test case carries an expected answer, the model's response is compared against it using a configurable similarity metric. This is the strongest signal — use it whenever you have curated reference data.
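A minimal sketch of the comparison, assuming a character-level metric from Python's standard library. AIGuard's actual metrics and default threshold are not documented here, so `similarity`, `passes_ground_truth`, and the `0.8` threshold are illustrative only.

```python
from difflib import SequenceMatcher


def similarity(expected: str, response: str) -> float:
    """Return a 0..1 similarity after normalizing case and whitespace."""
    a = " ".join(expected.lower().split())
    b = " ".join(response.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def passes_ground_truth(expected: str, response: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: pass when similarity reaches the threshold."""
    return similarity(expected, response) >= threshold
```

A character-level metric like this one scores "13 January 2018" and "13 January 2016" as nearly identical, which is one reason a configurable metric matters: fact-heavy answers usually call for a semantic or exact-match comparison instead.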

Contextual grounding

When a context passage is provided (typical in RAG applications), AIGuard checks whether the response stays within the supporting evidence. Claims that go beyond the context are flagged.
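The idea can be sketched with a simple word-overlap heuristic: split the response into sentences and flag any sentence whose words mostly do not appear in the context. This is a deliberately crude stand-in, not AIGuard's grounding check (which is not specified on this page). The function names and the `0.6` overlap threshold are assumptions.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(response: str, context: str, min_overlap: float = 0.6) -> list[str]:
    """Return response sentences whose word overlap with the context is low."""
    context_words = _words(context)
    flagged = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _words(sentence)
        if not words:
            continue
        overlap = len(words & context_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

Word overlap cannot detect a claim that reuses the context's vocabulary but changes its meaning, so a production check would typically rely on an entailment model or an LLM judge.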

Self-consistency

When neither ground truth nor context is available, the model is sampled multiple times and the responses are scored for agreement with one another. High variance suggests the model is guessing.
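One way to score agreement is the mean pairwise similarity across samples, sketched below with a character-level metric from the standard library. The function name `agreement` and the choice of metric are assumptions; the scoring AIGuard uses is not described on this page.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def agreement(samples: list[str]) -> float:
    """Mean pairwise similarity (0..1) across sampled responses."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure agreement")
    return mean(
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in combinations(samples, 2)
    )
```

A low agreement value corresponds to the high-variance case described above. Where to draw the cut-off depends on the metric and on how much paraphrasing the model normally produces.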

Result schema

Each evaluation returns a HallucinationResult:

json
{
  "test_case_id": "qa-payments-007",
  "mode": "ground_truth",
  "score": 0.12,
  "category": null,
  "passed": true,
  "details": {
    "ground_truth": "PSD2 came into effect on 13 January 2018.",
    "response":     "PSD2 came into effect on 13 January 2018.",
    "similarity":   0.97
  },
  "metadata": {
    "model": "gpt-4o",
    "latency_ms": 412
  }
}
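To consume a result in Python, the payload can be loaded into a small dataclass. The field names below mirror the example payload. The types, the optional `category`, and the `parse_result` helper are assumptions for illustration rather than an official AIGuard client.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class HallucinationResult:
    """Local mirror of the result payload (types are assumed)."""
    test_case_id: str
    mode: str
    score: float
    category: Optional[str]
    passed: bool
    details: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_result(raw: str) -> HallucinationResult:
    """Parse one JSON result; raises TypeError on unexpected keys."""
    return HallucinationResult(**json.loads(raw))
```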

Categories

Results are classified into a taxonomy of failure modes so you can see how the model is wrong, not just that it is wrong.

  • factual_error — contradicts known facts
  • unsupported — adds claims not in the provided context
  • inconsistent — contradicts itself across samples
  • fabricated_entity — invented names, dates, or citations
  • refusal — refused to answer when an answer existed
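When post-processing results, the taxonomy can be modeled as an enum and tallied to see which failure mode dominates. The category strings come from the list above. The `Category` class and the `tally` helper are hypothetical, and the sketch assumes each result is a dict with a `category` key that may be empty for results without a failure.

```python
from collections import Counter
from enum import Enum


class Category(str, Enum):
    FACTUAL_ERROR = "factual_error"
    UNSUPPORTED = "unsupported"
    INCONSISTENT = "inconsistent"
    FABRICATED_ENTITY = "fabricated_entity"
    REFUSAL = "refusal"


def tally(results: list[dict]) -> Counter:
    """Count results per category, skipping results with no category."""
    return Counter(
        Category(r["category"]) for r in results if r.get("category")
    )
```

`Category(...)` raises `ValueError` on an unknown string, which surfaces typos or new categories early instead of silently miscounting them.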

Run it

bash
# Quick mode (small subset)
aiguard evaluate hallucination --mode quick

# Thorough — full dataset, multiple samples per case
aiguard evaluate hallucination --mode thorough --output halluc.json
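The layout of the output file is not specified on this page. Assuming `halluc.json` holds a JSON array of result objects shaped like the HallucinationResult example, a pass rate could be computed along these lines (`pass_rate` is a hypothetical helper):

```python
import json
from pathlib import Path


def pass_rate(path: str) -> float:
    """Fraction of results with passed == true.

    Assumes the file is a JSON array of result objects; adjust the
    loading step if the real output wraps results differently.
    """
    results = json.loads(Path(path).read_text(encoding="utf-8"))
    if not results:
        raise ValueError("no results in file")
    return sum(1 for r in results if r["passed"]) / len(results)
```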