Unified Hallucination Taxonomy

A single taxonomy that every data source maps into, so that prose (RAGTruth, FAVA), code (SWE-bench), and markdown hallucinations all share one label space. This is what lets a single detector be trained across modalities and lets users redefine the label set at inference time.

Canonical implementation: `lettucedetect/datasets/taxonomy.py`. Applied to data via `lettucedetect/preprocess/apply_taxonomy.py`.

Why a unified taxonomy

Every prior taxonomy cuts the same conceptual space slightly differently: FAVA (6 types), RAGTruth (4 types), our prose generator (9 types), our code pipeline (3 types), HalluVerse25 (3 levels). None of them maps cleanly onto the others. Training a cross-modality detector requires one label space that all of these map into without regenerating any data.

The taxonomy is built on two orthogonal axes:

Axis 1: relationship to context. Does the span conflict with the context, add information the context does not provide, or refer to something the context does not contain? This becomes the top-level category.
Axis 2: surface element affected. What kind of thing is wrong: a number, a date, a name, an identifier? This becomes the (open-set, user-extensible) subcategory.

Top-level categories

Mutually exclusive per span.

Category	Definition
`supported`	Span is entailed by the context. The non-hallucinated default.
`contradiction`	Span asserts X; context asserts Y; Y ≠ X. A direct, locally checkable conflict.
`unsupported_addition`	Span asserts X; context neither states X nor anything contradicting it. Plausible but not derivable.
`fabricated_reference`	Span references a named structural element (entity, section, function, identifier, table, equation) that does not appear in the context.

Omission (an answer that is technically correct but leaves out material information from the context) is treated as a document-level binary flag, not a span category: the missing content is, by definition, not present in the answer, so there is no span to attach the label to.
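
One way to model this label space in Python, as a minimal sketch: the class names `Category`, `SpanLabel`, and `Sample` are hypothetical and not necessarily what `taxonomy.py` defines. It shows the closed set of four span categories, an optional open-set subcategory, and omission living on the sample rather than on any span.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Category(str, Enum):
    """Top-level categories (hypothetical names); exactly one applies to each span."""
    SUPPORTED = "supported"
    CONTRADICTION = "contradiction"
    UNSUPPORTED_ADDITION = "unsupported_addition"
    FABRICATED_REFERENCE = "fabricated_reference"


@dataclass
class SpanLabel:
    start: int
    end: int
    category: Category
    # Open-set attribute, so a plain string rather than a second enum.
    subcategory: Optional[str] = None


@dataclass
class Sample:
    spans: List[SpanLabel]
    # Document-level flag: omission has no span, so it is never a SpanLabel.
    has_omission: bool = False
```

Keeping `subcategory` a plain string is what makes the open set cheap in this sketch: a vertical-specific value needs no schema change, while the top-level categories stay fixed.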

Subcategories

Optional attributes of an already-classified span. Open-set: callers may extend them for a vertical (legal, medical, finance) without retraining.

Category	Subcategories
`contradiction`	`numerical`, `temporal`, `entity`, `relational`, `value`
`unsupported_addition`	`claim`, `elaboration`, `subjective`, `behavior`
`fabricated_reference`	`entity`, `section`, `identifier`, `attribute`
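
A minimal sketch of what "open-set" can mean at the data level, assuming subcategories are stored as plain strings: the registry keeps the top-level categories closed while letting a caller add vertical-specific subcategories. `BASE_SUBCATEGORIES` and `extend_subcategories` are hypothetical names, and the `dosage` value in the usage note is an invented example.

```python
from typing import Dict, FrozenSet, Iterable

# Base subcategories per category, copied from the table above.
BASE_SUBCATEGORIES: Dict[str, FrozenSet[str]] = {
    "contradiction": frozenset({"numerical", "temporal", "entity", "relational", "value"}),
    "unsupported_addition": frozenset({"claim", "elaboration", "subjective", "behavior"}),
    "fabricated_reference": frozenset({"entity", "section", "identifier", "attribute"}),
}


def extend_subcategories(
    base: Dict[str, FrozenSet[str]], extra: Dict[str, Iterable[str]]
) -> Dict[str, FrozenSet[str]]:
    """Return a new registry with extra subcategories merged in.

    Subcategories are open-set; top-level categories are not, so an
    unknown category raises KeyError instead of being created.
    """
    merged = {cat: set(subs) for cat, subs in base.items()}
    for cat, subs in extra.items():
        if cat not in merged:
            raise KeyError(f"unknown top-level category: {cat!r}")
        merged[cat] |= set(subs)
    # Freeze again so the base registry can never be mutated through the result.
    return {cat: frozenset(subs) for cat, subs in merged.items()}
```

For a medical vertical, `extend_subcategories(BASE_SUBCATEGORIES, {"contradiction": {"dosage"}})` returns a registry that accepts `contradiction` / `dosage` while leaving the base registry untouched.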

Source-label mapping

Every native label from every source maps mechanically into (category, subcategory). Nothing has to be regenerated; the synthetic and code data map deterministically, and RAGTruth uses a light context-aware heuristic.

Code (SWE-bench-derived)

Native label	→ Category	→ Subcategory
`structural` (fabricated function/identifier name)	`fabricated_reference`	`identifier`
`behavioral` (wrong arg/value/logic)	`contradiction`	`value`
`semantic` (solves the wrong problem)	`unsupported_addition`	`behavior`

The original native label is preserved in each span's `label` field for backwards compatibility; `category` and `subcategory` are added alongside it.
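
Because the code mapping is deterministic, it reduces to a lookup table. A minimal sketch, assuming spans are plain dicts shaped like the output example further down; `CODE_LABEL_MAP` and `map_code_span` are hypothetical names, not the API of `taxonomy.py`.

```python
from typing import Dict, Tuple

# Native code label -> (category, subcategory), as in the table above.
CODE_LABEL_MAP: Dict[str, Tuple[str, str]] = {
    "structural": ("fabricated_reference", "identifier"),
    "behavioral": ("contradiction", "value"),
    "semantic": ("unsupported_addition", "behavior"),
}


def map_code_span(span: dict) -> dict:
    """Return a copy of the span with category/subcategory added.

    The native `label` key is kept unchanged for backwards compatibility.
    """
    category, subcategory = CODE_LABEL_MAP[span["label"]]
    return {**span, "category": category, "subcategory": subcategory}
```

Returning a new dict rather than mutating the input keeps the original preprocessed record intact if it is reused elsewhere.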

RAGTruth (prose)

Native label	→ Category	→ Subcategory
Evident Conflict	`contradiction`	—
Subtle Conflict	`contradiction`	—
Evident Baseless Info	`unsupported_addition`	`claim`
Subtle Baseless Info	`unsupported_addition`	`elaboration`

RAGTruth also gets a context-aware refinement: a Baseless Info span that contains a proper noun absent from the context is reclassified as `fabricated_reference` / `entity` (see `ragtruth_map_with_context`).
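
The refinement can be approximated as follows. This is a hypothetical sketch, not the code of `ragtruth_map_with_context`: it assumes a "proper noun" is any capitalized word that does not open a sentence, and that "absent from the context" means the exact word never occurs as a token in the context.

```python
import re
from typing import List, Optional, Tuple

# Native RAGTruth labels, grouped as in the table above.
CONFLICT_LABELS = {"Evident Conflict", "Subtle Conflict"}
BASELESS_LABELS = {"Evident Baseless Info": "claim", "Subtle Baseless Info": "elaboration"}


def proper_noun_candidates(text: str) -> List[str]:
    """Capitalized words that do not open a sentence (assumed proper-noun test)."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"\w+", sentence)
        # Skip the first word of each sentence, which is capitalized anyway.
        candidates.extend(w for w in words[1:] if w[0].isupper())
    return candidates


def map_ragtruth_label(native: str, span_text: str, context: str) -> Tuple[str, Optional[str]]:
    """Map one RAGTruth span to (category, subcategory), refining Baseless Info."""
    if native in CONFLICT_LABELS:
        return ("contradiction", None)
    if native in BASELESS_LABELS:
        context_words = set(re.findall(r"\w+", context))
        # Any unseen proper noun turns "baseless" into a fabricated entity.
        if any(w not in context_words for w in proper_noun_candidates(span_text)):
            return ("fabricated_reference", "entity")
        return ("unsupported_addition", BASELESS_LABELS[native])
    raise ValueError(f"unknown RAGTruth label: {native!r}")
```

The sentence-start rule is deliberately crude: a span that opens with a single proper noun, or one that uses a lowercase brand name, slips through. A part-of-speech tagger or NER model would be the obvious upgrade.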

FAVA (prose)

Native label	→ Category	→ Subcategory
Entity	`contradiction`	`entity`
Relation	`contradiction`	`relational`
Contradictory	`contradiction`	—
Invented	`fabricated_reference`	`entity`
Subjective	`unsupported_addition`	`subjective`
Unverifiable	`unsupported_addition`	`claim`

LD prose generator (`rag-fact-checker`)

Native label	→ Category	→ Subcategory
FACTUAL	`contradiction`	`entity`
TEMPORAL	`contradiction`	`temporal`
NUMERICAL	`contradiction`	`numerical`
RELATIONAL	`contradiction`	`relational`
CONTEXTUAL	`unsupported_addition`	`claim`
OMISSION	`omission`	(document-level)
FABRICATED_ENTITY	`fabricated_reference`	`entity`
SUBJECTIVE	`unsupported_addition`	`subjective`
UNVERIFIABLE	`unsupported_addition`	`claim`
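
The LD generator is the one source whose native labels mix span-level and document-level labels. A hedged sketch of routing them, assuming OMISSION arrives as an entry in the same list as the span labels and should become a sample flag rather than a span; `LD_SPAN_MAP` and `apply_ld_labels` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Span-level generator labels -> (category, subcategory), as in the table above.
LD_SPAN_MAP: Dict[str, Tuple[str, str]] = {
    "FACTUAL": ("contradiction", "entity"),
    "TEMPORAL": ("contradiction", "temporal"),
    "NUMERICAL": ("contradiction", "numerical"),
    "RELATIONAL": ("contradiction", "relational"),
    "CONTEXTUAL": ("unsupported_addition", "claim"),
    "FABRICATED_ENTITY": ("fabricated_reference", "entity"),
    "SUBJECTIVE": ("unsupported_addition", "subjective"),
    "UNVERIFIABLE": ("unsupported_addition", "claim"),
}


def apply_ld_labels(spans: List[dict]) -> Tuple[List[dict], bool]:
    """Map generator spans and pull OMISSION out as a document-level flag.

    Assumption: OMISSION entries appear in the same list as span labels.
    """
    mapped: List[dict] = []
    has_omission = False
    for span in spans:
        if span["label"] == "OMISSION":
            has_omission = True  # becomes a sample flag, not a span
            continue
        category, subcategory = LD_SPAN_MAP[span["label"]]
        mapped.append({**span, "category": category, "subcategory": subcategory})
    return mapped, has_omission
```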

Markdown (planned)

Native label	→ Category	→ Subcategory
contradicted_number	`contradiction`	`numerical`
contradicted_date	`contradiction`	`temporal`
contradicted_entity	`contradiction`	`entity`
contradicted_table_cell	`contradiction`	`value`
extra_claim	`unsupported_addition`	`claim`
fabricated_section_ref	`fabricated_reference`	`section`
fabricated_citation	`fabricated_reference`	`entity`
fabricated_equation_ref	`fabricated_reference`	`section`

Applying the taxonomy

`apply_taxonomy.py` enriches an already-preprocessed dataset with `category`, `subcategory`, `context_modality`, and provenance `metadata`, writing one JSONL file per split. The sample-level `category` is a majority vote over the categories of its spans.

python -m lettucedetect.preprocess.apply_taxonomy \
    --source code \
    --data_path data/code_hallucination/code_hallucination_data.json \
    --metadata_path data/code_hallucination/code_hallucination_metadata.json \
    --output_dir data/v2/code_hallucination
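
The sample-level majority vote can be sketched with `collections.Counter`. Two behaviors here are assumptions, not documented ones: a sample with no labeled spans falls back to `supported`, and ties go to the category of the earliest span. `sample_category` is a hypothetical name.

```python
from collections import Counter
from typing import List


def sample_category(span_categories: List[str]) -> str:
    """Majority vote over span categories (hypothetical helper).

    Assumed: no spans -> "supported"; ties -> first category encountered.
    """
    if not span_categories:
        return "supported"
    counts = Counter(span_categories)
    # most_common() orders equal counts by first insertion, which is what
    # gives the earliest-span tie-break.
    return counts.most_common(1)[0][0]
```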

Each output sample carries both the native label (on the span) and the unified category/subcategory (on both the span and the sample):

{
  "labels": [
    {"start": 18, "end": 25, "label": "structural",
     "category": "fabricated_reference", "subcategory": "identifier"}
  ],
  "context_modality": "code",
  "category": "fabricated_reference",
  "subcategory": "identifier",
  "metadata": {"instance_id": "...", "repo": "...", "format_type": "...",
               "is_hallucinated": true, "injector_model": "..."}
}
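
To check what a run produced, the JSONL output can be tallied by `(category, subcategory)` over all spans. A minimal sketch that relies only on the fields shown in the example above; `tally_span_labels` is a hypothetical helper, and the per-split file names are not assumed here.

```python
import json
from collections import Counter
from typing import Iterable


def tally_span_labels(lines: Iterable[str]) -> Counter:
    """Count (category, subcategory) pairs across every span in JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        sample = json.loads(line)
        for span in sample.get("labels", []):
            counts[(span["category"], span.get("subcategory"))] += 1
    return counts
```

Passing an open file handle for one of the split files works directly, since iterating a text file yields its lines.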

Why this matters

A single, source-agnostic label space is what lets one detector be trained across modalities (prose, code, markdown) instead of one model per source. It also keeps the door open to typed, span-level output: telling a user not just that a span is hallucinated but how (contradiction vs. unsupported addition vs. fabricated reference), which is the differentiator over scalar faithfulness scores. Both depend on every data source mapping cleanly into the taxonomy.