Unified Hallucination Taxonomy
A single taxonomy that every data source maps into, so that prose (RAGTruth, FAVA), code (SWE-bench), and markdown hallucinations all share one label space. This is what lets a single detector be trained across modalities and lets users redefine the label set at inference time.
Canonical implementation: `lettucedetect/datasets/taxonomy.py`.
Applied to data via `lettucedetect/preprocess/apply_taxonomy.py`.
Why a unified taxonomy
Every prior taxonomy cuts the same conceptual space slightly differently: FAVA (6 types), RAGTruth (4 types), our prose generator (6 types), our code pipeline (3 types), HalluVerse25 (3 levels). None of them can serve as a shared label space for the others. Training a cross-modality detector requires one label space that all of them map into without regenerating any data.
The taxonomy is built on two orthogonal axes:
- Axis 1: relationship to context. Does the span conflict with the context, add something beyond it, or reference something that does not exist in it? This becomes the top-level category.
- Axis 2: surface element affected. What kind of thing is wrong: a number, a date, a name, an identifier? This becomes the (open-set, user-extensible) subcategory.
Top-level categories
Categories are mutually exclusive: each span receives exactly one.
| Category | Definition |
|---|---|
| `supported` | Span is entailed by the context. The non-hallucinated default. |
| `contradiction` | Span asserts X; context asserts Y; Y ≠ X. A direct, locally checkable conflict. |
| `unsupported_addition` | Span asserts X; context neither states X nor anything contradicting it. Plausible but not derivable. |
| `fabricated_reference` | Span references a named structural element (entity, section, function, identifier, table, equation) that does not appear in the context. |
`omission` (an answer that is technically correct but materially incomplete) is treated as a document-level binary flag, not a span class: missing content cannot be localized to any span of text that is actually present.
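The following sketch shows one possible way to represent this label space in code. It models Axis 1 as a closed enum, Axis 2 as an optional free-form string, and omission as a document-level boolean rather than a span category. `Category`, `SpanLabel`, and `DocumentLabels` are hypothetical names and are not taken from `taxonomy.py`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Category(str, Enum):
    """Top-level span categories (Axis 1); exactly one per span."""
    SUPPORTED = "supported"
    CONTRADICTION = "contradiction"
    UNSUPPORTED_ADDITION = "unsupported_addition"
    FABRICATED_REFERENCE = "fabricated_reference"


@dataclass(frozen=True)
class SpanLabel:
    """A labeled character span [start, end); subcategory (Axis 2) is optional."""
    start: int
    end: int
    category: Category
    subcategory: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject empty or negative spans.
        if not 0 <= self.start < self.end:
            raise ValueError(f"invalid span [{self.start}, {self.end})")


@dataclass
class DocumentLabels:
    """Span labels plus the document-level omission flag (not a span class)."""
    spans: list[SpanLabel] = field(default_factory=list)
    omission: bool = False


doc = DocumentLabels(
    spans=[SpanLabel(18, 25, Category.FABRICATED_REFERENCE, "identifier")],
)
print(doc.spans[0].category.value)  # fabricated_reference
```

This sketch keeps `subcategory` as a plain string instead of a second enum. That choice is one way to preserve the open-set property described in the next section.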
Subcategories
Optional attributes of an already-classified span. Subcategories are open-set: callers may extend them for a domain vertical (legal, medical, finance) without retraining.
| Category | Subcategories |
|---|---|
| `contradiction` | `numerical`, `temporal`, `entity`, `relational`, `value` |
| `unsupported_addition` | `claim`, `elaboration`, `subjective`, `behavior` |
| `fabricated_reference` | `entity`, `section`, `identifier`, `attribute` |
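The open-set rule could work like this in practice. The built-in subcategories from the table serve as defaults, caller-supplied ones are merged on top, and new top-level categories are rejected. This is a minimal sketch. `DEFAULT_SUBCATEGORIES`, `extend_subcategories`, `is_valid`, and the `jurisdiction` example are all hypothetical.

```python
from collections.abc import Iterable
from typing import Optional

# Built-in subcategories, copied from the table above. "supported" has none.
DEFAULT_SUBCATEGORIES: dict[str, frozenset[str]] = {
    "supported": frozenset(),
    "contradiction": frozenset({"numerical", "temporal", "entity", "relational", "value"}),
    "unsupported_addition": frozenset({"claim", "elaboration", "subjective", "behavior"}),
    "fabricated_reference": frozenset({"entity", "section", "identifier", "attribute"}),
}


def extend_subcategories(
    base: dict[str, frozenset[str]], extra: dict[str, Iterable[str]]
) -> dict[str, frozenset[str]]:
    """Return a new registry with extra subcategories merged in.

    Top-level categories stay closed: extending an unknown one raises KeyError.
    """
    unknown = set(extra) - set(base)
    if unknown:
        raise KeyError(f"unknown top-level categories: {sorted(unknown)}")
    return {cat: subs | frozenset(extra.get(cat, ())) for cat, subs in base.items()}


def is_valid(
    registry: dict[str, frozenset[str]], category: str, subcategory: Optional[str]
) -> bool:
    """A span may omit its subcategory; if present, it must be registered."""
    if category not in registry:
        return False
    return subcategory is None or subcategory in registry[category]


legal = extend_subcategories(DEFAULT_SUBCATEGORIES, {"contradiction": ["jurisdiction"]})
print(is_valid(legal, "contradiction", "jurisdiction"))                  # True
print(is_valid(DEFAULT_SUBCATEGORIES, "contradiction", "jurisdiction"))  # False
```

The extension returns a new registry and leaves the defaults untouched. A vertical-specific label set can therefore coexist with the base one.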
Source-label mapping
Every native label from every source maps into a (category, subcategory) pair without regenerating any data. The synthetic and code labels map deterministically by table lookup; RAGTruth additionally uses a light context-aware heuristic.
Code (SWE-bench-derived)
| Native label | → Category | → Subcategory |
|---|---|---|
| `structural` (fabricated function/identifier name) | `fabricated_reference` | `identifier` |
| `behavioral` (wrong argument, value, or logic) | `contradiction` | `value` |
| `semantic` (solves the wrong problem) | `unsupported_addition` | `behavior` |
The original native label is preserved in each span's `label` field for backwards compatibility; `category` and `subcategory` are added alongside it.
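The code mapping is a fixed lookup, so enriching a span amounts to copying it and attaching the two new fields. The helper below is a hypothetical sketch of that step, not the actual code in `apply_taxonomy.py`. It raises `KeyError` on an unrecognized native label rather than guessing.

```python
# Native code labels -> (category, subcategory), copied from the table above.
CODE_LABEL_MAP: dict[str, tuple[str, str]] = {
    "structural": ("fabricated_reference", "identifier"),
    "behavioral": ("contradiction", "value"),
    "semantic": ("unsupported_addition", "behavior"),
}


def enrich_code_span(span: dict) -> dict:
    """Return a copy of `span` with category/subcategory added.

    The native `label` is kept unchanged for backwards compatibility, and the
    input dict is not mutated.
    """
    category, subcategory = CODE_LABEL_MAP[span["label"]]
    return {**span, "category": category, "subcategory": subcategory}


span = {"start": 18, "end": 25, "label": "structural"}
print(enrich_code_span(span))
# {'start': 18, 'end': 25, 'label': 'structural', 'category': 'fabricated_reference', 'subcategory': 'identifier'}
```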
RAGTruth (prose)
| Native label | → Category | → Subcategory |
|---|---|---|
| Evident Conflict | `contradiction` | — |
| Subtle Conflict | `contradiction` | — |
| Evident Baseless Info | `unsupported_addition` | `claim` |
| Subtle Baseless Info | `unsupported_addition` | `elaboration` |
RAGTruth additionally uses a context-aware refinement: a Baseless Info span that contains a proper noun absent from the context is reclassified as `fabricated_reference` / `entity` (see `ragtruth_map_with_context`).
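The exact heuristic lives in `ragtruth_map_with_context` and is not reproduced here. The sketch below only illustrates the idea, under two deliberately crude assumptions. A "proper noun" is approximated as any capitalized word after the first word of the span. "Absent from the context" is approximated as a plain substring check. `map_ragtruth_span` and `proper_noun_candidates` are hypothetical names.

```python
import re
from typing import Optional

# Base RAGTruth mapping, copied from the table above.
RAGTRUTH_LABEL_MAP: dict[str, tuple[str, Optional[str]]] = {
    "Evident Conflict": ("contradiction", None),
    "Subtle Conflict": ("contradiction", None),
    "Evident Baseless Info": ("unsupported_addition", "claim"),
    "Subtle Baseless Info": ("unsupported_addition", "elaboration"),
}


def proper_noun_candidates(span_text: str) -> list[str]:
    """Crude proxy: capitalized words, skipping the span's first word.

    A real heuristic would more likely use a POS tagger or NER model.
    """
    words = re.findall(r"[A-Za-z]+", span_text)
    return [w for i, w in enumerate(words) if i > 0 and w[0].isupper()]


def map_ragtruth_span(label: str, span_text: str, context: str) -> tuple[str, Optional[str]]:
    category, subcategory = RAGTRUTH_LABEL_MAP[label]
    if label.endswith("Baseless Info"):
        # Reclassify when some candidate proper noun never occurs in the context.
        if any(word not in context for word in proper_noun_candidates(span_text)):
            return ("fabricated_reference", "entity")
    return (category, subcategory)


context = "The company was founded in 1999 in Berlin."
print(map_ragtruth_span("Evident Baseless Info", "founded by Alice Zorn", context))
# ('fabricated_reference', 'entity')
print(map_ragtruth_span("Evident Baseless Info", "and later moved to Berlin", context))
# ('unsupported_addition', 'claim')
```

Conflict labels never pass through the refinement in this sketch. That matches the rule that only Baseless Info is reclassified.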
FAVA (prose)
| Native label | → Category | → Subcategory |
|---|---|---|
| Entity | `contradiction` | `entity` |
| Relation | `contradiction` | `relational` |
| Contradictory | `contradiction` | — |
| Invented | `fabricated_reference` | `entity` |
| Subjective | `unsupported_addition` | `subjective` |
| Unverifiable | `unsupported_addition` | `claim` |
LD prose generator (rag-fact-checker)
| Native label | → Category | → Subcategory |
|---|---|---|
| `FACTUAL` | `contradiction` | `entity` |
| `TEMPORAL` | `contradiction` | `temporal` |
| `NUMERICAL` | `contradiction` | `numerical` |
| `RELATIONAL` | `contradiction` | `relational` |
| `CONTEXTUAL` | `unsupported_addition` | `claim` |
| `OMISSION` | `omission` | (document-level) |
| `FABRICATED_ENTITY` | `fabricated_reference` | `entity` |
| `SUBJECTIVE` | `unsupported_addition` | `subjective` |
| `UNVERIFIABLE` | `unsupported_addition` | `claim` |
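`OMISSION` has no span-level category, so a mapper for this source has to route it differently from the other labels. The hypothetical `map_ld_labels` below drops omission entries from the span list and raises a document-level flag instead. The span fields (`start`, `end`, `label`) follow the JSON example on this page. How the generator actually encodes omissions is an assumption.

```python
from typing import Optional

# Span-level labels from the table above; OMISSION is handled separately.
LD_LABEL_MAP: dict[str, tuple[str, Optional[str]]] = {
    "FACTUAL": ("contradiction", "entity"),
    "TEMPORAL": ("contradiction", "temporal"),
    "NUMERICAL": ("contradiction", "numerical"),
    "RELATIONAL": ("contradiction", "relational"),
    "CONTEXTUAL": ("unsupported_addition", "claim"),
    "FABRICATED_ENTITY": ("fabricated_reference", "entity"),
    "SUBJECTIVE": ("unsupported_addition", "subjective"),
    "UNVERIFIABLE": ("unsupported_addition", "claim"),
}


def map_ld_labels(spans: list[dict]) -> tuple[list[dict], bool]:
    """Map generator spans; OMISSION becomes a document flag, not a span."""
    mapped: list[dict] = []
    omission = False
    for span in spans:
        if span["label"] == "OMISSION":
            omission = True
            continue
        category, subcategory = LD_LABEL_MAP[span["label"]]
        mapped.append({**span, "category": category, "subcategory": subcategory})
    return mapped, omission


spans, omission = map_ld_labels([
    {"start": 10, "end": 14, "label": "NUMERICAL"},
    {"label": "OMISSION"},
])
print(spans)     # [{'start': 10, 'end': 14, 'label': 'NUMERICAL', 'category': 'contradiction', 'subcategory': 'numerical'}]
print(omission)  # True
```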
Markdown (planned)
| Native label | → Category | → Subcategory |
|---|---|---|
| `contradicted_number` | `contradiction` | `numerical` |
| `contradicted_date` | `contradiction` | `temporal` |
| `contradicted_entity` | `contradiction` | `entity` |
| `contradicted_table_cell` | `contradiction` | `value` |
| `extra_claim` | `unsupported_addition` | `claim` |
| `fabricated_section_ref` | `fabricated_reference` | `section` |
| `fabricated_citation` | `fabricated_reference` | `entity` |
| `fabricated_equation_ref` | `fabricated_reference` | `section` |
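Every mapping table is meant to land inside the same label space. A cheap consistency check is therefore to verify that each target pair names a known category and, when present, a registered subcategory. The sketch below runs such a check over the planned markdown mapping. `REGISTRY` and `find_invalid_targets` are hypothetical names. Treating `omission` as invalid reflects the rule that it is not a span category.

```python
from typing import Optional

# Built-in subcategory registry from the Subcategories table; "supported" has none.
REGISTRY: dict[str, frozenset[str]] = {
    "supported": frozenset(),
    "contradiction": frozenset({"numerical", "temporal", "entity", "relational", "value"}),
    "unsupported_addition": frozenset({"claim", "elaboration", "subjective", "behavior"}),
    "fabricated_reference": frozenset({"entity", "section", "identifier", "attribute"}),
}

# Planned markdown mapping, copied from the table above.
MARKDOWN_LABEL_MAP: dict[str, tuple[str, Optional[str]]] = {
    "contradicted_number": ("contradiction", "numerical"),
    "contradicted_date": ("contradiction", "temporal"),
    "contradicted_entity": ("contradiction", "entity"),
    "contradicted_table_cell": ("contradiction", "value"),
    "extra_claim": ("unsupported_addition", "claim"),
    "fabricated_section_ref": ("fabricated_reference", "section"),
    "fabricated_citation": ("fabricated_reference", "entity"),
    "fabricated_equation_ref": ("fabricated_reference", "section"),
}


def find_invalid_targets(
    mapping: dict[str, tuple[str, Optional[str]]],
    registry: dict[str, frozenset[str]],
) -> list[str]:
    """Return native labels whose (category, subcategory) target is not registered."""
    bad = []
    for native, (category, subcategory) in mapping.items():
        allowed = registry.get(category)
        if allowed is None or (subcategory is not None and subcategory not in allowed):
            bad.append(native)
    return bad


print(find_invalid_targets(MARKDOWN_LABEL_MAP, REGISTRY))  # []
```

Run against each source's table, a check like this would catch a misspelled target before it silently behaves like a new subcategory.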
Applying the taxonomy
`apply_taxonomy.py` enriches an already-preprocessed dataset with `category`, `subcategory`, `context_modality`, and provenance metadata, writing one JSONL file per split. The sample-level `category` is a majority vote over its spans' categories.
```bash
python -m lettucedetect.preprocess.apply_taxonomy \
  --source code \
  --data_path data/code_hallucination/code_hallucination_data.json \
  --metadata_path data/code_hallucination/code_hallucination_metadata.json \
  --output_dir data/v2/code_hallucination
```
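The sample-level majority vote can be expressed with `collections.Counter`. This page does not say what happens for samples with no spans or for ties, so both behaviours below are assumptions. An empty span list yields `None`. A tie resolves to the category encountered first, which is the documented ordering of `Counter.most_common` for equal counts. `sample_category` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def sample_category(spans: list[dict]) -> Optional[str]:
    """Sample-level category as a majority vote over span categories.

    Assumed (not specified in the source): no spans -> None, and ties go to
    the category seen first, since most_common keeps first-encountered order
    for equal counts.
    """
    if not spans:
        return None
    counts = Counter(span["category"] for span in spans)
    return counts.most_common(1)[0][0]


print(sample_category([
    {"category": "contradiction"},
    {"category": "fabricated_reference"},
    {"category": "contradiction"},
]))  # contradiction
```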
Each output sample carries both the native label (on the span) and the unified category/subcategory (on both the span and the sample):
```json
{
  "labels": [
    {"start": 18, "end": 25, "label": "structural",
     "category": "fabricated_reference", "subcategory": "identifier"}
  ],
  "context_modality": "code",
  "category": "fabricated_reference",
  "subcategory": "identifier",
  "metadata": {"instance_id": "...", "repo": "...", "format_type": "...",
               "is_hallucinated": true, "injector_model": "..."}
}
```
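Each split is written as JSONL (one JSON object per line), so the output can be inspected with the standard library alone. The sketch below counts sample-level categories in one split file. `category_counts` is a hypothetical helper. The filename in the commented usage line is a placeholder, since this page does not specify how the per-split files are named.

```python
import json
from collections import Counter
from pathlib import Path


def category_counts(jsonl_path: Path) -> Counter:
    """Count sample-level categories in one split's JSONL file."""
    counts: Counter = Counter()
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            sample = json.loads(line)
            counts[sample.get("category")] += 1
    return counts


# Placeholder filename; check --output_dir for the actual per-split names.
# print(category_counts(Path("data/v2/code_hallucination/train.jsonl")))
```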
Why this matters
A single, source-agnostic label space is what lets one detector be trained across modalities (prose, code, markdown) instead of one model per source. It also keeps the door open to typed, span-level output. Such output tells a user not just that a span is unsupported but how (contradiction vs. unsupported addition vs. fabricated reference), which is the differentiator over scalar faithfulness scores. Both depend on every data source mapping cleanly into the same label space.