Skip to content

Data Generation Pipeline

LettuceDetect builds hallucination-detection datasets from many grounded sources (code, tool output, markdown documents, paper chunks) using one shared set of composable primitives. Every source maps into the same unified taxonomy and the same HallucinationSample schema, so a single detector can be trained across modalities.

The primitives live in lettucedetect/generation/.

Generation vs. injection

Generating a hallucinated answer and injecting a hallucination into a correct answer are different operations:

Generator Injector
Module lettucedetect.models.generation.HallucinationGenerator (RAGFactChecker) lettucedetect.generation.injection
Operation synthesizes a hallucinated answer (can work from context alone) corrupts a known-correct answer into a hallucinated one
Spans recovered by diff (approximate) exact, by construction
Use the TinyLettuce synthetic-data recipe the multi-source dataset collection

The dataset collection uses the injector, because exact character-level spans are the basis of token-level detection.

The primitives

questions.py — derive a question from raw data

For sources that are just documents (markdown READMEs, wiki pages), generate a realistic user/developer question the document can answer.

answers.py — generate a correct, grounded answer

generate_grounded_answer(question, evidence) produces an answer that is correct by construction — grounded strictly in the supplied evidence. This is the step that lets the injector then corrupt a known-good answer. Sync and async variants are provided (generate_grounded_answer_async for batched throughput).

injection.py — inject a taxonomy hallucination with exact spans

inject_taxonomy(context, clean_answer, category, subcategory, modality) requests small localized replacement edits from the model, applies them deterministically, and validates the result. It is:

  • Universal — modality-aware (code, tool_output, markdown, prose); the taxonomy is the same across all of them.
  • Subtype-driven — injects a specific (category, subcategory) from the unified taxonomy (e.g. contradiction/numerical, fabricated_reference/identifier).
  • Span-exact — labels are derived from the applied edits, not from diffing.
  • Validated — coverage caps, minimum span length, leakage detection, and (for mixed prose+code answers) in-fence enforcement.

It has two modes. Targeted (inject_taxonomy) forces one chosen (category, subcategory) — used for code and tool output. Menu (inject_menu) hands the model a source-specific prompt that lists several hallucination types and lets it pick the 1–3 that fit the passage, labelling each edit with its own type (mapped to the taxonomy per source) — used for academic papers and other markdown, where a forced subtype often does not fit.

Sync and async variants exist for both (inject_taxonomy, inject_menu, and their _async twins).

classify.py — label an existing, untyped span

classify_span(context, answer, span_text) does the inverse of injection: given a span an annotator already marked as unsupported, it assigns a unified (category, subcategory) with an LLM. It is for sources whose spans ship without a native type, so the mechanical map_label cannot be used — most notably PsiloQA, whose hallucinations are natural (produced by real LLMs, not injected). It never edits text or invents spans; it only classifies, so supported is not a valid output. Sync and async variants are provided.

runner.py — batched, resumable orchestration

run_batched runs a per-item async processor over the work set with:

  • Async batching (asyncio.gather over a batch size) for local vLLM throughput.
  • Resumability — already-completed keys are skipped on restart; output is appended and flushed per batch, so a crash never loses finished work.
  • Failure logging — each rejected item is written to a failures file with its reason; re-running retries anything not yet in the output.

Composing a source adapter

Each source is a thin adapter that wires together only the primitives it needs. The five built sources:

Source (dataset) modality question answer injection prompt
lettucedetect-code (SWE-bench) code — (issue) format-builder over the patch code (targeted)
lettucedetect-tool-output (squeez) tool_output — (given) grounded tool-output (targeted)
lettucedetect-acl (acl-verbatim) markdown — (given) grounded paper (menu)
lettucedetect-readme (GitHub) markdown generated grounded generic factual (menu)
lettucedetect-wikipedia (open-wikipedia) markdown generated grounded generic factual (menu)

Document sources (README, Wikipedia) share doc_source.py — chunk by heading, generate a question per chunk, answer, inject — and differ only in corpus and question-type subset. ACL groups retrieved chunks per question and uses the paper-specific prompt. Each adapter supplies (context, clean_answer, modality) and a category/subtype distribution; the shared modules handle batching, resumability, and failure logging.

Public prose sources (separate collection)

Two existing public RAG datasets are folded in without any generation, as a separate prose collection, to complement the synthetic structured-context data:

Source (dataset) modality spans taxonomy assignment
ragtruth (RAGTruth) prose native typed mechanical map_label via apply_taxonomy.py --source ragtruth
psiloqa (PsiloQA) prose untyped, natural LLM classify_span via scripts/classify_psiloqa_spans.py

RAGTruth's native labels map deterministically. PsiloQA's hallucinations are produced by real LLMs (not injected) and annotated only as binary char spans, so each span is labelled by the classify.py primitive. All 14 PsiloQA languages and its original train/validation/test splits are preserved. These provide a naturally-occurring counterpart to the injected spans — useful for checking that detectors generalize beyond the corruption process.