
LettuceDetect


A lightweight hallucination detection framework for RAG applications.

LettuceDetect is an encoder-based model built on ModernBERT that detects unsupported spans in LLM-generated answers by comparing them against provided context. It provides token-level precision for identifying exactly which parts of an answer are hallucinated.

Highlights

  • Token-level precision — identifies exact hallucinated spans, not just "this answer has a problem"
  • Fast inference — 30-60 samples/sec on A100, suitable for production
  • Long context — supports up to 4K tokens (ModernBERT) or 8K tokens (EuroBERT)
  • Multilingual — English, German, French, Spanish, Italian, Polish, Chinese, Hungarian
  • Open source — MIT license, models on HuggingFace

Quick Example

from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedetect-base-modernbert-en-v1"
)

contexts = ["The capital of France is Paris. The population is 67 million."]
question = "What is the capital and population of France?"
answer = "The capital of France is Paris. The population is 69 million."

predictions = detector.predict(
    context=contexts, question=question, answer=answer, output_format="spans"
)
# [{'start': 31, 'end': 61, 'confidence': 0.99, 'text': ' The population is 69 million.'}]
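The `start` and `end` fields are character offsets into the answer string, so a detected span can be recovered with plain slicing. A minimal sanity check (the span dict below is constructed by hand for this answer, not returned by the library):

```python
# Map a detected span back onto the answer string using its character offsets.
answer = "The capital of France is Paris. The population is 69 million."

# Hand-built span covering the second sentence (the unsupported claim).
span = {"start": 31, "end": 61, "confidence": 0.99}

hallucinated = answer[span["start"]:span["end"]]
print(repr(hallucinated))  # ' The population is 69 million.'
```

Slicing this way also makes it easy to highlight hallucinated spans in a UI or to strip them from the answer before display.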

Performance

| Model                  | Example-level F1 | vs GPT-4 | vs Luna | Parameters |
|------------------------|------------------|----------|---------|------------|
| lettucedetect-base-v1  | 76.8%            | +13.4%   | +11.4%  | 149M       |
| lettucedetect-large-v1 | 79.2%            | +15.8%   | +13.8%  | 395M       |

Evaluated on RAGTruth test set. Surpasses GPT-4, Luna, and fine-tuned Llama-2-13B.

What's New

  • Code Hallucination Dataset — A pipeline for generating span-level code hallucination data from SWE-bench (~18k samples across 53 repos)
  • Multilingual models — EuroBERT-based models for 8 languages
  • Web API — FastAPI server with async client support
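For the Web API, a client only needs to POST the same context/question/answer triple as JSON. A minimal stdlib-only client sketch; the route `/lettucedetect/spans` and the payload field names here are assumptions, so check the running server's OpenAPI docs (`/docs`) for the actual schema:

```python
# Minimal HTTP client sketch for the LettuceDetect FastAPI server.
# Route and field names are assumptions; verify against the server's /docs.
import json
from urllib import request


def build_payload(contexts: list[str], question: str, answer: str) -> dict:
    """Assemble the JSON body for a span-detection request."""
    return {"contexts": contexts, "question": question, "answer": answer}


def detect_spans(base_url: str, contexts: list[str], question: str, answer: str):
    """POST a detection request and return the parsed JSON response."""
    body = json.dumps(build_payload(contexts, question, answer)).encode("utf-8")
    req = request.Request(
        f"{base_url}/lettucedetect/spans",  # hypothetical route
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


# Usage (requires a running server):
# spans = detect_spans(
#     "http://localhost:8000",
#     ["The capital of France is Paris. The population is 67 million."],
#     "What is the capital and population of France?",
#     "The capital of France is Paris. The population is 69 million.",
# )
```

Keeping payload construction in a separate helper makes the request shape easy to unit-test without a live server.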