LettuceDetect¶
A lightweight hallucination detection framework for RAG applications.
LettuceDetect is an encoder-based model built on ModernBERT that detects unsupported spans in LLM-generated answers by comparing them against provided context. It provides token-level precision for identifying exactly which parts of an answer are hallucinated.
Highlights¶
- Token-level precision — identifies exact hallucinated spans, not just "this answer has a problem"
- Fast inference — 30-60 samples/sec on A100, suitable for production
- Long context — supports up to 4K tokens (ModernBERT) or 8K tokens (EuroBERT)
- Multilingual — English, German, French, Spanish, Italian, Polish, Chinese, Hungarian
- Open source — MIT license, models on HuggingFace
Quick Example¶
from lettucedetect.models.inference import HallucinationDetector
detector = HallucinationDetector(
method="transformer",
model_path="KRLabsOrg/lettucedetect-base-modernbert-en-v1"
)
contexts = ["The capital of France is Paris. The population is 67 million."]
question = "What is the capital and population of France?"
answer = "The capital of France is Paris. The population is 69 million."
predictions = detector.predict(
context=contexts, question=question, answer=answer, output_format="spans"
)
# [{'start': 31, 'end': 71, 'confidence': 0.99, 'text': ' The population of France is 69 million.'}]
Performance¶
| Model | Example F1 | vs GPT-4 | vs Luna | Parameters |
|---|---|---|---|---|
| lettucedetect-base-v1 | 76.8% | +13.4% | +11.4% | 149M |
| lettucedetect-large-v1 | 79.2% | +15.8% | +13.8% | 395M |
Evaluated on RAGTruth test set. Surpasses GPT-4, Luna, and fine-tuned Llama-2-13B.
What's New¶
- Code Hallucination Dataset — A pipeline for generating span-level code hallucination data from SWE-bench (~18k samples across 53 repos)
- Multilingual models — EuroBERT-based models for 8 languages
- Web API — FastAPI server with async client support