Installation
Full RAG System
For the complete pipeline with document processing, vector indexing, and extraction, install the verbatim-rag package:
This installs all dependencies including torch, transformers, pymilvus, sentence-transformers, docling, and chonkie.
Lightweight Core
If you only need verbatim span extraction without the full RAG pipeline, install the verbatim-core package:
This installs only openai and pydantic -- no heavy ML dependencies.
The verbatim-core package provides the same verbatim_core module with:
- VerbatimTransform -- question + context -> cited, grounded answer
- LLMSpanExtractor -- extract verbatim spans using an LLM
- LLMClient -- unified OpenAI API wrapper
- Template system for response formatting
Model-Based Extraction
If you want to use the ModernBERT or Zilliz semantic highlight extractors without the full RAG pipeline:
This adds torch, transformers, and scikit-learn on top of the lean install.
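To see which of the three install options an environment currently satisfies, a small check like the one below may help. It is only a sketch: the `TIERS` mapping and the `missing_modules` helper are hypothetical names introduced here, and the import names (for example `sentence_transformers` for sentence-transformers and `sklearn` for scikit-learn) are assumed from the dependency lists above, not taken from the package metadata.

```python
from importlib.util import find_spec

# Hypothetical mapping from install option to the top-level import names of
# the dependencies named in this section. Assumed import names:
# sentence-transformers -> sentence_transformers, scikit-learn -> sklearn.
TIERS = {
    "lightweight core": ["openai", "pydantic"],
    "model-based extraction": [
        "openai", "pydantic", "torch", "transformers", "sklearn",
    ],
    "full RAG system": [
        "torch", "transformers", "pymilvus",
        "sentence_transformers", "docling", "chonkie",
    ],
}


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for tier, modules in TIERS.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{tier}: {status}")
```

The check only looks modules up with `find_spec`, so it does not import torch or any other heavy dependency.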
Environment Setup
Set your OpenAI API key:
Note
The OpenAI API key is only required for LLM-based extraction. The ModernBERT-based extractor (available in verbatim-rag or with the model-based extraction install) does not require an API key.
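Before running LLM-based extraction, it can be useful to fail early when the key is absent. The sketch below assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is the variable the official openai Python client reads by default; `require_openai_key` is a hypothetical helper and not part of verbatim-core or verbatim-rag.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test. The variable name is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; it is needed for LLM-based extraction."
        )
    return key


if __name__ == "__main__":
    try:
        require_openai_key()
        print("OpenAI API key found.")
    except RuntimeError as exc:
        print(exc)
```

If you only use the ModernBERT-based extractor, this check is unnecessary.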