How RuleChef Works¶
RuleChef turns labeled examples into fast, deterministic rules. The key idea: an LLM writes the rules, but the rules run without any LLM.
```mermaid
flowchart LR
    A[Your Examples] -->|Learning time| B[LLM writes rules]
    B --> C[Regex / Code / spaCy rules]
    C -->|Inference time| D["Fast local execution\n< 1ms, no API calls"]
```
This page explains the core architecture. For API details, see the Quick Start.
The Pipeline¶
When you call `chef.learn_rules()`, here's what happens:
```mermaid
flowchart TD
    A["1. Buffer\n(collected examples)"] --> B["2. Commit\n(buffer → dataset)"]
    B --> C["3. Synthesis\n(LLM generates rules)"]
    C --> D["4. Evaluation\n(test rules vs examples)"]
    D -->|Failures exist| E["5. Refinement\n(LLM patches rules)"]
    E --> D
    D -->|Good enough| F["6. Persist\n(save rules to disk)"]
```
Each step in detail:
1. Buffer¶
When you call `add_example()` or `add_correction()`, data goes into a buffer — not directly into the dataset. This lets you collect many examples before learning, and gives the coordinator a chance to decide when and how to learn.
```python
chef.add_example(input1, output1)            # → buffer
chef.add_example(input2, output2)            # → buffer
chef.add_correction(input3, wrong, correct)  # → buffer (high priority)

# Nothing has been learned yet — examples are waiting in the buffer
```
2. Commit¶
When `learn_rules()` is called, buffered examples are committed to the dataset. The dataset is the permanent store — it persists to disk and accumulates across learning rounds.
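In code, the commit is implicit in the `learn_rules()` call itself (a minimal sketch reusing the buffer example above):

```python
rules = chef.learn_rules()  # commit happens here: buffer → dataset, then learning runs
# The dataset is now on disk; the next learn_rules() call builds on it.
```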
3. Synthesis¶
The LLM receives a prompt containing:
- Task description — what you're trying to do
- Training examples — input/output pairs from the dataset
- Data evidence — regex pattern hints from grex (if enabled)
- User feedback — any guidance you've added
- Format instructions — how to write regex/code/spaCy rules
The LLM returns a set of rules in JSON format. Each rule has a pattern (regex, code function, or spaCy matcher) and an output template.
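The exact schema is internal to RuleChef, but conceptually each synthesized rule is a small record like the following (a sketch; the field names are assumptions, not the actual wire format):

```python
# Illustrative shape of one synthesized regex rule (field names assumed)
rule = {
    "format": "regex",
    "pattern": r"(?i)\b(\d+\s*mg)\b",
    "template": {"text": "$0", "type": "DOSAGE", "start": "$start", "end": "$end"},
}
```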
For multi-class tasks (NER, classification), synthesis can run per-class — one LLM call per class, each with positive examples (capped to `max_samples`) and counter-examples from other classes. See Prompt Size Controls for details.
4. Evaluation¶
Rules are tested against the dataset. RuleChef computes:
- Per-example: did the rules produce the correct output?
- Per-class: precision, recall, F1 for each label
- Overall: micro/macro F1, exact match accuracy
Any examples where rules fail are collected as failures.
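You can run the same checks yourself via `evaluate()` (see the table under The LLM's Role). A sketch, assuming the returned report exposes the metrics listed above under these names:

```python
report = chef.evaluate()  # runs rules locally, no LLM

# The keys below are assumptions about the report's shape, shown to
# illustrate the metrics described above.
for label, scores in report["per_class"].items():
    print(label, scores["precision"], scores["recall"], scores["f1"])
print("micro F1:", report["micro_f1"], "exact match:", report["exact_match"])
```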
5. Refinement¶
If failures exist and iterations remain, RuleChef sends the LLM a patch prompt containing:
- Current rules (what's already working)
- Specific failures (what's broken, with expected vs actual output)
- Coordinator guidance (which classes to focus on)
The LLM generates patch rules targeted at the failures. These are merged into the existing ruleset — stable rules are preserved.
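Conceptually, merging behaves like a keyed dictionary update, so rules that already work are never rewritten (an illustrative sketch assuming rules carry an `id` field; RuleChef's real merge logic may differ):

```python
def merge_rules(existing, patches):
    """Sketch: patch rules add to or replace entries; untouched rules survive."""
    merged = {r["id"]: r for r in existing}  # stable rules preserved as-is
    for p in patches:
        merged[p["id"]] = p                  # patches add new or override broken rules
    return list(merged.values())
```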
6. Persist¶
The final rules are saved to disk as JSON. When you create a new `RuleChef` with the same `dataset_name` and `storage_path`, rules load automatically.
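Reloading needs no special call; matching identifiers are enough (a sketch — apart from the two identifiers, the constructor arguments shown are assumptions):

```python
# Same dataset_name + storage_path as before → saved rules load automatically,
# with no LLM call. The values below are illustrative.
chef = RuleChef(task, dataset_name="medical_ner", storage_path="./rulechef_data")
```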
What Are Rules?¶
A rule is a pattern that maps input text to structured output. There are three formats:
Regex Rules¶
A regex pattern with an output template. When the pattern matches, the template fills in the output:
```
Pattern:  (?i)\b(\d+\s*mg)\b
Template: {"text": "$0", "type": "DOSAGE", "start": "$start", "end": "$end"}
```
`$0` is the full match, `$1`/`$2` are capture groups, `$start`/`$end` are character offsets. The rule engine computes positions — the LLM never predicts offsets.
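To make those semantics concrete, applying that rule amounts to the following in plain Python (a sketch of the idea, not RuleChef's actual engine):

```python
import re

pattern = re.compile(r"(?i)\b(\d+\s*mg)\b")
for m in pattern.finditer("Take 50 mg twice daily"):
    # The engine fills the template from the match: $0 → m.group(0),
    # $start/$end → m.start()/m.end(). Offsets are computed, never predicted.
    span = {"text": m.group(0), "type": "DOSAGE", "start": m.start(), "end": m.end()}
    print(span)  # {'text': '50 mg', 'type': 'DOSAGE', 'start': 5, 'end': 10}
```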
Code Rules¶
A Python function that takes input data and returns structured output:
```python
def extract(input_data):
    import re
    text = input_data["text"]
    spans = []
    for m in re.finditer(r'\b[A-Z][a-z]+ine\b', text):
        spans.append({"text": m.group(), "start": m.start(), "end": m.end(), "type": "DRUG"})
    return spans
```
Code rules run in a restricted sandbox — no file access, no network calls, and no imports beyond safe modules like `re` (which the example above relies on).
spaCy Rules¶
Token or dependency matcher patterns using linguistic attributes (POS tags, lemmas, dependency relations). Requires `pip install rulechef[spacy]`.
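For a feel of the format, here is a plain spaCy token pattern that matches dosages (standard spaCy `Matcher` API; RuleChef generates patterns of this general shape):

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline suffices for lexical attributes like LIKE_NUM/LOWER;
# POS or dependency patterns would need a trained model instead.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("DOSAGE", [[{"LIKE_NUM": True}, {"LOWER": "mg"}]])  # e.g. "50 mg"

doc = nlp("Take 50 mg twice daily")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # → 50 mg
```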
Schemas¶
Task Definition¶
A `Task` tells RuleChef what you're trying to do:

```python
task = Task(
    name="Medical NER",
    description="Extract drugs, dosages, and conditions",
    input_schema={"text": "str"},
    output_schema={"entities": "List[{text: str, start: int, end: int, type: DRUG|DOSAGE|CONDITION}]"},
    type=TaskType.NER,
    text_field="text",
)
```
- `input_schema` — tells the LLM what input fields exist (so it writes `input_data["text"]` in code rules)
- `output_schema` — tells the LLM what output structure to produce (entity fields, label names)
- `type` — determines evaluation logic and prompt templates
- `text_field` — which input field regex/spaCy rules match against
Schemas are documentation for the LLM, not executed code. You can also use Pydantic models for type-safe validation.
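For example, the output schema above maps naturally onto Pydantic models (a sketch of the equivalent models; how they plug into RuleChef is beyond this sketch):

```python
from typing import List, Literal
from pydantic import BaseModel

class Entity(BaseModel):
    text: str
    start: int
    end: int
    type: Literal["DRUG", "DOSAGE", "CONDITION"]

class Output(BaseModel):
    entities: List[Entity]
```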
Coordinators¶
A coordinator decides when to learn and how to guide refinement:
| Coordinator | How It Decides |
|---|---|
| `SimpleCoordinator` | Threshold-based: learn after N examples, refine after M corrections |
| `AgenticCoordinator` | LLM-guided: analyzes per-class metrics, focuses on weak classes, stops when performance plateaus |
The agentic coordinator also supports rule pruning — merging redundant rules and removing noise after learning.
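Wiring one in is a construction-time choice (the `coordinator=` keyword and import path below are assumptions; see Coordinators for the real API):

```python
from rulechef.coordinators import AgenticCoordinator  # import path assumed

chef = RuleChef(task, coordinator=AgenticCoordinator())  # keyword name assumed
```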
See Coordinators for details.
The LLM's Role¶
The LLM is used only during learning — never during extraction:
| Operation | LLM Used? | When |
|---|---|---|
| `add_example()` | No | Just stores in buffer |
| `learn_rules()` | Yes | Synthesis + refinement prompts |
| `extract()` | No | Runs rules locally |
| `evaluate()` | No | Runs rules locally |
After learning, the rules are self-contained. You can serialize them, ship them to a different machine, and run them without any LLM access.
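In practice, inference is just a local function call (a sketch; the input dict follows the task's `input_schema`):

```python
# No LLM client, no API key: the learned rules execute locally.
spans = chef.extract({"text": "Take 50 mg of Fluoxetine daily"})
print(spans)  # structured output matching the task's output_schema
```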
Next Steps¶
- Quick Start — code examples for all task types
- Learning & Refinement — buffer architecture, sampling strategies, incremental patching
- Coordinators — simple vs agentic, rule pruning
- Evaluation & Feedback — metrics, corrections, custom matchers