Advanced Features¶
Observation Mode¶
RuleChef can learn from your existing LLM pipeline. Collect observations from any LLM provider — no task definition needed upfront.
Structured observations (add_observation)¶
When you know the input/output shape, pass structured data directly:
```python
from rulechef import RuleChef

chef = RuleChef(client=client, model="gpt-4o-mini")  # No task needed

# Works with any LLM — Anthropic, Groq, local models, etc.
response = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": f"Classify: {query}"}],
)

chef.add_observation(
    {"text": query},
    {"label": response.content[0].text.strip()},
)

# After collecting enough observations, learn rules
chef.learn_rules()
```
Raw observations (add_raw_observation)¶
When you don't know the schema, pass raw messages and let RuleChef discover it:
```python
chef = RuleChef(client=client, model="gpt-4o-mini")

# Capture the raw interaction — RuleChef figures out the schema later
for query in queries:
    response = any_llm_call(query)
    chef.add_raw_observation(
        messages=[{"role": "user", "content": query}],
        response=response,
    )

# Discovers task schema + maps observations + learns rules
chef.learn_rules()
print(chef.task.to_dict())  # See what was discovered
```
Auto-capture for OpenAI clients (start_observing)¶
For OpenAI-compatible clients, monkey-patch to capture calls automatically:
```python
wrapped = chef.start_observing(openai_client, auto_learn=False)

# Use wrapped as normal — every call is captured
response = wrapped.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": query}],
)

chef.learn_rules()  # Discovers + maps + learns
chef.stop_observing()
```
When auto_learn=True, learning triggers automatically based on the coordinator's decision. Streaming calls (stream=True) are also observed — RuleChef wraps the stream to capture content after it completes.
GLiNER / GLiNER2 observation (start_observing_gliner)¶
Observe predictions from GLiNER (NER) or GLiNER2 (NER, classification, structured extraction) models:
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")

chef = RuleChef(client=client, model="gpt-4o-mini")
chef.start_observing_gliner(model, auto_learn=False)

# Use the model as normal — predictions are captured
entities = model.predict_entities("Apple was founded by Steve Jobs.", ["company", "person"])

chef.learn_rules()
chef.stop_observing_gliner()
```
For GLiNER2, specify which method to observe:
```python
from gliner2 import GLiNER2

model = GLiNER2.from_pretrained("fastino/gliner2")

# NER
chef.start_observing_gliner(model, method="extract_entities", auto_learn=False)

# Classification
chef.start_observing_gliner(model, method="classify_text", auto_learn=False)

# Structured extraction
chef.start_observing_gliner(model, method="extract_json", auto_learn=False)
```
No LLM calls are needed for task discovery — GLiNER output is already structured. The task type, schema, and labels are inferred automatically from the observed predictions.
Training Data Logger (Distillation)¶
RuleChef can capture every LLM call made during rule synthesis as structured training data, suitable for fine-tuning a smaller model to replace the LLM. The logger is fully optional — pass a TrainingDataLogger instance and all calls (synthesis, patching, coordination, auditing) are written to a JSONL file.
```python
from rulechef import RuleChef, TrainingDataLogger

logger = TrainingDataLogger(
    "training_data/run_001.jsonl",
    run_metadata={"model": "kimi-k2", "dataset": "banking77"},
)

chef = RuleChef(task, client, training_logger=logger)
chef.add_example(...)
chef.learn_rules()

print(logger.stats)  # {"rule_synthesis": 5, "rule_patch": 3, "guide_refinement": 10, ...}
print(logger.count)  # 18 total entries
```
Output format¶
Each line in the JSONL file is a self-contained training example:
```json
{
  "messages": [
    {"role": "user", "content": "...prompt..."},
    {"role": "assistant", "content": "...response..."}
  ],
  "call_type": "rule_synthesis",
  "metadata": {
    "model": "kimi-k2",
    "dataset": "banking77",
    "task_name": "Intent Classification",
    "dataset_size": 25,
    "num_rules_in_response": 8,
    "response_valid": true
  },
  "timestamp": "2026-02-19T14:30:00+00:00"
}
```
For fine-tuning, use only the `messages` field. The `metadata` and `call_type` fields are for filtering — e.g. keep only entries where `response_valid` is true, or only runs where the final F1 exceeded a threshold.
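Such filtering is a few lines of standard JSONL processing. A minimal sketch (the function name and file paths are illustrative, not part of RuleChef's API):

```python
import json

def filter_for_finetuning(jsonl_path: str, out_path: str) -> int:
    """Keep only validated entries, emitting bare messages for fine-tuning."""
    kept = 0
    with open(jsonl_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = json.loads(line)
            # Drop entries whose response failed validation
            if entry["metadata"].get("response_valid"):
                dst.write(json.dumps({"messages": entry["messages"]}) + "\n")
                kept += 1
    return kept
```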
Call types¶
| Call type | Source | Description |
|---|---|---|
| `rule_synthesis` | Learner | Bulk rule generation from examples |
| `rule_synthesis_per_class` | Learner | Per-class rule generation |
| `rule_patch` | Learner | Patch rules targeted at failures |
| `synthetic_generation` | Learner | Synthetic example generation |
| `guide_refinement` | Coordinator | Per-iteration refinement guidance |
| `audit_rules` | Coordinator | Rule pruning/merging audit |
| `trigger_decision` | Coordinator | Should-learn decision |
Generating training data at scale¶
To generate a diverse training corpus, run RuleChef across multiple datasets with varied configurations:
```python
import itertools
from rulechef import RuleChef, TrainingDataLogger, AgenticCoordinator

datasets = ["banking77", "clinc150", "snips", ...]
shots = [3, 5, 10]

for ds_name, n_shots in itertools.product(datasets, shots):
    logger = TrainingDataLogger(
        f"training_data/{ds_name}_{n_shots}shot.jsonl",
        run_metadata={"dataset": ds_name, "shots": n_shots},
    )
    coordinator = AgenticCoordinator(client, training_logger=logger)
    chef = RuleChef(task, client, coordinator=coordinator, training_logger=logger)
    # ... add examples, learn rules ...
```
Tip
The logger appends to the file, so multiple runs can safely write to the same path. Each entry carries its own run_metadata and timestamp.
Pydantic Output Schemas¶
Use Pydantic models for type-safe, validated outputs:
```python
from pydantic import BaseModel
from typing import List, Literal

class Entity(BaseModel):
    text: str
    start: int
    end: int
    type: Literal["PERSON", "ORG", "LOCATION"]

class Output(BaseModel):
    entities: List[Entity]

task = Task(
    name="NER",
    description="Extract entities",
    input_schema={"text": "str"},
    output_schema=Output,
    type=TaskType.NER,
)
```
RuleChef automatically:

- Discovers valid labels from `Literal` type annotations
- Validates rule outputs against the model at runtime
- Generates readable schema fragments for synthesis prompts
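The label discovery step boils down to standard typing introspection: `typing.get_args` reads the allowed values straight from a `Literal` annotation. A minimal sketch, independent of RuleChef's internals:

```python
from typing import Literal, get_args

# Same annotation as the Entity model's `type` field
EntityType = Literal["PERSON", "ORG", "LOCATION"]

labels = list(get_args(EntityType))
print(labels)  # ['PERSON', 'ORG', 'LOCATION']
```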
Output Templates¶
Rules can emit structured JSON using template variables:
Regex Templates¶
| Variable | Meaning |
|---|---|
| `$0` | Full match text |
| `$1`, `$2`, ... | Capture groups |
| `$start`, `$end` | Match offsets |
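Conceptually, filling such a template from a `re.Match` is simple string substitution. A sketch of the idea (not RuleChef's actual template engine; the pattern and input are made up):

```python
import re

template = '{"value": "$1", "start": $start, "end": $end}'
m = re.search(r"\$(\d+(?:\.\d{2})?)", "Total: $19.99 due")

# Substitute offsets first, then capture groups, then the full match
filled = (template
          .replace("$start", str(m.start()))
          .replace("$end", str(m.end()))
          .replace("$1", m.group(1))
          .replace("$0", m.group(0)))
print(filled)  # {"value": "19.99", "start": 7, "end": 13}
```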
spaCy Templates¶
| Variable | Meaning |
|---|---|
| `$1.text`, `$2.text` | Token text |
| `$1.start`, `$1.end` | Token character offsets |
spaCy Patterns¶
Token Matcher¶
Use token attributes for linguistic patterns:
Available attributes: TEXT, LOWER, POS, TAG, DEP, LEMMA, SHAPE, IS_ALPHA, IS_DIGIT, OP.
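A minimal sketch of a token-level pattern with spaCy's `Matcher` (assuming spaCy is installed; a blank pipeline suffices for `LOWER`/`OP` attributes, while `POS`/`DEP`/`TAG` need a trained pipeline):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "refund", optionally preceded by "full"
matcher.add("REFUND", [[{"LOWER": "full", "OP": "?"}, {"LOWER": "refund"}]])

doc = nlp("I want a full refund for my order")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```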
Dependency Matcher¶
Match syntactic structure:
```json
[
  {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
  {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subj", "RIGHT_ATTRS": {"DEP": "nsubj"}}
]
```
spaCy NER
By default, use_spacy_ner=False — spaCy's NER pipe is disabled and patterns relying on ENT_TYPE are rejected. Set use_spacy_ner=True to enable.
LLM Fallback¶
When rules produce no results, optionally fall back to direct LLM extraction:
```python
chef = RuleChef(task, client, llm_fallback=True)

result = chef.extract({"text": "unusual input"})
# If no rule matches → calls LLM directly
```
Using grex for Regex Suggestions¶
grex is a library that infers regex patterns from example strings. RuleChef uses it to give the LLM concrete pattern suggestions during rule synthesis.
Install¶
grex is an optional dependency:
It's enabled by default when installed. To disable:
What grex does¶
You give grex a list of strings, it gives you a regex that matches all of them:
```python
from grex import RegExpBuilder

dates = ["2024-01-15", "2024-02-28", "2023-12-01"]

# Exact: alternation of all inputs
RegExpBuilder.from_test_cases(dates).without_anchors().build()
# → '(2023\-12\-01|2024\-01\-15|2024\-02\-28)'

# Generalized: replaces digits/repetitions with character classes
RegExpBuilder.from_test_cases(dates).without_anchors() \
    .with_conversion_of_digits().with_conversion_of_repetitions().build()
# → '\d{4}\-\d{2}\-\d{2}'
```
The generalized pattern \d{4}\-\d{2}\-\d{2} matches any date in that format, not just the three examples. This is what makes grex valuable — it finds structure.
How RuleChef uses it¶
During rule synthesis, RuleChef builds a "data evidence" section in the prompt. Without grex, the LLM only sees raw example strings:
```text
DATA EVIDENCE FROM TRAINING:
- exchange_rate (3 examples): "what is the exchange rate for USD to EUR?", "how much is a dollar in euros?", "I want to know the current rates"
- card_arrival (3 examples): "my new card still hasn't arrived", "when will my new card be delivered?", "my card hasn't come in the mail yet"
```
With grex enabled, each group gets regex pattern suggestions appended:
```text
DATA EVIDENCE FROM TRAINING:
- exchange_rate (3 examples): "what is the exchange rate for USD to EUR?", ...
  Exact pattern: (I want to know the current rates|how much is a dollar in euros\?|what is the exchange rate for USD to EUR\?)
- card_arrival (3 examples): "my new card still hasn't arrived", ...
  Exact pattern: (my card hasn't come in the mail yet|my new card still hasn't arrived|when will my new card be delivered\?)
```
grex generates two types of patterns:
- Exact pattern — alternation of all seen strings (always included)
- Structural pattern — generalized version with digit/repetition conversion (included when it's meaningfully shorter than the exact pattern)
For NER and transformation tasks with structured values (dates, IDs, codes), the structural pattern is especially valuable:
```text
- DATE (5 unique): "2024-01-15", "2024-02-28", "2023-12-01", ...
  Exact pattern: (2023\-12\-01|2024\-01\-15|2024\-02\-28|...)
  Structural pattern: \d{4}\-\d{2}\-\d{2}
```
The structural pattern `\d{4}\-\d{2}\-\d{2}` tells the LLM to write a general date regex rather than hardcoding the specific dates it has seen.
When grex helps most¶
- Structured extraction — dates, phone numbers, IDs, codes, amounts
- NER — entity strings with consistent patterns (drug names, gene symbols)
- Classification with keyword clusters — groups of similar input phrases
When grex doesn't help¶
- Very long input strings (>80 chars are skipped)
- Fewer than 2 unique strings per group
- Highly diverse strings with no shared structure (exact pattern becomes a giant alternation that the LLM ignores)
Debugging¶
Set the environment variable to see when grex is used:
This prints lines like [rulechef][grex] used CLASSIFICATION:exchange_rate whenever a pattern is generated.
Code Rule Security¶
Code rules (RuleFormat.CODE) are executed via Python's exec() in a restricted namespace. The default __builtins__ are replaced with a curated safe subset, so code rules cannot import modules, access the filesystem, or execute arbitrary code.
This means code rules can use:

- `re` — the standard library regex module
- `Span` — RuleChef's span dataclass for returning results
- Safe builtins — `len`, `str`, `int`, `float`, `bool`, `list`, `dict`, `set`, `tuple`, `range`, `enumerate`, `zip`, `map`, `filter`, `sorted`, `reversed`, `min`, `max`, `sum`, `any`, `all`, `abs`, `round`, `isinstance`, `type`, `print`
- Basic Python syntax (loops, conditionals, string methods, list comprehensions)

Code rules cannot:

- Import modules (`import os`, `__import__('subprocess')`)
- Access files or environment variables
- Make network calls
- Call `open()`, `eval()`, `exec()`, `getattr()`, or `compile()`
If you need capabilities beyond this, use regex or spaCy rules instead.
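The sandboxing idea can be sketched in a few lines: run `exec()` with a curated `__builtins__` dict, so anything outside the whitelist (including `__import__`, which `import` statements resolve through) simply isn't there. This is an illustration of the mechanism, not RuleChef's exact namespace:

```python
SAFE_BUILTINS = {"len": len, "str": str, "min": min, "max": max, "sum": sum}

def run_code_rule(source: str, text: str):
    """Execute a code rule with only the whitelisted builtins available."""
    namespace = {"__builtins__": SAFE_BUILTINS, "text": text}
    exec(source, namespace)
    return namespace.get("result")

# Plain computation works
print(run_code_rule("result = len(text)", "hello"))  # 5

# Imports fail: __import__ is absent from the namespace
try:
    run_code_rule("import os", "hello")
except ImportError:
    print("import blocked")
```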
CLI¶
Interactive CLI for quick experimentation: