
Core Types

Data structures used throughout RuleChef.

TaskType


Bases: Enum

Type of task being performed.

  • EXTRACTION: Find text spans (untyped). Output: {"spans": [{"text", "start", "end"}]}
  • NER: Find typed entities. Output: {"entities": [{"text", "start", "end", "type"}]}
  • CLASSIFICATION: Classify input into a label. Output: {"label": "class_name"}
  • TRANSFORMATION: Extract structured data to a custom output schema you define. Like GLiNER: define {"company": "str", "amount": "str"} and get exactly that.
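To make the shapes concrete, here are illustrative outputs for each task type. The specific values (drug dosages, company names) are invented examples, not RuleChef defaults:

```python
# Illustrative outputs for each TaskType; the values are made-up examples.
extraction_out = {"spans": [{"text": "200 mg", "start": 5, "end": 11}]}
ner_out = {"entities": [{"text": "200 mg", "start": 5, "end": 11, "type": "DOSE"}]}
classification_out = {"label": "dosage_instruction"}
# TRANSFORMATION: you define the schema, e.g. {"company": "str", "amount": "str"}
transformation_out = {"company": "Acme Corp", "amount": "$5,000"}
```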

RuleFormat


Bases: Enum

Rule representation formats: REGEX, CODE, or SPACY.

Task

Task(name, description, input_schema, output_schema, type=TaskType.EXTRACTION, output_matcher=None, matching_mode='text', text_field=None) dataclass

Abstract task definition. Describes what we're trying to accomplish.

Attributes:

Name Type Description
name str

Task name

description str

Free text description

input_schema dict[str, str]

Dict describing input fields

output_schema OutputSchema

Dict or Pydantic model describing output fields.

  • Dict: simple string descriptions (e.g., {"spans": "List[Span]"})
  • Pydantic model: full type validation with Literal labels

type TaskType

TaskType enum (EXTRACTION, NER, CLASSIFICATION, TRANSFORMATION)

output_matcher OutputMatcher | None

Optional custom function to compare outputs. Signature: (expected: Dict, actual: Dict) -> bool. If not provided, uses the default matcher for the task type.

matching_mode Literal['text', 'exact']

For extraction tasks, choose "text" (default) or "exact" to control how span matches are evaluated.

text_field str | None

Optional input key to use for regex/spaCy matching. If not set, the longest string field is used.
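The difference between the two matching modes can be pictured with a small stand-in function. spans_match here is invented for illustration; RuleChef's actual matcher may differ:

```python
# Hypothetical stand-in for span comparison, to illustrate the two modes.
def spans_match(expected: dict, actual: dict, mode: str = "text") -> bool:
    if mode == "exact":  # text AND character offsets must all agree
        return (expected["text"] == actual["text"]
                and expected["start"] == actual["start"]
                and expected["end"] == actual["end"])
    return expected["text"] == actual["text"]  # "text": offsets ignored

a = {"text": "200 mg", "start": 5, "end": 11}
b = {"text": "200 mg", "start": 17, "end": 23}  # same text, different position
print(spans_match(a, b, "text"))   # True
print(spans_match(a, b, "exact"))  # False
```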

get_labels(field_name='type')

Get label values from output schema.

For Pydantic schemas, extracts Literal values from the specified field. For dict schemas, returns empty list (labels not defined).
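Conceptually, pulling labels out of a Literal field works like this stdlib sketch. EntitySchema and this get_labels are made up for illustration and are not RuleChef's implementation:

```python
from typing import Literal, get_args, get_type_hints

class EntitySchema:  # stands in for a Pydantic model with a Literal label field
    text: str
    type: Literal["DRUG", "DOSE"]

def get_labels(schema: type, field_name: str = "type") -> list:
    # Read the field's annotation and unpack the Literal's allowed values.
    annotation = get_type_hints(schema).get(field_name)
    return list(get_args(annotation)) if annotation is not None else []

print(get_labels(EntitySchema))  # ['DRUG', 'DOSE']
```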

validate_output(output)

Validate output against schema.

For Pydantic schemas, uses model validation. For dict schemas, returns (True, []) - no validation.

Returns:

Type Description
tuple[bool, list[str]]

Tuple of (is_valid, list_of_error_messages)

get_schema_for_prompt()

Render schema for inclusion in LLM prompts.

For Pydantic schemas, generates a readable representation with descriptions. For dict schemas, returns the dict as a string.

to_dict()

Rule

Rule(id, name, description, format, content, priority=5, confidence=0.5, times_applied=0, successes=0, failures=0, created_at=datetime.now(), output_template=None, output_key=None) dataclass

Learned extraction rule.

For schema-aware rules (NER, TRANSFORMATION), use output_template and output_key to control how matches are mapped to structured output. For legacy rules (EXTRACTION), content holds the pattern directly.

Attributes:

Name Type Description
id str

Unique identifier.

name str

Human-readable rule name (used for merge-by-name in patching).

description str

What this rule matches or does.

format RuleFormat

Rule format (REGEX, CODE, or SPACY).

content str

Pattern string (regex, code, or JSON-encoded spaCy pattern). Also accessible via the pattern property.

priority int

Execution priority (1-10, higher runs first).

confidence float

Confidence score (0.0-1.0), adjusted based on success rate.

times_applied int

Total number of times this rule has been applied.

successes int

Number of successful applications.

failures int

Number of failed applications.

created_at datetime

When the rule was created.

output_template dict[str, Any] | None

JSON template for each match, using variables like $0, $1, $start, $end, $ent_type. None for plain span extraction.

output_key str | None

Which key in the output dict to populate (e.g. 'entities'). Inferred from task type if not set.
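The template variables can be pictured as substitutions against a regex match. render_template below is a guessed mini-version for illustration only (it covers $0, $1, $start, $end; $ent_type would come from spaCy entity matches):

```python
import re

# Invented for illustration: expand $0/$1/$start/$end against a regex match.
def render_template(template: dict, m: re.Match) -> dict:
    values = {"$0": m.group(0), "$start": m.start(), "$end": m.end()}
    for i, group in enumerate(m.groups(), start=1):
        values[f"${i}"] = group
    # Replace any string value that names a variable; pass others through.
    return {key: values.get(val, val) if isinstance(val, str) else val
            for key, val in template.items()}

m = re.search(r"(\d+)\s*mg", "take 200 mg daily")
out = render_template({"text": "$0", "dose": "$1", "start": "$start", "end": "$end"}, m)
print(out)  # {'text': '200 mg', 'dose': '200', 'start': 5, 'end': 11}
```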

pattern property writable

Alias for content, with clearer semantics for regex/spaCy patterns.

update_stats(success)

Update performance stats and adjust confidence
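One plausible shape for this update, sketched with a minimal stand-in class. The confidence formula here (plain success rate) is a guess; RuleChef's actual adjustment may differ:

```python
from dataclasses import dataclass

@dataclass
class RuleStats:  # minimal stand-in for the Rule stats fields
    times_applied: int = 0
    successes: int = 0
    failures: int = 0
    confidence: float = 0.5

def update_stats(rule: RuleStats, success: bool) -> None:
    rule.times_applied += 1
    if success:
        rule.successes += 1
    else:
        rule.failures += 1
    # Guessed adjustment: confidence tracks the observed success rate.
    rule.confidence = rule.successes / rule.times_applied

r = RuleStats()
for outcome in (True, True, False, True):
    update_stats(r, outcome)
print(r.confidence)  # 0.75
```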

to_dict()

Span

Span(text, start, end, score=1.0) dataclass

A text span with character-level position information.

Attributes:

Name Type Description
text str

The matched text content.

start int

Start character offset (inclusive) in the source string.

end int

End character offset (exclusive) in the source string.

score float

Confidence score for the match, between 0.0 and 1.0.

overlaps(other)

Check if spans overlap

overlap_ratio(other)

Calculate overlap ratio (IoU)

to_dict()
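overlaps and overlap_ratio can be sketched directly from the attribute definitions above. This is a re-implementation for illustration, not RuleChef's source:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float = 1.0

    def overlaps(self, other: "Span") -> bool:
        # Half-open intervals overlap iff each starts before the other ends.
        return self.start < other.end and other.start < self.end

    def overlap_ratio(self, other: "Span") -> float:
        # Intersection over union of the two character ranges.
        inter = max(0, min(self.end, other.end) - max(self.start, other.start))
        union = (self.end - self.start) + (other.end - other.start) - inter
        return inter / union if union else 0.0

a = Span("200 mg", 5, 11)
b = Span("mg daily", 9, 17)
print(a.overlaps(b))                 # True
print(round(a.overlap_ratio(b), 3))  # 0.167 (2 shared chars / 12 total)
```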

Dataset

Dataset(name, task, description='', examples=list(), corrections=list(), feedback=list(), structured_feedback=list(), rules=list(), version=1) dataclass

Complete training dataset containing examples, corrections, feedback, and rules.

Attributes:

Name Type Description
name str

Dataset name, used as the persistence filename.

task Task

Task definition describing the extraction/classification goal.

description str

Optional human-readable description.

examples list[Example]

List of labeled training examples.

corrections list[Correction]

List of user corrections (highest-value training signal).

feedback list[str]

Legacy list of plain-text feedback strings (task-level only).

structured_feedback list[Feedback]

Structured feedback entries at task/example/rule level.

rules list[Rule]

Learned rules (populated by learn_rules).

version int

Dataset schema version for forward compatibility.

get_all_training_data()

Get all examples and corrections combined

get_feedback_for(level, target_id='')

Get feedback filtered by level and optional target.
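The filtering logic is straightforward; a sketch of the likely behavior, with plain dicts standing in for Feedback objects and a module-level function standing in for the method:

```python
# Plain dicts stand in for Feedback entries; the filter itself is a sketch.
feedback = [
    {"level": "task", "target_id": "", "text": "drugs usually follow dosage like 'mg'"},
    {"level": "rule", "target_id": "r1", "text": "too broad"},
    {"level": "rule", "target_id": "r2", "text": "too specific"},
]

def get_feedback_for(items, level, target_id=""):
    # Empty target_id returns everything at that level.
    return [f for f in items
            if f["level"] == level and (not target_id or f["target_id"] == target_id)]

print([f["text"] for f in get_feedback_for(feedback, "rule", "r1")])  # ['too broad']
```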

to_dict()

Example

Example(id, input, expected_output, source, confidence=0.8, timestamp=datetime.now()) dataclass

Regular training example. Lower priority than corrections.

Attributes:

Name Type Description
id str

Unique identifier.

input dict[str, Any]

Input data dict matching the task's input_schema.

expected_output dict[str, Any]

Expected output dict matching the task's output_schema.

source str

Origin of the example ('human_labeled' or 'llm_generated').

confidence float

Confidence score for this example (0.0-1.0).

timestamp datetime

When the example was created.

Correction

Correction(id, input, model_output, expected_output, feedback=None, timestamp=datetime.now()) dataclass

User correction -- the highest-value training signal.

Contains both the wrong output and the correct output so the learner can understand what to fix.

Attributes:

Name Type Description
id str

Unique identifier.

input dict[str, Any]

Input data dict that was processed.

model_output dict[str, Any]

The incorrect output that was produced.

expected_output dict[str, Any]

The correct output the model should have produced.

feedback str | None

Optional free-text explanation of what went wrong.

timestamp datetime

When the correction was created.

Feedback

Feedback(id, text, level, target_id='', timestamp=datetime.now()) dataclass

User feedback at any level: task, example, or rule.

  • task: general guidance ("drugs usually follow dosage like 'mg'")
  • example: feedback on a specific training item
  • rule: feedback on a specific rule ("too broad", "too specific")

Attributes:

Name Type Description
id str

Unique identifier.

text str

The feedback text.

level str

Feedback scope -- 'task', 'example', or 'rule'.

target_id str

Empty for task-level; example_id or rule_id otherwise.

timestamp datetime

When the feedback was created.