RuleChef

KRLabsOrg/rulechef

Splitting

Stratified train/development holdout used by the refinement loop to decide patch acceptance on data the rules were not tuned on. See Holdout Acceptance for the conceptual overview.

split_dataset

`split_dataset(dataset, holdout_fraction=0.2, seed=42, min_dev_size=5)`

Split a dataset into train and held-out dev portions.

Examples are stratified by class signature before splitting. Corrections always stay in train: they are explicit user fixes that must drive patching, and holding them out would hide the highest-value signal from the learner. Feedback and existing rules are shared with the train split.
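The stratification and the "corrections stay in train" rule can be illustrated with a minimal, self-contained sketch. This is not RuleChef's implementation: the function `stratified_split`, the dict keys `signature` and `is_correction`, and the choice to round each stratum's dev share down are all assumptions made for illustration, and a "class signature" is assumed here to be any hashable key describing the classes an example contains.

```python
import random
from collections import defaultdict


def stratified_split(examples, holdout_fraction=0.2, seed=42):
    """Illustrative stratified split over plain dicts (hypothetical schema).

    Each example is a dict with the assumed keys "signature" and
    "is_correction". The input list is not mutated.
    """
    rng = random.Random(seed)
    train, dev = [], []

    # Corrections go straight to train; everything else is bucketed
    # by signature so each stratum is split on its own.
    strata = defaultdict(list)
    for ex in examples:
        if ex["is_correction"]:
            train.append(ex)
        else:
            strata[ex["signature"]].append(ex)

    # Iterate strata in sorted order so the result depends only on the seed.
    for signature in sorted(strata):
        bucket = strata[signature]
        rng.shuffle(bucket)
        n_dev = int(len(bucket) * holdout_fraction)  # assumption: round down
        dev.extend(bucket[:n_dev])
        train.extend(bucket[n_dev:])
    return train, dev


examples = (
    [{"signature": "A", "is_correction": False, "id": i} for i in range(10)]
    + [{"signature": "B", "is_correction": False, "id": 10 + i} for i in range(5)]
    + [{"signature": "A", "is_correction": True, "id": 99}]
)
train, dev = stratified_split(examples)
```

With these inputs, each signature contributes its own share to the dev portion, and the single correction ends up in train regardless of its signature.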

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	Source dataset (not mutated).	required
`holdout_fraction`	`float`	Fraction of examples to hold out (0 < f < 1).	`0.2`
`seed`	`int`	Random seed for the shuffle within each stratum.	`42`
`min_dev_size`	`int`	If the resulting dev set would be smaller than this, no split is performed and `(dataset, None)` is returned.	`5`

Returns:

Type	Description
`tuple[Dataset, Dataset | None]`	Tuple of `(train_dataset, dev_dataset)`. `dev_dataset` is `None` when the dataset is too small to split safely.
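Because the second element of the returned tuple may be `None`, callers need to branch on it. The sketch below uses a hypothetical stand-in, `split_or_none`, that operates on plain lists and only mirrors the documented return contract; it is not the library function. The fallback policy in `evaluation_pool` is likewise an assumption, since this page does not say what the refinement loop does when no dev set exists.

```python
import random


def split_or_none(items, holdout_fraction=0.2, seed=42, min_dev_size=5):
    """Stand-in mirroring the documented contract on a plain list."""
    n_dev = int(len(items) * holdout_fraction)
    if n_dev < min_dev_size:
        # Too small to split safely: hand back the input and no dev set.
        return items, None
    shuffled = list(items)  # copy, so the input is not mutated
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_dev:], shuffled[:n_dev]


def evaluation_pool(train, dev):
    """Hypothetical caller policy: evaluate on train when dev is missing."""
    return train if dev is None else dev


small_train, small_dev = split_or_none(list(range(20)))  # 4 < 5, no split
big_train, big_dev = split_or_none(list(range(25)))      # 5 >= 5, split
```

With the defaults, a dev set of at least five examples requires at least 25 splittable examples in this stand-in, which is why the 20-item input comes back unsplit.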