Config

Pipeline configuration and constants.

PipelineConfig dataclass

Fields and defaults:

    output_dir = Path('data')
    source_cache_dir = Path('data/source_cache')
    repos_dir = Path('data/repos')
    github_token = ''
    openai_api_key = ''
    distillation_model = 'gpt-5.4'
    distillation_base_url = None
    swebench_dataset = 'princeton-nlp/SWE-bench'
    splits = ['test']  (default factory)
    max_instances = None
    min_tools_per_instance = 3
    max_tools_per_instance = 7
    max_tool_output_lines = MAX_TOOL_OUTPUT_LINES
    distillation_max_concurrent = 50
    distillation_temperature = 0.3
    generate_queries_with_teacher = True
    command_timeout = 30

Configuration for the data generation pipeline.

Constants

config

Configuration for the data generation pipeline.

SYSTEM_PROMPT = 'You prune verbose tool output for a coding agent. Given a focused extraction query and one tool output, return only the smallest verbatim evidence block(s) the agent should read next. Return the kept text inside <relevant_lines> tags. Do not rewrite, summarize, or invent lines.'
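The prompt tells the model to wrap kept text in `<relevant_lines>` tags, so a downstream parser only needs to pull those blocks out of the raw completion. A small sketch (the `reply` text is invented for illustration; the pipeline's actual parser is not shown on this page):

```python
import re

# SYSTEM_PROMPT asks for kept text inside <relevant_lines> tags.
# DOTALL lets the capture span multiple lines; the non-greedy .*?
# keeps separate tag pairs from merging into one match.
TAG_RE = re.compile(r'<relevant_lines>(.*?)</relevant_lines>', re.DOTALL)

def extract_relevant(response: str) -> list[str]:
    """Return the verbatim evidence blocks the model chose to keep."""
    return [block.strip() for block in TAG_RE.findall(response)]

reply = ("Some preamble the agent should ignore.\n"
         "<relevant_lines>def foo():\n    return 1</relevant_lines>")
```

A response with no tags simply yields an empty list, which a caller can treat as "nothing relevant found".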

TOOL_WEIGHTS = {
    'read_file': 0.28,
    'grep': 0.18,
    'python': 0.08,
    'git_log': 0.08,
    'test_output': 0.08,
    'git_diff': 0.05,
    'git_blame': 0.04,
    'ls': 0.04,
    'lint_output': 0.02,
    'build_output': 0.02,
    'curl': 0.03,
    'pip_install': 0.04,
    'type_check': 0.04,
    'coverage': 0.02,
}
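The weights sum to 1.0, which suggests they act as a sampling distribution over tool types. One plausible use, sketched with `random.choices` (how the pipeline actually draws tools is not shown on this page):

```python
import random

# Weights copied verbatim from the module constant above; they sum to 1.0.
TOOL_WEIGHTS = {
    'read_file': 0.28, 'grep': 0.18, 'python': 0.08, 'git_log': 0.08,
    'test_output': 0.08, 'git_diff': 0.05, 'git_blame': 0.04, 'ls': 0.04,
    'lint_output': 0.02, 'build_output': 0.02, 'curl': 0.03,
    'pip_install': 0.04, 'type_check': 0.04, 'coverage': 0.02,
}

rng = random.Random(0)  # seeded for reproducibility
# Draw 5 tool types with replacement, weighted by TOOL_WEIGHTS.
sample = rng.choices(list(TOOL_WEIGHTS), weights=list(TOOL_WEIGHTS.values()), k=5)
```

`random.choices` samples with replacement, so the same tool (most often `read_file`) can appear multiple times in one draw.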

MIN_RELEVANT_RATIO = 0.02
MAX_RELEVANT_RATIO = 0.4
MIN_RELEVANT_LINES = 3
MIN_TOTAL_LINES = 10
MAX_TOOL_OUTPUT_LINES = 500
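These thresholds read like a quality gate on (relevant lines, total lines) pairs. A sketch of how they might combine into a single predicate; the actual filtering logic is an assumption, not shown on this page:

```python
# Constants copied from the module documentation above.
MIN_RELEVANT_RATIO = 0.02
MAX_RELEVANT_RATIO = 0.4
MIN_RELEVANT_LINES = 3
MIN_TOTAL_LINES = 10

def passes_thresholds(relevant: int, total: int) -> bool:
    """Hypothetical gate: keep an example only if the pruned output is
    neither trivially small nor so large that pruning did nothing."""
    if total < MIN_TOTAL_LINES or relevant < MIN_RELEVANT_LINES:
        return False
    ratio = relevant / total
    return MIN_RELEVANT_RATIO <= ratio <= MAX_RELEVANT_RATIO
```

Under this reading, keeping 3 of 10 lines passes (ratio 0.3), while keeping 50 of 100 fails because more than 40% of the output survived pruning.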