
Config

Pipeline configuration and constants.

PipelineConfig(
    output_dir=Path('data'),
    source_cache_dir=Path('data/source_cache'),
    repos_dir=Path('data/repos'),
    github_token='',
    openai_api_key='',
    distillation_model='gpt-5.4',
    distillation_base_url=None,
    swebench_dataset='princeton-nlp/SWE-bench',
    splits=(lambda: ['test'])(),
    max_instances=None,
    min_tools_per_instance=3,
    max_tools_per_instance=7,
    max_tool_output_lines=MAX_TOOL_OUTPUT_LINES,
    distillation_max_concurrent=50,
    distillation_temperature=0.3,
    command_timeout=30,
) dataclass

Configuration for the data generation pipeline.
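As a rough illustration of how such a configuration could be declared and overridden, the sketch below defines a stand-in dataclass that copies a subset of the fields and defaults from the signature above. `PipelineConfigSketch` is a hypothetical name; the real class's import path and field types are not shown on this page, so nothing is imported from the package itself.

```python
from dataclasses import dataclass, field, replace
from pathlib import Path
from typing import List, Optional

MAX_TOOL_OUTPUT_LINES = 500  # value listed under Constants


@dataclass
class PipelineConfigSketch:
    """Hypothetical stand-in mirroring a subset of PipelineConfig's fields."""

    output_dir: Path = Path("data")
    swebench_dataset: str = "princeton-nlp/SWE-bench"
    # A mutable default needs default_factory so instances do not share a list.
    splits: List[str] = field(default_factory=lambda: ["test"])
    max_instances: Optional[int] = None
    min_tools_per_instance: int = 3
    max_tools_per_instance: int = 7
    max_tool_output_lines: int = MAX_TOOL_OUTPUT_LINES
    distillation_temperature: float = 0.3
    command_timeout: int = 30


# All defaults, as in the signature above.
default_cfg = PipelineConfigSketch()

# A small run for local experiments: override only what differs.
small_cfg = replace(default_cfg, max_instances=10, output_dir=Path("scratch"))
```

Using `dataclasses.replace` leaves the default instance untouched, which is convenient when several variants of one base configuration are needed.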

Constants

config


SYSTEM_PROMPT = 'You extract relevant lines from tool output for a coding task. Return the relevant lines inside <relevant_lines> tags, one per line. Include ONLY lines the agent needs to see.' module-attribute
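The prompt asks the model to return lines inside `<relevant_lines>` tags, one per line. A minimal sketch of how a response in that shape might be parsed follows; `parse_relevant_lines` is a hypothetical helper, not a function documented on this page, and dropping blank lines is an assumption.

```python
import re

# Non-greedy match so only the first tagged block is captured.
_TAG_RE = re.compile(r"<relevant_lines>(.*?)</relevant_lines>", re.DOTALL)


def parse_relevant_lines(response: str) -> list:
    """Return the lines found inside the first <relevant_lines> block."""
    match = _TAG_RE.search(response)
    if match is None:
        return []  # the model did not follow the requested format
    # Assumption: blank lines inside the tags carry no information.
    return [line for line in match.group(1).splitlines() if line.strip()]


reply = "Here you go:\n<relevant_lines>\na = 1\nb = 2\n</relevant_lines>"
lines = parse_relevant_lines(reply)
```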

TOOL_WEIGHTS = {'read_file': 0.28, 'grep': 0.18, 'python': 0.08, 'git_log': 0.08, 'test_output': 0.08, 'git_diff': 0.05, 'git_blame': 0.04, 'ls': 0.04, 'lint_output': 0.02, 'build_output': 0.02, 'curl': 0.03, 'pip_install': 0.04, 'type_check': 0.04, 'coverage': 0.02} module-attribute
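The weights sum to 1.0, which suggests they act as sampling probabilities over tool types. The sketch below assumes exactly that and draws tool names with `random.choices`; `sample_tools` is a hypothetical helper, and sampling with replacement is an assumption rather than documented behavior.

```python
import random

TOOL_WEIGHTS = {
    'read_file': 0.28, 'grep': 0.18, 'python': 0.08, 'git_log': 0.08,
    'test_output': 0.08, 'git_diff': 0.05, 'git_blame': 0.04, 'ls': 0.04,
    'lint_output': 0.02, 'build_output': 0.02, 'curl': 0.03,
    'pip_install': 0.04, 'type_check': 0.04, 'coverage': 0.02,
}


def sample_tools(k: int, seed=None) -> list:
    """Draw k tool names in proportion to TOOL_WEIGHTS (with replacement)."""
    rng = random.Random(seed)  # a local RNG keeps the draw reproducible
    names = list(TOOL_WEIGHTS)
    weights = list(TOOL_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=k)


picked = sample_tools(5, seed=0)
```

If each instance should use distinct tools, a draw without replacement would be needed instead; this page does not say which the pipeline uses.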

MIN_RELEVANT_RATIO = 0.02 module-attribute

MAX_RELEVANT_RATIO = 0.4 module-attribute

MIN_RELEVANT_LINES = 3 module-attribute

MIN_TOTAL_LINES = 10 module-attribute
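The four thresholds above read like acceptance criteria for a generated example. The sketch below shows one plausible way they could combine; the interpretation (ratio = relevant lines / total lines, all bounds inclusive) and the name `is_usable_example` are assumptions, not behavior confirmed by this page.

```python
MIN_RELEVANT_RATIO = 0.02
MAX_RELEVANT_RATIO = 0.4
MIN_RELEVANT_LINES = 3
MIN_TOTAL_LINES = 10


def is_usable_example(total_lines: int, relevant_lines: int) -> bool:
    """Assumed filter: keep outputs that are long enough and neither
    almost entirely irrelevant nor almost entirely relevant."""
    if total_lines < MIN_TOTAL_LINES:
        return False  # too short for extraction to be meaningful
    if relevant_lines < MIN_RELEVANT_LINES:
        return False  # too little signal
    ratio = relevant_lines / total_lines
    return MIN_RELEVANT_RATIO <= ratio <= MAX_RELEVANT_RATIO


ok = is_usable_example(total_lines=100, relevant_lines=10)       # ratio 0.10
too_dense = is_usable_example(total_lines=100, relevant_lines=50)  # ratio 0.50
```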

MAX_TOOL_OUTPUT_LINES = 500 module-attribute
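`MAX_TOOL_OUTPUT_LINES` is also the default for `max_tool_output_lines` in `PipelineConfig`, so it presumably caps how much of a tool's output is kept. A minimal sketch of such a cap, assuming the first lines are retained and the rest dropped (`truncate_output` is a hypothetical helper):

```python
MAX_TOOL_OUTPUT_LINES = 500


def truncate_output(text: str, max_lines: int = MAX_TOOL_OUTPUT_LINES) -> str:
    """Keep at most max_lines lines of text (assumed: keep the head)."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short outputs pass through unchanged
    return "\n".join(lines[:max_lines])


long_output = "\n".join(f"line {i}" for i in range(600))
capped = truncate_output(long_output)
```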