-
Notifications
You must be signed in to change notification settings - Fork 8
Open
Description
1.1 thinking vs think
Parquet 训练数据(data/bfcl_train_base.parquet、data/bfcl_val.parquet)中全部 200 条 system prompt 使用 thinking /thinking 作为 XML 标签,但以下两处代码只匹配 think /think:
- env_tuning/interaction/utils.py:46 — parse_model_response() 正则 think ... /think, 不匹配 thinking
- env_tuning/interaction/execution_manager.py:110 — format_execution_response() 中 user_hint 使用 think /think
影响:模型按 system prompt 输出 thinking,parser 只匹配 think,所有输出在第一步即被判定格式错误(score = -3)
1.2 answer 标签未在 system prompt 中定义
System prompt 对不需要工具调用的情况指示为:
- "If no tool calls are necessary or possible: Directly provide a user-facing response in plain text."
但 env_tuning/interaction/utils.py:85-89 的 parse_model_response() 要求必须用 answer.../answer 包裹最终回答,否则返回格式错误。answer 仅在env_tuning/interaction/execution_manager.py:110-112 的 user_hint 中后续引入,模型在初始 turn 无从得知需要使用该标签。
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels