fix: separate ContextWindow from MaxTokens configuration #848

QuietyAwe wants to merge 1 commit into sipeed:main
Conversation
- Add `ContextWindow` field to `AgentDefaults` config
- `ContextWindow` now defaults to 128K (most modern models support this)
- Previously `ContextWindow` was incorrectly set to `MaxTokens` (the output limit)
- This fixes premature context compression for large-context models like GLM-5 (128K context)

Fixes issue where:
- GLM-5 with 128K context was limited to 65K (the `MaxTokens` value)
- The compression threshold was underestimated by 50%
- Forced compression dropped 50% of history without a summary
nikolasdehor left a comment
This is a correct and important fix. Setting ContextWindow = maxTokens meant that for models like GLM-5 (128K context, 65K output max), the compression threshold was anchored to 65K instead of 128K, triggering premature forced compression and discarding half the conversation history unnecessarily.
The fix properly separates the two concerns:
- `MaxTokens` = output token limit (how many tokens the model can generate)
- `ContextWindow` = input context limit (how many tokens can be sent to the model)
Defaulting to 128K is reasonable for modern models. Users with smaller-context models can override via config.
The config field is properly tagged with both json and env attributes, so it can be set via JSON config or environment variable.
LGTM.
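The reviewer's point about the `json` and `env` tags could correspond to a field definition like the following. This is a standalone sketch, not the actual code from `pkg/config/config.go`: the env tag values and the `applyDefaults` helper name are assumptions for illustration.

```go
// Standalone sketch of the new config field; real names live in
// pkg/config/config.go and the env tag values here are assumptions.
package main

// AgentDefaults holds per-agent default settings.
type AgentDefaults struct {
	// MaxTokens caps how many tokens the model may generate (output).
	MaxTokens int `json:"max_tokens" env:"MAX_TOKENS"`
	// ContextWindow caps how many tokens may be sent to the model (input);
	// zero means "use the 128K default".
	ContextWindow int `json:"context_window" env:"CONTEXT_WINDOW"`
}

// applyDefaults fills in the 128K context-window default when unset.
func (d *AgentDefaults) applyDefaults() {
	if d.ContextWindow == 0 {
		d.ContextWindow = 128000
	}
}

func main() {
	d := AgentDefaults{MaxTokens: 8192}
	d.applyDefaults()
	println(d.ContextWindow) // 128000
}
```

Keeping the default in one helper means a config file that only sets `max_tokens` still gets the large context window, while smaller-context models can override `context_window` explicitly.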
Thanks for the thorough review @nikolasdehor! 🙏 Glad the fix addresses the issue correctly. Looking forward to getting this merged.
Problem

The current implementation incorrectly sets `ContextWindow` to the `MaxTokens` value in `pkg/agent/instance.go`. This causes several issues:

- The effective context window for GLM-5 (128K context, 65K max output) was capped at 65K
- The compression threshold was underestimated by roughly 50%
- Forced compression dropped half the conversation history without a summary
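The corrected wiring can be sketched as follows. The `Config` and `Instance` shapes here are assumed for illustration; the actual struct layout in `pkg/agent/instance.go` may differ.

```go
// Sketch of the corrected wiring in pkg/agent/instance.go; the Config and
// Instance shapes here are assumed for illustration.
package main

type Config struct {
	MaxTokens     int // output token limit
	ContextWindow int // input context limit
}

type Instance struct {
	MaxTokens     int
	ContextWindow int
}

func newInstance(cfg Config) *Instance {
	return &Instance{
		MaxTokens: cfg.MaxTokens,
		// Before the fix this read cfg.MaxTokens, capping GLM-5 at 65K.
		ContextWindow: cfg.ContextWindow,
	}
}

func main() {
	inst := newInstance(Config{MaxTokens: 65536, ContextWindow: 128000})
	println(inst.MaxTokens, inst.ContextWindow) // 65536 128000
}
```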
Solution

- Add a `ContextWindow` field to the `AgentDefaults` config

Changes

- `pkg/config/config.go`: Add `ContextWindow` field to `AgentDefaults`
- `pkg/agent/instance.go`: Use the separate `ContextWindow` value instead of `MaxTokens`

Configuration Example
```json
{
  "agents": {
    "defaults": {
      "max_tokens": 8192,
      "context_window": 128000
    }
  }
}
```

Testing
- Existing configs without `context_window` will use the 128K default

Related
This fixes the context window handling for large-context models like GLM-5, Claude, GPT-4-turbo, etc.
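To make the impact concrete, here is a small sketch of why anchoring the threshold to `MaxTokens` fired early. The 0.8 compression ratio is an assumed illustrative value, not taken from the actual code:

```go
// Illustrates why anchoring the threshold to MaxTokens triggered premature
// compression; the 0.8 ratio is an assumption, not taken from the code.
package main

// compressionThreshold is the token count at which history compression
// would kick in for a given window size.
func compressionThreshold(window int) int {
	return window * 8 / 10
}

func main() {
	println(compressionThreshold(65536))  // anchored to MaxTokens: 52428
	println(compressionThreshold(128000)) // anchored to ContextWindow: 102400
}
```

With the threshold anchored to the 65K output limit, compression fired at roughly 52K tokens even though GLM-5 could comfortably hold 128K, which is the "underestimated by 50%" behavior described above.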