fix: separate ContextWindow from MaxTokens configuration#848

Open
QuietyAwe wants to merge 1 commit into sipeed:main from QuietyAwe:fix/context-window-config

Conversation

@QuietyAwe

Problem

The current implementation incorrectly sets ContextWindow to MaxTokens value in pkg/agent/instance.go:

ContextWindow:  maxTokens,  // Wrong: maxTokens is output limit, not context window

This causes several issues:

  1. GLM-5 with 128K context window was incorrectly limited to 65K
  2. Compression threshold was underestimated by ~50%
  3. Forced compression dropped 50% of history without preserving summary

Solution

  • Add new ContextWindow field to AgentDefaults config
  • Default to 128K (most modern models support this)
  • Users can configure smaller values for models with limited context

Changes

  1. pkg/config/config.go: Add ContextWindow field to AgentDefaults
  2. pkg/agent/instance.go: Use separate ContextWindow value instead of MaxTokens

Configuration Example

{
  "agents": {
    "defaults": {
      "max_tokens": 8192,
      "context_window": 128000
    }
  }
}

Testing

  • Code compiles successfully
  • Backward compatible: existing configs without context_window will use 128K default

Related

This fixes the context window handling for large-context models like GLM-5, Claude, GPT-4-turbo, etc.

- Add ContextWindow field to AgentDefaults config
- ContextWindow now defaults to 128K (most modern models support this)
- Previously ContextWindow was incorrectly set to MaxTokens (output limit)
- This fixes premature context compression for large-context models like GLM-5 (128K context)

Fixes issue where:
- GLM-5 with 128K context was limited to 65K (MaxTokens value)
- Compression threshold was underestimated by 50%
- Forced compression dropped 50% of history without summary

@nikolasdehor nikolasdehor left a comment


This is a correct and important fix. Setting ContextWindow = maxTokens meant that for models like GLM-5 (128K context, 65K output max), the compression threshold was anchored to 65K instead of 128K, triggering premature forced compression and discarding half the conversation history unnecessarily.

The fix properly separates the two concerns:

  • MaxTokens = output token limit (how many tokens the model can generate)
  • ContextWindow = input context limit (how many tokens can be sent)

Defaulting to 128K is reasonable for modern models. Users with smaller-context models can override via config.

The config field is properly tagged with both json and env attributes, so it can be set via JSON config or environment variable.

LGTM.

@QuietyAwe
Author

Thanks for the thorough review @nikolasdehor! 🙏 Glad the fix addresses the issue correctly. Looking forward to getting this merged.
