
Prompt cache length default#61

Open
aidando73 wants to merge 3 commits into main from
cursor/prompt-cache-length-default-bfb2

Conversation

@aidando73
Contributor

Auto-compute prompt cache length in benchmark load test to enable caching.

Previously, --prompt-cache-max-len defaulted to 0, so TranslationDataset built an empty shared prefix (common_tokens=0) and prompt caching was effectively disabled even when --prompt-cache-max-pct was set.

The fix changes the default of --prompt-cache-max-len to None; when it is not explicitly provided but --prompt-cache-max-pct is, common_tokens is computed as int(prompt_tokens * prompt_cache_max_pct / 100). Passing --prompt-cache-max-pct 100 --prompt-tokens 8192 now correctly generates an 8192-token shared prefix for caching.



…_pct

When --prompt-cache-max-pct is set but --prompt-cache-max-len is not
explicitly provided, auto-compute common_tokens as
int(prompt_tokens * prompt_cache_max_pct / 100). This makes
'--prompt-cache-max-pct 100 --prompt-tokens 8192' do the intuitive
thing: build an 8192-token shared prefix so the server can actually
cache it.

Previously, --prompt-cache-max-len defaulted to 0, which meant
TranslationDataset always built an empty shared prefix regardless of
--prompt-cache-max-pct, defeating prompt caching.

Changed --prompt-cache-max-len default from 0 to None so we can
distinguish 'not provided' from 'explicitly set to 0'.

Co-authored-by: Aidan Do <aidando73@gmail.com>
@cursor

cursor bot commented Feb 22, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@aidando73 aidando73 requested a review from a team February 22, 2026 14:34
    help="Maximum number of shared prefix tokens across requests. "
    "If not specified but --prompt-cache-max-pct is set, auto-computed as "
    "int(prompt_tokens * prompt_cache_max_pct / 100). Defaults to 0 (no shared prefix) "
    "when neither this nor --prompt-cache-max-pct is provided.",

@cursor cursor bot Feb 22, 2026


Done. The default for --prompt-cache-max-len is back to 0. The auto-computation logic now works as:

  • If --prompt-cache-max-len is explicitly set to a non-zero value, use it directly.
  • Otherwise, if --prompt-cache-max-pct is provided, auto-compute common_tokens = int(prompt_tokens * prompt_cache_max_pct / 100).
  • If neither is meaningfully set, common_tokens stays 0.

So --prompt-cache-max-pct 100 --prompt-tokens 8192 works on its own, producing common_tokens = 8192.
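
The three bullets above amount to something like the following sketch, assuming --prompt-cache-max-len defaults to 0 in this revision (the helper itself is illustrative):

```python
def resolve_common_tokens(prompt_tokens: int,
                          prompt_cache_max_len: int = 0,
                          prompt_cache_max_pct: int = 0) -> int:
    if prompt_cache_max_len != 0:
        # Explicit non-zero --prompt-cache-max-len is used directly.
        return prompt_cache_max_len
    if prompt_cache_max_pct:
        # Otherwise auto-compute from --prompt-cache-max-pct.
        return int(prompt_tokens * prompt_cache_max_pct / 100)
    return 0  # neither flag meaningfully set
```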


cursoragent and others added 2 commits February 22, 2026 14:36
Per review feedback, keep the default at 0 instead of None.
The auto-computation from --prompt-cache-max-pct now triggers when
prompt_cache_max_len is 0 (the default) and prompt_cache_max_pct is
set. An explicit non-zero --prompt-cache-max-len still takes
precedence.

Co-authored-by: Aidan Do <aidando73@gmail.com>
Cleaner idiom: use None to mean 'not provided' rather than
overloading 0. Behaviour is identical.

Co-authored-by: Aidan Do <aidando73@gmail.com>
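
The "None means not provided" idiom described in this commit is standard argparse behaviour; a minimal illustration (flag name from the PR, parser setup assumed):

```python
import argparse

parser = argparse.ArgumentParser()
# default=None lets the code tell "flag omitted" apart from an explicit 0
parser.add_argument("--prompt-cache-max-len", type=int, default=None)

explicit = parser.parse_args(["--prompt-cache-max-len", "0"])
omitted = parser.parse_args([])
print(explicit.prompt_cache_max_len)  # 0 (explicitly set)
print(omitted.prompt_cache_max_len)   # None (not provided)
```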
@aidando73 aidando73 marked this pull request as ready for review February 22, 2026 14:42
