The Problem
Right now ResultMessage.usage lumps thinking tokens into output_tokens, but thinking tokens are billed differently from regular output. This makes it impossible to track costs accurately.
When I set MAX_THINKING_TOKENS = 8000, I'm effectively authorizing up to $0.12 in thinking tokens per turn (at $15/M), but the usage data can't tell me how much of that budget was actually consumed versus spent on regular output.
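For concreteness, here's the arithmetic behind that $0.12 ceiling as a quick sketch (the $15/M rate is my assumption above; substitute your model's actual pricing):

```python
# Worst-case thinking spend for one turn, assuming thinking tokens
# are billed at an output rate of $15 per million tokens (assumption).
MAX_THINKING_TOKENS = 8000
ASSUMED_USD_PER_MTOK = 15.00

worst_case_usd = MAX_THINKING_TOKENS / 1_000_000 * ASSUMED_USD_PER_MTOK
print(f"Worst-case thinking cost per turn: ${worst_case_usd:.2f}")  # -> $0.12
```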
What I'm Seeing
Currently receiving:
```
{
  "input_tokens": 9,
  "output_tokens": 112,  # thinking tokens mixed in here
  "cache_read_input_tokens": 13916,
  "cache_creation_input_tokens": 11227
}
```
What I'd Expect
Since the Anthropic API returns thinking counts separately, it would be great to see them broken out:
```
{
  "input_tokens": 9,
  "output_tokens": 42,
  "thinking_tokens": 70,  # or separate thinking_input/thinking_output if available
  "cache_read_input_tokens": 13916,
  "cache_creation_input_tokens": 11227
}
```
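With that split, cost attribution becomes trivial on the consumer side. A minimal sketch, assuming the field names from the proposal above and placeholder per-million-token prices:

```python
def cost_breakdown(usage: dict,
                   output_usd_per_mtok: float = 15.00,    # assumed rate
                   thinking_usd_per_mtok: float = 15.00,  # assumed rate
                   ) -> dict:
    """Price visible output and thinking separately, falling back to 0
    when thinking_tokens isn't present (i.e. today's behavior)."""
    return {
        "output_usd": usage["output_tokens"] / 1e6 * output_usd_per_mtok,
        "thinking_usd": usage.get("thinking_tokens", 0) / 1e6 * thinking_usd_per_mtok,
    }

# For the proposed example: output ~ $0.00063, thinking ~ $0.00105
print(cost_breakdown({"input_tokens": 9, "output_tokens": 42, "thinking_tokens": 70}))
```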
Why This Matters
- Thinking tokens can easily be 50%+ of a run's cost
- When debugging expensive runs, it's hard to know if the model is thinking too much or outputting too much
- Tokens billed at different rates should be distinguishable in the usage stats
The raw API already exposes this; it's just a matter of surfacing it through the SDK's usage object.
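Rough sketch of what I imagine the SDK-side change looks like — purely illustrative, since I haven't read the internals, and the raw field name (thinking_tokens here) is an assumption:

```python
# Illustrative only: copy the thinking count through when building
# ResultMessage.usage from the raw API usage payload, instead of
# folding it into output_tokens. The raw field name is assumed.
def build_usage(raw_usage: dict) -> dict:
    usage = {
        "input_tokens": raw_usage["input_tokens"],
        "output_tokens": raw_usage["output_tokens"],
        "cache_read_input_tokens": raw_usage.get("cache_read_input_tokens", 0),
        "cache_creation_input_tokens": raw_usage.get("cache_creation_input_tokens", 0),
    }
    if "thinking_tokens" in raw_usage:  # assumed raw field
        usage["thinking_tokens"] = raw_usage["thinking_tokens"]
    return usage
```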