ResultMessage.usage is missing thinking token breakdown #540

@morluto

The Problem

Right now ResultMessage.usage lumps thinking tokens into output_tokens, but thinking tokens are billed differently from regular output. This makes it impossible to track costs accurately.

When I set MAX_THINKING_TOKENS = 8000, I'm effectively authorizing up to $0.12 in thinking tokens per turn ($15/M), but I can't tell from the usage data how much of that was actually used vs. regular output.

What I'm Seeing

Currently receiving:

{
"input_tokens": 9,
"output_tokens": 112, # thinking tokens mixed in here
"cache_read_input_tokens": 13916,
"cache_creation_input_tokens": 11227
}
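
For context, here's the kind of per-turn cost accounting I'm trying to do with that dict (a rough sketch; the $15/M figure is the one quoted above, the other rate is a placeholder, and result_message is just whatever ResultMessage the SDK handed back):

THINKING_RATE = 15 / 1_000_000   # $/token, the $15/M figure from above
OUTPUT_RATE = 15 / 1_000_000     # placeholder regular-output rate, purely for illustration

usage = result_message.usage     # the dict shown above

# Best I can do today: price the whole lump at one rate.
output_cost = usage["output_tokens"] * OUTPUT_RATE   # 112 tokens, thinking mix unknown
# There's no way to split that into "spent on thinking" vs. "spent on the text
# actually returned", which is exactly the number I need when a run comes back
# more expensive than expected.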

What I'd Expect

Since the Anthropic API returns thinking counts separately, it would be great to see them broken out:

{
"input_tokens": 9,
"output_tokens": 42,
"thinking_tokens": 70, # or separate thinking_input/thinking_output if available
"cache_read_input_tokens": 13916,
"cache_creation_input_tokens": 11227
}
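
With that shape, the attribution I want becomes trivial (same sketch as above, with thinking_tokens being the proposed field):

thinking_cost = usage["thinking_tokens"] * THINKING_RATE   # 70 * $15/M ≈ $0.00105
text_cost = usage["output_tokens"] * OUTPUT_RATE           # just the 42 visible output tokens
# Per turn I can now see whether the spend went to thinking or to visible output.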

Why This Matters

  • Thinking tokens can easily be 50%+ of a run's cost
  • When debugging expensive runs, it's hard to know if the model is thinking too much or outputting too much
  • Tokens billed at different rates should be broken out in the usage stats

The raw API already exposes this; it's just a matter of surfacing it through the SDK's usage object.
