The Problem
Right now ResultMessage.usage lumps thinking tokens into output_tokens, but thinking tokens are billed differently from regular output. This makes it impossible to track costs accurately.
When I set MAX_THINKING_TOKENS = 8000, I'm effectively authorizing up to $0.12 in thinking tokens per turn (at $15/M), but the usage data can't tell me how much of that budget was actually consumed versus spent on regular output.
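For concreteness, here's the arithmetic behind that $0.12 ceiling as a quick sketch (the $15/M rate is my assumption above; substitute your model's actual pricing):

```python
# Worst-case thinking spend for one turn, assuming thinking tokens
# are billed at an output rate of $15 per million tokens (assumption).
MAX_THINKING_TOKENS = 8000
ASSUMED_USD_PER_MTOK = 15.00

worst_case_usd = MAX_THINKING_TOKENS / 1_000_000 * ASSUMED_USD_PER_MTOK
print(f"Worst-case thinking cost per turn: ${worst_case_usd:.2f}")  # -> $0.12
```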
What I'm Seeing
Currently receiving:
```
{
  "input_tokens": 9,
  "output_tokens": 112,  # thinking tokens mixed in here
  "cache_read_input_tokens": 13916,
  "cache_creation_input_tokens": 11227
}
```
What I'd Expect
Since the Anthropic API returns thinking counts separately, it would be great to see them broken out:
```
{
  "input_tokens": 9,
  "output_tokens": 42,
  "thinking_tokens": 70,  # or separate thinking_input/thinking_output if available
  "cache_read_input_tokens": 13916,
  "cache_creation_input_tokens": 11227
}
```
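With that split, cost attribution becomes trivial on the consumer side. A minimal sketch, assuming the field names from the proposal above and placeholder per-million-token prices:

```python
def cost_breakdown(usage: dict,
                   output_usd_per_mtok: float = 15.00,    # assumed rate
                   thinking_usd_per_mtok: float = 15.00,  # assumed rate
                   ) -> dict:
    """Price visible output and thinking separately, falling back to 0
    when thinking_tokens isn't present (i.e. today's behavior)."""
    return {
        "output_usd": usage["output_tokens"] / 1e6 * output_usd_per_mtok,
        "thinking_usd": usage.get("thinking_tokens", 0) / 1e6 * thinking_usd_per_mtok,
    }

# For the proposed example: output ~ $0.00063, thinking ~ $0.00105
print(cost_breakdown({"input_tokens": 9, "output_tokens": 42, "thinking_tokens": 70}))
```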
Why This Matters
- Thinking tokens can easily be 50%+ of a run's cost
- When debugging expensive runs, it's hard to know if the model is thinking too much or outputting too much
- Tokens billed at different rates should be distinguishable in the usage stats
The raw API already exposes this; it's just a matter of surfacing it through the SDK's usage object.
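Rough sketch of what I imagine the SDK-side change looks like — purely illustrative, since I haven't read the internals, and the raw field name (thinking_tokens here) is an assumption:

```python
# Illustrative only: copy the thinking count through when building
# ResultMessage.usage from the raw API usage payload, instead of
# folding it into output_tokens. The raw field name is assumed.
def build_usage(raw_usage: dict) -> dict:
    usage = {
        "input_tokens": raw_usage["input_tokens"],
        "output_tokens": raw_usage["output_tokens"],
        "cache_read_input_tokens": raw_usage.get("cache_read_input_tokens", 0),
        "cache_creation_input_tokens": raw_usage.get("cache_creation_input_tokens", 0),
    }
    if "thinking_tokens" in raw_usage:  # assumed raw field
        usage["thinking_tokens"] = raw_usage["thinking_tokens"]
    return usage
```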