
Conversation

@JulienDeveaux
Contributor

What problem does this PR solve?

The native chat completions endpoint (/chats/<chat_id>/completions) was not returning token usage information,
making it impossible for clients to track token consumption for billing and quota management.

This PR adds token usage reporting to the native endpoint with two key improvements:

  1. Basic token usage reporting: Returns usage object with prompt_tokens, completion_tokens, and total_tokens in the final streaming chunk

  2. Accurate LLM API usage: Uses stream_options={"include_usage": True} to request real token counts from OpenAI-compatible LLM APIs. This provides accurate counts that include system prompts, chat history, and RAG context - not just the user message and response.
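On the client side, picking up the usage object from the final streaming chunk could look roughly like this (a minimal sketch: the chunk shapes follow the OpenAI-compatible streaming format the PR relies on, but the chunk list below is illustrative, not the endpoint's actual payload):

```python
# Sketch: with stream_options={"include_usage": True}, OpenAI-compatible
# APIs send usage=None on content chunks and a final chunk carrying the
# real token counts. A client collects content and the usage object.

def extract_usage(chunks):
    """Return the usage dict from the last chunk that carries one."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):  # intermediate chunks have usage == None
            usage = chunk["usage"]
    return usage

# Illustrative stream: two content chunks, then a usage-only final chunk.
stream = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [],
     "usage": {"prompt_tokens": 120, "completion_tokens": 2,
               "total_tokens": 122}},
]

usage = extract_usage(stream)
print(usage["total_tokens"])
```

Because the counts come from the LLM API itself, they cover the full prompt (system prompt, history, RAG context), not just the visible user message.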

Changes:

  • Add usage field to final streaming response in native endpoint
  • Request real usage data from LLM API via stream_options.include_usage
  • Capture and propagate full token breakdown (prompt/completion/total) through the response chain
  • Fall back to tiktoken counting when LLM API doesn't provide usage
  • Add unit tests for token usage in native endpoint
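The tiktoken fallback mentioned above could be sketched as follows (a hedged illustration, not the PR's actual code: the function names are invented for this example, and a crude character-based estimate stands in when tiktoken is not installed):

```python
def count_tokens(text, model="gpt-3.5-turbo"):
    """Count tokens with tiktoken if available; otherwise estimate."""
    try:
        import tiktoken
        try:
            enc = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to a common base encoding.
            enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        # Rough stand-in when tiktoken is unavailable (~4 chars/token).
        return max(1, len(text) // 4)

def usage_from_counts(prompt, completion):
    """Build a usage dict when the LLM API returned no usage data."""
    p = count_tokens(prompt)
    c = count_tokens(completion)
    return {"prompt_tokens": p,
            "completion_tokens": c,
            "total_tokens": p + c}

print(usage_from_counts("Hello there", "Hi!"))
```

Note that locally counted tokens can only cover the text the server assembles itself; the API-reported counts remain the authoritative source when available, which is why the fallback is used last.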

Fixes #7850

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 🐖api The modified files are located under directory 'api/apps/sdk' 💞 feature Feature request, pull request that fulfills a new feature. labels Jan 21, 2026
@JulienDeveaux
Contributor Author

I added the same fix recommended in the first review comment of #12760

@JulienDeveaux
Contributor Author

Friendly ping @KevinHuSh




Development

Successfully merging this pull request may close these issues.

[Question]: In the HTTP API, how can the usage of tokens be obtained through the chat interface?
