
Conversation

@JulienDeveaux
Contributor

What problem does this PR solve?

The native chat completions endpoint (/chats/<chat_id>/completions) was not returning token usage information,
making it impossible for clients to track token consumption for billing and quota management.

This PR adds token usage reporting to the native endpoint with two key improvements:

  1. Basic token usage reporting: Returns usage object with prompt_tokens, completion_tokens, and total_tokens in the final streaming chunk

  2. Accurate LLM API usage: Uses stream_options={"include_usage": True} to request real token counts from OpenAI-compatible LLM APIs. This provides accurate counts that include system prompts, chat history, and RAG context - not just the user message and response.
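On the client side, picking up the usage object from the final streaming chunk could look roughly like this (a minimal sketch: the chunk shapes follow the OpenAI-compatible streaming format the PR relies on, but the chunk list below is illustrative, not the endpoint's actual payload):

```python
# Sketch: with stream_options={"include_usage": True}, OpenAI-compatible
# APIs send usage=None on content chunks and a final chunk carrying the
# real token counts. A client collects content and the usage object.

def extract_usage(chunks):
    """Return the usage dict from the last chunk that carries one."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):  # intermediate chunks have usage == None
            usage = chunk["usage"]
    return usage

# Illustrative stream: two content chunks, then a usage-only final chunk.
stream = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [],
     "usage": {"prompt_tokens": 120, "completion_tokens": 2,
               "total_tokens": 122}},
]

usage = extract_usage(stream)
print(usage["total_tokens"])
```

Because the counts come from the LLM API itself, they cover the full prompt (system prompt, history, RAG context), not just the visible user message.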

Changes:

  • Add usage field to final streaming response in native endpoint
  • Request real usage data from LLM API via stream_options.include_usage
  • Capture and propagate full token breakdown (prompt/completion/total) through the response chain
  • Fall back to tiktoken counting when LLM API doesn't provide usage
  • Add unit tests for token usage in native endpoint
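The tiktoken fallback mentioned above could be sketched as follows (a hedged illustration, not the PR's actual code: the function names are invented for this example, and a crude character-based estimate stands in when tiktoken is not installed):

```python
def count_tokens(text, model="gpt-3.5-turbo"):
    """Count tokens with tiktoken if available; otherwise estimate."""
    try:
        import tiktoken
        try:
            enc = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to a common base encoding.
            enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        # Rough stand-in when tiktoken is unavailable (~4 chars/token).
        return max(1, len(text) // 4)

def usage_from_counts(prompt, completion):
    """Build a usage dict when the LLM API returned no usage data."""
    p = count_tokens(prompt)
    c = count_tokens(completion)
    return {"prompt_tokens": p,
            "completion_tokens": c,
            "total_tokens": p + c}

print(usage_from_counts("Hello there", "Hi!"))
```

Note that locally counted tokens can only cover the text the server assembles itself; the API-reported counts remain the authoritative source when available, which is why the fallback is used last.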

Fixes #7850

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 🐖api The modified files are located under directory 'api/apps/sdk' 💞 feature Feature request, pull request that fulfills a new feature. labels Jan 21, 2026
@JulienDeveaux
Contributor Author

I added the same fix recommended in the first review comment of #12760

@JulienDeveaux
Contributor Author

Friendly ping @KevinHuSh




Development

Successfully merging this pull request may close these issues.

[Question]: In the HTTP API, how can the usage of tokens be obtained through the chat interface?
