Labels: enhancement (New feature or request)
Description
Track and expose token throughput metrics to measure inference performance.
Motivation
Tokens per second is a key performance metric for LLM inference. Tracking it helps identify:
- Model performance characteristics
- Infrastructure bottlenecks
- Capacity planning needs
Proposed Solution
Add throughput metrics:
- Tokens per second (output tokens / decoding time)
- Average throughput by model
- P50/P95/P99 throughput percentiles
Technical Details
- Calculate from existing latency data: output_tokens / decoding_time_ms * 1000
- Aggregate in analytics queries
- Consider storing as a pre-computed metric
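The calculation above can be sketched as follows. This is a minimal illustration, not the project's implementation; the record field names (`output_tokens`, `decoding_time_ms`, `model`) and the sample data are assumptions, and the percentile method (nearest-rank) is one of several reasonable choices.

```python
# Sketch: derive tokens/sec from per-request latency records and
# aggregate average plus P50/P95/P99 per model.
# Field names and sample values are hypothetical.

requests = [
    {"model": "model-a", "output_tokens": 128, "decoding_time_ms": 2400.0},
    {"model": "model-a", "output_tokens": 256, "decoding_time_ms": 5100.0},
    {"model": "model-b", "output_tokens": 64, "decoding_time_ms": 900.0},
]

def tokens_per_second(req):
    # output_tokens / decoding_time_ms * 1000, per the formula above
    return req["output_tokens"] / req["decoding_time_ms"] * 1000

def percentile(values, p):
    # Nearest-rank percentile over a sorted copy of the samples.
    ordered = sorted(values)
    k = round(p / 100 * (len(ordered) - 1))
    return ordered[k]

# Group throughput samples by model, then summarize each group.
by_model = {}
for req in requests:
    by_model.setdefault(req["model"], []).append(tokens_per_second(req))

summary = {
    model: {
        "avg": sum(tps) / len(tps),
        "p50": percentile(tps, 50),
        "p95": percentile(tps, 95),
        "p99": percentile(tps, 99),
    }
    for model, tps in by_model.items()
}
print(summary)
```

In a real deployment these aggregates would come from the analytics store rather than in-process Python, but the arithmetic is the same, which is also why pre-computing tokens/sec at ingest time (the last bullet above) is attractive: percentiles can then be taken directly over the stored metric.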
Acceptance Criteria
- Throughput metric available in API responses
- Breakdown by model
- Percentile distributions (P50, P95, P99)