[Analytics] Add token throughput metrics (tokens/second) #362

@PierreLeGuen

Description

Track and expose token throughput metrics to measure inference performance.

Motivation

Tokens per second is a key performance metric for LLM inference. Tracking it helps identify:

  • Model performance characteristics
  • Infrastructure bottlenecks
  • Capacity planning needs

Proposed Solution

Add throughput metrics:

  • Tokens per second (output tokens / decoding time)
  • Average throughput by model
  • P50/P95/P99 throughput percentiles

Technical Details

  • Calculate from existing latency data: output_tokens / decoding_time_ms * 1000
  • Aggregate in analytics queries
  • Consider storing it as a pre-computed metric to avoid recomputing at query time
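The calculation and aggregation described above could look roughly like this (a minimal sketch; the row schema with `model`, `output_tokens`, and `decoding_time_ms` keys is assumed for illustration, not the project's actual data model):

```python
import statistics
from collections import defaultdict

def tokens_per_second(output_tokens: int, decoding_time_ms: float) -> float:
    """Per-request throughput: output_tokens / decoding_time_ms * 1000."""
    return output_tokens / decoding_time_ms * 1000.0

def aggregate_throughput(rows):
    """Aggregate average and P50/P95/P99 throughput per model.

    `rows` is assumed to be an iterable of dicts with `model`,
    `output_tokens`, and `decoding_time_ms` keys (hypothetical schema).
    """
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(
            tokens_per_second(row["output_tokens"], row["decoding_time_ms"])
        )
    summary = {}
    for model, tps in by_model.items():
        # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
        q = statistics.quantiles(tps, n=100, method="inclusive")
        summary[model] = {
            "avg_tps": statistics.fmean(tps),
            "p50_tps": q[49],
            "p95_tps": q[94],
            "p99_tps": q[98],
        }
    return summary
```

Computing from the existing latency data keeps the change read-only; if the aggregation proves expensive, the per-request value could be pre-computed at ingest time instead.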

Acceptance Criteria

  • Throughput metric available in API responses
  • Breakdown by model
  • Percentile distributions (P50, P95, P99)
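One possible shape for the throughput section of an analytics API response (all field names and numbers below are illustrative placeholders, not an agreed-upon schema):

```python
import json

# Hypothetical response payload satisfying the acceptance criteria:
# metric in the response, breakdown by model, percentile distributions.
example_response = {
    "throughput": {
        "by_model": {
            "model-a": {
                "avg_tps": 151.2,   # tokens/second, averaged over requests
                "p50_tps": 148.9,
                "p95_tps": 180.3,
                "p99_tps": 195.6,
            },
        },
    },
}

print(json.dumps(example_response, indent=2))
```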

Metadata


    Labels

    enhancement (New feature or request)
