[FEATURE SUPPORT] Add Triton GEMV kernel and performance tests #54

Merged

LoserCheems merged 4 commits into main from add-gemv-triton-kernel on Dec 4, 2025

@LoserCheems
Collaborator

Summary

  • Introduces a Triton GEMV kernel for matrix-vector multiplication and adds performance tests comparing it against Python, PyTorch, and CuTe implementations.

Root Cause

  • The project needed an optimized GEMV operation on the GPU, which motivated this kernel.

Changes

  • Implemented an autotuned block-tiled GEMV kernel using Triton.
  • Reformatted code for improved readability.
  • Added parameterized performance tests for GEMV across different frameworks.
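The block-tiled scheme can be illustrated with a CPU-only reference. This is a minimal sketch, not the PR's kernel: it assumes "configurable scaling" means the BLAS-style update `y = alpha * A @ x + beta * y`, and the names `gemv_tiled_reference`, `cdiv`, `BLOCK_M`, and `BLOCK_N` are hypothetical. Each outer iteration stands in for one program instance in a one-dimensional launch grid of `cdiv(M, BLOCK_M)` row blocks, and the inner loop accumulates partial dot products one column tile at a time.

```python
from typing import List, Sequence


def cdiv(a: int, b: int) -> int:
    """Ceiling division, used here to size the (hypothetical) launch grid."""
    return (a + b - 1) // b


def gemv_tiled_reference(
    A: Sequence[Sequence[float]],
    x: Sequence[float],
    y: Sequence[float],
    alpha: float = 1.0,
    beta: float = 0.0,
    BLOCK_M: int = 4,
    BLOCK_N: int = 8,
) -> List[float]:
    """Return alpha * A @ x + beta * y, computed tile by tile.

    Assumed semantics only; the actual Triton kernel's signature and
    scaling parameters are not shown in this PR description.
    """
    M, N = len(A), len(x)
    out = list(y)
    grid = cdiv(M, BLOCK_M)  # one "program" per block of BLOCK_M rows
    for pid in range(grid):
        row_start = pid * BLOCK_M
        # Clamp the last row block, mirroring a load mask in a GPU kernel.
        rows = range(row_start, min(row_start + BLOCK_M, M))
        acc = [0.0] * len(rows)
        for col_start in range(0, N, BLOCK_N):
            # Clamp the last column tile the same way.
            cols = range(col_start, min(col_start + BLOCK_N, N))
            for i, r in enumerate(rows):
                acc[i] += sum(A[r][c] * x[c] for c in cols)
        # Apply the scaling once per row, after the full reduction over N.
        for i, r in enumerate(rows):
            out[r] = alpha * acc[i] + beta * y[r]
    return out
```

For example, `gemv_tiled_reference([[1, 2, 3], [4, 5, 6]], [1, 1, 1], [10, 20], alpha=2.0, beta=0.5, BLOCK_M=1, BLOCK_N=2)` returns `[17.0, 40.0]`. An autotuner would search over values such as `BLOCK_M` and `BLOCK_N`; the result is independent of the tile sizes, so only performance changes.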

Reproduction

  • Not applicable as this is a new feature addition.

Tests

  • Added performance tests for the GEMV kernel covering CUDA, MPS, and multiple data types.
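A parameterized benchmark typically checks each implementation against a reference before timing it, so a fast but wrong kernel cannot "win". The sketch below shows that pattern on the CPU with two pure-Python implementations. It is an illustration under assumptions: the real suite parameterizes over CUDA/MPS devices and dtypes and compares Python, PyTorch, Triton, and CuTe, whereas `gemv_python`, `gemv_sum`, `bench`, and `run_grid` here are hypothetical names that iterate over a grid of matrix shapes.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
GemvFn = Callable[[Matrix, Vector], Vector]


def gemv_python(A: Matrix, x: Vector) -> Vector:
    # Plain nested-loop reference implementation.
    out = []
    for row in A:
        s = 0.0
        for a, b in zip(row, x):
            s += a * b
        out.append(s)
    return out


def gemv_sum(A: Matrix, x: Vector) -> Vector:
    # Alternative implementation: one generator-based dot product per row.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def bench(fn: GemvFn, A: Matrix, x: Vector, repeats: int = 5) -> float:
    # Median wall-clock time in seconds over several runs.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(A, x)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


def run_grid(
    impls: Dict[str, GemvFn],
    shapes: Sequence[Tuple[int, int]],
    repeats: int = 5,
    seed: int = 0,
    tol: float = 1e-9,
) -> Dict[Tuple[str, int, int], float]:
    rng = random.Random(seed)
    results = {}
    for M, N in shapes:
        A = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(M)]
        x = [rng.uniform(-1, 1) for _ in range(N)]
        ref = gemv_python(A, x)
        for name, fn in impls.items():
            got = fn(A, x)
            # Correctness gate: fail before timing if results diverge.
            assert all(abs(g - r) <= tol * max(1.0, abs(r)) for g, r in zip(got, ref)), name
            results[(name, M, N)] = bench(fn, A, x, repeats)
    return results


if __name__ == "__main__":
    impls = {"python": gemv_python, "sum": gemv_sum}
    for (name, M, N), t in sorted(run_grid(impls, [(64, 64), (128, 256)]).items()):
        print(f"{name:>6} M={M:<4} N={N:<4} {t * 1e6:8.1f} us")
```

Using the median rather than the mean keeps a single slow run (for example, a first-call warm-up) from skewing the comparison.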

Compatibility

  • No migration concerns or backwards compatibility issues noted.

Checklist

  • Introduces an autotuned block-tiled matvec kernel so matrix-vector updates run on the GPU with configurable scaling and a configurable launch grid.
  • Introduces parameterized GEMV benchmarks covering CUDA, MPS, and multiple dtypes to compare Python, PyTorch, Triton, and CuTe implementations.
LoserCheems merged commit c3b697d into main on Dec 4, 2025
2 checks passed