[FEATURE SUPPORT] Add Triton GEMV kernel and performance tests by LoserCheems · Pull Request #54 · flash-algo/kernel-course

LoserCheems · 2025-12-04T10:50:28Z

Summary

Introduces a Triton GEMV kernel for efficient matrix-vector multiplication and adds performance tests comparing various implementations.

Root Cause

The need for an optimized GEMV operation on GPU prompted the development of this kernel.

Changes

Implemented an autotuned block-tiled GEMV kernel using Triton.
Reformatted code for improved readability.
Added parameterized performance tests for GEMV across different frameworks.
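
The kernel source is not reproduced in this description, so the following is only a CPU sketch of what a block-tiled GEMV computes, not the Triton kernel from this PR. It assumes the standard BLAS form `y = alpha * A @ x + beta * y` for the "configurable scaling" and assigns one program per block of rows. The function name `gemv_tiled` and the parameters `block_m` and `block_n` are hypothetical; they stand in for the tile sizes an autotuner would choose.

```python
def gemv_tiled(A, x, y, alpha=1.0, beta=0.0, block_m=2, block_n=2):
    """Return alpha * A @ x + beta * y, computed tile by tile (CPU sketch)."""
    m, n = len(A), len(x)
    out = [0.0] * m
    # Launch grid: one "program" per block of rows (ceiling division).
    num_row_blocks = (m + block_m - 1) // block_m
    for pid in range(num_row_blocks):
        row_start = pid * block_m
        row_end = min(row_start + block_m, m)  # plays the role of a bounds mask
        acc = [0.0] * (row_end - row_start)    # per-program accumulator
        # Walk the columns in tiles, accumulating partial dot products.
        for col_start in range(0, n, block_n):
            col_end = min(col_start + block_n, n)
            for i in range(row_start, row_end):
                for j in range(col_start, col_end):
                    acc[i - row_start] += A[i][j] * x[j]
        # Apply the scaling once per output element.
        for i in range(row_start, row_end):
            out[i] = alpha * acc[i - row_start] + beta * y[i]
    return out
```

In a GPU kernel the two outer loops would map to the launch grid and to a loop over column tiles inside each program; here they are plain Python loops so the tiling and edge handling can be checked by hand.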

Reproduction

Not applicable as this is a new feature addition.

Tests

Added performance tests for the GEMV kernel covering CUDA, MPS, and multiple data types.
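
The test files are not shown here, so the following is a rough, standard-library-only sketch of how a parameterized comparison across implementations and shapes could be organized. It is not the repository's test suite: `run_benchmarks`, `bench`, and the two toy implementations are hypothetical names, and the sketch leaves out the device and dtype parameters the PR describes.

```python
import math
import time


def gemv_rows(A, x):
    # Row-wise dot products using zip.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gemv_loops(A, x):
    # Explicit index loops, used as a second implementation to compare.
    out = []
    for row in A:
        acc = 0.0
        for j in range(len(x)):
            acc += row[j] * x[j]
        out.append(acc)
    return out


def bench(fn, A, x, repeats=5):
    # Best-of-N wall-clock time in seconds.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(A, x)
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmarks(impls, shapes, repeats=5):
    # Check each implementation against the first one, then time it.
    results = {}
    for m, n in shapes:
        A = [[float(i + j) for j in range(n)] for i in range(m)]
        x = [1.0] * n
        reference = None
        for name, fn in impls.items():
            out = fn(A, x)
            if reference is None:
                reference = out
            assert all(math.isclose(a, b) for a, b in zip(out, reference))
            results[(name, m, n)] = bench(fn, A, x, repeats)
    return results
```

Checking correctness against a reference before timing keeps a fast but wrong implementation from appearing to win the comparison.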

Compatibility

No migration concerns or backwards compatibility issues noted.

Checklist

Linked issues provided ([FEATURE REQUEST] gemv Triton kernel implementation #22, [FEATURE REQUEST] gemv test coverage and fixtures #24)
Adds or updates tests
Updates docs if needed
No perf regressions

LoserCheems added 4 commits December 4, 2025 18:45

Adds Triton GEMV kernel

912e818

Introduces an autotuned block-tiled matvec kernel so matrix–vector updates run on GPU with configurable scaling and launch grid.

Format code for better readability in gemv.py

5e70651

Adds GEMV performance tests

7304ff3

Introduces parameterized GEMV benchmarks covering CUDA, MPS, and multiple dtypes to compare Python, PyTorch, Triton, and CuTe implementations.

Update README to reflect GEMV Triton kernel implementation

27b5143

github-actions bot assigned SNHuan Dec 4, 2025

LoserCheems merged commit c3b697d into main Dec 4, 2025
2 checks passed

LoserCheems merged 4 commits into main from add-gemv-triton-kernel
