multi-query-attention

Here are 5 public repositories matching this topic...

knotgrass / attention

Several types of attention modules written in PyTorch for learning purposes.

transformers pytorch transformer attention attention-mechanism softmax-layer multi-head-attention multi-query-attention grouped-query-attention scale-dot-product-attention

Updated Jan 2, 2026
Python

M-e-r-c-u-r-y / pytorch-transformers

Collection of different types of transformers for learning purposes

transformers pytorch multi-head-attention einsum-notation multi-query-attention

Updated Jan 30, 2020
Jupyter Notebook

krik8235 / ml-gqa-transformer

Examines cost-effective methods for optimizing Grouped-Query Attention (GQA) configurations and compares their performance with Multi-Head Attention (MHA) and Multi-Query Attention (MQA).

transformer multi-head-attention multi-query-attention grouped-query-attention

Updated Nov 21, 2025
Jupyter Notebook
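
For readers comparing the three variants named in this entry, the difference can be reduced to how query heads are assigned to key/value heads. The following is a minimal sketch, not code from the repository: the function names `kv_head_index` and `head_mapping` are hypothetical, and it assumes the common convention that consecutive query heads share one KV head.

```python
def kv_head_index(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Return the key/value head that a given query head attends with.

    Assumes consecutive query heads are grouped together (an illustrative
    convention, not taken from any repository listed here).
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be divisible by num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def head_mapping(num_query_heads: int, num_kv_heads: int) -> list[int]:
    """List the KV head index used by each query head."""
    return [
        kv_head_index(h, num_query_heads, num_kv_heads)
        for h in range(num_query_heads)
    ]


# MHA: every query head has its own KV head.
print(head_mapping(8, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
# GQA: query heads are split into groups that each share one KV head.
print(head_mapping(8, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
# MQA: all query heads share a single KV head.
print(head_mapping(8, 1))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Under this view, MHA and MQA are the two extremes of GQA: `num_kv_heads == num_query_heads` and `num_kv_heads == 1`, respectively.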

AnkitaMungalpara / Building-DeepSeek-From-Scratch

This repository shows how to build a DeepSeek language model from scratch using PyTorch. It includes clean, well-structured implementations of advanced attention techniques such as key–value caching for fast decoding, multi-query attention, grouped-query attention, and multi-head latent attention.

transformers pytorch multi-query-attention grouped-query-attention multi-head-latent-attention deepseek-from-scratch

Updated Jan 10, 2026
Jupyter Notebook

JonSnow1807 / FastMQA

CUDA implementation of Multi-Query Attention achieving 97% KV-cache memory reduction for LLM inference, enabling 32x larger batch sizes. Educational project demonstrating CUDA kernel development with PyTorch integration and Llama model benchmarks.

cuda attention-mechanism gpu-programming multi-query-attention llm-inference

Updated Sep 10, 2025
Python
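
The 97% and 32x figures quoted above would be consistent with replacing 32 key/value heads by a single shared one, since 1 - 1/32 = 96.875%. The sketch below works through that arithmetic. It is an illustration only: the function `kv_cache_bytes` is hypothetical, and the model dimensions (32 layers, 32 heads, head dimension 128, fp16) are assumed values, not numbers taken from the repository.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: one K and one V tensor per layer.

    Each tensor holds batch_size * num_kv_heads * seq_len * head_dim elements.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed, illustrative configuration (not taken from the repository).
config = dict(num_layers=32, head_dim=128, seq_len=2048, batch_size=1)

mha = kv_cache_bytes(num_kv_heads=32, **config)  # one KV head per query head
mqa = kv_cache_bytes(num_kv_heads=1, **config)   # a single shared KV head

print(mha)            # 1073741824 (1 GiB)
print(mqa)            # 33554432 (32 MiB)
print(1 - mqa / mha)  # 0.96875, i.e. roughly 97% smaller
print(mha // mqa)     # 32
```

Because the cache grows linearly with batch size, a cache that is 32 times smaller per sequence leaves room for roughly 32 times as many sequences in the same memory budget, ignoring weights and activations.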
