multi-head-latent-attention

Here are 3 public repositories matching this topic...

bonginn / vlm-mla

Efficient Vision-Language Model via Multi-head Latent Attention

memory-efficient kv-cache llava multi-head-latent-attention

Updated Dec 30, 2025
Python

AnkitaMungalpara / Building-DeepSeek-From-Scratch

This repository shows how to build a DeepSeek language model from scratch using PyTorch. It includes clean, well-structured implementations of advanced attention techniques such as key–value caching for fast decoding, multi-query attention, grouped-query attention, and multi-head latent attention.

transformers pytorch multi-query-attention grouped-query-attention multi-head-latent-attention deepseek-from-scratch

Updated Jan 10, 2026
Jupyter Notebook

luciITby / OpenLabLM

🚀 Build your own LLM easily with OpenLabLM, a lightweight, hackable codebase tailored for hobbyists using a single consumer GPU.

nlp machine-learning deep-neural-networks ai deep-learning pytorch muon natural-language-generation language-model natural-language-understanding pytorch-implementation large-language-models llm generative-ai rmsnorm multi-head-latent-attention

Updated Feb 16, 2026
Python

