vLLM with sm_70 (Volta) Support

TL;DR

docker pull ghcr.io/jajmangold/vllm-sm70:latest

What’s in This Repo?

A Docker image for running the latest vLLM on older NVIDIA GPUs with sm_70 compute capability (Volta architecture), including:

  • Tesla V100
  • Titan V
  • Quadro GV100
  • NVIDIA CMP 100-210 (mining GPUs)

This image is built to be feature-complete for inference on Volta, not a crippled fallback.


What You Actually Get (Important)

Despite running on Volta, this image includes the modern inference stack you care about:

  • xFormers attention (CUTLASS-backed kernels where applicable)
  • PyTorch SDPA (scaled dot-product attention fallback)
  • bitsandbytes (bnb) for efficient quantized weights
  • AutoRound for W4A16 / low-bit quantization workflows
  • CUDA graphs (enabled by default in vLLM)
  • Continuous batching and KV cache reuse (vLLM core features)

What you don’t get (hardware limits, not software):

  • ❌ FlashAttention v2 (requires sm_80+)
  • ❌ FP8 / Hopper-only kernels
  • ❌ Marlin (Ampere+)

This is the best possible attention + quantization stack on Volta without rebuilding PyTorch.
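
As a concrete illustration, here is a minimal launch sketch (the vllm serve entrypoint and the example model are assumptions, not this image's documented invocation; VLLM_ATTENTION_BACKEND and the quantization flags are standard upstream vLLM options):

# Pin the xFormers backend explicitly (vLLM normally auto-selects it on
# sm_70) and serve a small model in fp16; bf16 is not available on Volta.
docker run --rm --gpus all -p 8000:8000 \
  -e VLLM_ATTENTION_BACKEND=XFORMERS \
  ghcr.io/jajmangold/vllm-sm70:latest \
  vllm serve facebook/opt-125m --dtype float16

# In-flight 4-bit quantization with bitsandbytes (flags per upstream vLLM;
# the model is again just an example):
docker run --rm --gpus all -p 8000:8000 \
  ghcr.io/jajmangold/vllm-sm70:latest \
  vllm serve facebook/opt-125m \
  --quantization bitsandbytes --load-format bitsandbytes --dtype float16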


Why This Exists

Newer official vLLM images and recent PyTorch releases increasingly drop or de-prioritize Volta (sm_70) support.

This project takes the pragmatic route:

  • Use a known-good prebuilt PyTorch image that still includes sm_70
  • Preserve xFormers + SDPA attention paths
  • Include bnb + AutoRound for modern quantized inference
  • Avoid PyTorch source builds, PEP-517 pain, and toolchain breakage
  • Focus on running inference on Volta, not fighting packaging

If you just want new vLLM versions to keep working on V100 / CMP 100-210 cards, this is the boring solution that works.


Base Stack

  • Base image: pytorch/pytorch:2.7.1-cuda12.8-cudnn9-runtime
  • CUDA: 12.8
  • cuDNN: 9
  • PyTorch: 2.7.1 (prebuilt, includes sm_70)
  • vLLM: latest (auto-built from upstream releases)
  • Attention backends:
    • xFormers
    • PyTorch SDPA
  • Quantization tooling:
    • bitsandbytes
    • AutoRound
  • Python: from the base image
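
To sanity-check that the container sees your card as sm_70 and that the bundled PyTorch was compiled for it, a quick probe (a sketch; it assumes the image lets you override the entrypoint with a plain python invocation):

# Should print (7, 0) on Volta, and the arch list should include sm_70.
docker run --rm --gpus all ghcr.io/jajmangold/vllm-sm70:latest \
  python -c "import torch; print(torch.cuda.get_device_capability()); print(torch.cuda.get_arch_list())"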


Pre-built Image

The image is rebuilt nightly against the latest upstream vLLM release.

Pull the pre-built image from GitHub Container Registry:

docker pull ghcr.io/jajmangold/vllm-sm70:latest

This tag always tracks:

  • the newest upstream vLLM release
  • a Volta-compatible PyTorch base
  • a full inference feature set (xFormers, SDPA, bnb, AutoRound)
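
Once a server is up (see the launch sketch above), you can talk to it through vLLM's OpenAI-compatible API; the model name must match whatever you served:

# Query the OpenAI-compatible endpoint exposed on port 8000.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello from Volta!", "max_tokens": 32}'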
