# VGGT (Visual Geometry Grounded Transformer) optimized for Apple Silicon with Metal Performance Shaders (MPS)
Transform single or multi-view images into rich 3D reconstructions using Facebook Research's VGGT model, now accelerated on M1/M2/M3 Macs.
**Major Update**: Complete packaging overhaul with unified CLI, PyPI-ready distribution, and production-grade tooling!

- **Unified CLI**: New `vggt` command with subcommands for all operations
- **Professional Packaging**: PyPI-ready with `pyproject.toml` and a proper `src` layout
- **Web Interface**: Gradio UI for interactive 3D reconstruction (`vggt web`)
- **Enhanced Testing**: Comprehensive test suite with MPS and sparse attention tests
- **Modern Tooling**: UV support, Makefile automation, GitHub Actions CI/CD

## Features

- **MPS Acceleration**: Full GPU acceleration on Apple Silicon using Metal Performance Shaders
- **Sparse Attention**: O(n) memory scaling for city-scale reconstruction (100x savings!)
- **Multi-View 3D Reconstruction**: Generate depth maps, point clouds, and camera poses from images
- **MCP Integration**: Model Context Protocol server for Claude Desktop integration
- **5GB Model**: Efficient 1B-parameter model that runs smoothly on Apple Silicon
- **Multiple Export Formats**: PLY, OBJ, and GLB for 3D point clouds
## How It Works

VGGT reconstructs 3D scenes from images by predicting:

- **Depth Maps**: Per-pixel depth estimation
- **Camera Poses**: 6-DOF camera parameters
- **3D Point Clouds**: Dense 3D reconstruction
- **Confidence Maps**: Reliability scores for predictions
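To get a feel for these outputs, here is a minimal sketch using the upstream VGGT API. The `facebook/VGGT-1B` checkpoint ID and the prediction keys follow Facebook Research's VGGT README; treat the exact names as assumptions rather than this repo's guaranteed interface:

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

device = "mps" if torch.backends.mps.is_available() else "cpu"
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)

# Preprocess a couple of views (paths are placeholders)
images = load_and_preprocess_images(["data/view1.jpg", "data/view2.jpg"]).to(device)

with torch.no_grad():
    predictions = model(images)

# Keys as documented upstream (assumption): per-pixel depth and its
# confidence, encoded camera poses, and dense 3D world points
for key in ("depth", "depth_conf", "pose_enc", "world_points"):
    print(key, tuple(predictions[key].shape))
```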
## Requirements

- Apple Silicon Mac (M1/M2/M3)
- Python 3.10+
- 8GB+ RAM
- 6GB of disk space for the model
## Installation

```bash
# Install from PyPI (when published)
pip install vggt-mps

# Download model weights (5GB)
vggt download
```

Or install from source with uv:

```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Install with uv (10-100x faster than pip!)
make install

# Or manually with uv
uv pip install -e .
```

Or set up manually with pip:

```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Create virtual environment
python -m venv vggt-env
source vggt-env/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Then download the model:

```bash
# Download the 5GB VGGT model
vggt download

# Or if running from source:
python main.py download
```

Or manually download from Hugging Face.
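One manual route is the Hugging Face CLI. The `facebook/VGGT-1B` repo ID is the upstream checkpoint; the `models/` target directory is an assumption, so point `--local-dir` wherever your setup expects the weights:

```bash
pip install -U "huggingface_hub[cli]"

# Hypothetical target directory -- adjust to your layout
huggingface-cli download facebook/VGGT-1B --local-dir models/
```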
Verify the MPS setup:

```bash
# Test MPS acceleration
vggt test --suite mps

# Or from source:
python main.py test --suite mps
```

Expected output:

```
✅ MPS (Metal Performance Shaders) available!
   Running on Apple Silicon GPU
✅ Model weights loaded to mps
✅ MPS operations working correctly!
```
Optionally configure the environment:

```bash
# Copy environment configuration
cp .env.example .env

# Edit .env with your settings
nano .env
```

## Usage

All functionality is accessible through the unified `vggt` command:
```bash
# Quick demo with sample images
vggt demo

# Demo with kitchen dataset (4 images)
vggt demo --kitchen --images 4

# Process your own images
vggt reconstruct data/*.jpg

# Use sparse attention for large scenes
vggt reconstruct --sparse data/*.jpg

# Export to a specific format
vggt reconstruct --export ply data/*.jpg

# Launch interactive web interface
vggt web

# Open on a specific port with a public link
vggt web --port 8080 --share

# Run comprehensive tests
vggt test --suite all

# Test sparse attention specifically
vggt test --suite sparse

# Benchmark performance
vggt benchmark --compare

# Download model weights
vggt download
```

If running from source without installation:
```bash
python main.py demo
python main.py reconstruct data/*.jpg
python main.py web
python main.py test --suite mps
python main.py benchmark --compare
```

## MCP Integration

To use the MCP server with Claude Desktop:

- Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "vggt-agent": {
      "command": "uv",
      "args": [
        "run",
        "--python",
        "/path/to/vggt-mps/vggt-env/bin/python",
        "--with",
        "fastmcp",
        "fastmcp",
        "run",
        "/path/to/vggt-mps/src/vggt_mps_mcp.py"
      ]
    }
  }
}
```

- Restart Claude Desktop
Available MCP tools:

- `vggt_quick_start_inference` - Quick 3D reconstruction from images
- `vggt_extract_video_frames` - Extract frames from video
- `vggt_process_images` - Full VGGT pipeline
- `vggt_create_3d_scene` - Generate GLB 3D files
- `vggt_reconstruct_3d_scene` - Multi-view reconstruction
- `vggt_visualize_reconstruction` - Create visualizations
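For a quick smoke test outside Claude Desktop, something like the following should work with fastmcp 2.x's in-process client. The client API is an assumption about fastmcp itself, not part of this repo; the tool arguments mirror the `vggt_quick_start_inference` parameters shown in the Python API section below:

```python
import asyncio
from fastmcp import Client

async def main():
    # Point the client at the MCP server script from this repo
    async with Client("src/vggt_mps_mcp.py") as client:
        tools = await client.list_tools()
        print("tools:", [t.name for t in tools])

        # Arguments mirror vggt_quick_start_inference's documented parameters
        result = await client.call_tool(
            "vggt_quick_start_inference",
            {"image_directory": "./tmp/inputs", "device": "mps", "max_images": 4},
        )
        print(result)

asyncio.run(main())
```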
## Project Structure

```
vggt-mps/
├── main.py                       # Single entry point
├── setup.py                      # Package installation
├── requirements.txt              # Dependencies
├── .env.example                  # Environment configuration
│
├── src/                          # Source code
│   ├── config.py                 # Centralized configuration
│   ├── vggt_core.py              # Core VGGT processing
│   ├── vggt_sparse_attention.py  # Sparse attention (O(n) scaling)
│   ├── visualization.py          # 3D visualization utilities
│   │
│   ├── commands/                 # CLI commands
│   │   ├── demo.py               # Demo command
│   │   ├── reconstruct.py        # Reconstruction command
│   │   ├── test_runner.py        # Test runner
│   │   ├── benchmark.py          # Performance benchmarking
│   │   └── web_interface.py      # Gradio web app
│   │
│   └── utils/                    # Utilities
│       ├── model_loader.py       # Model management
│       ├── image_utils.py        # Image processing
│       └── export.py             # Export to PLY/OBJ/GLB
│
├── tests/                        # Organized test suite
│   ├── test_mps.py               # MPS functionality tests
│   ├── test_sparse.py            # Sparse attention tests
│   └── test_integration.py       # End-to-end tests
│
├── data/                         # Input data directory
├── outputs/                      # Output directory
├── models/                       # Model storage
│
├── docs/                         # Documentation
│   ├── API.md                    # API documentation
│   ├── SPARSE_ATTENTION.md       # Technical details
│   └── BENCHMARKS.md             # Performance results
│
└── LICENSE                       # MIT License
```
## Python API

Quick-start inference:

```python
from src.tools.readme import vggt_quick_start_inference

result = vggt_quick_start_inference(
    image_directory="./tmp/inputs",
    device="mps",  # Use Apple Silicon GPU
    max_images=4,
    save_outputs=True
)
```

Extract frames from video:

```python
from src.tools.demo_gradio import vggt_extract_video_frames

result = vggt_extract_video_frames(
    video_path="input_video.mp4",
    frame_interval_seconds=1.0
)
```

Multi-view reconstruction:

```python
from src.tools.demo_viser import vggt_reconstruct_3d_scene

result = vggt_reconstruct_3d_scene(
    images_dir="./tmp/inputs",
    device_type="mps",
    confidence_threshold=0.5
)
```

## Sparse Attention

City-scale 3D reconstruction is now possible! We've implemented Gabriele Berton's research idea for O(n) memory scaling.
- **100x memory savings** for 1000 images
- **No retraining required**: patches existing VGGT at runtime
- **Identical outputs** to regular VGGT (0.000000 difference)
- **MegaLoc covisibility detection** for smart attention masking (see the conceptual sketch below)
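To make the masking idea concrete, here is a conceptual sketch of a frame-level covisibility mask. This is not the code in `src/vggt_sparse_attention.py`; `covisible_pairs`, `n_frames`, and `tokens_per_frame` are illustrative assumptions:

```python
import torch

def build_covisibility_mask(covisible_pairs, n_frames, tokens_per_frame):
    # Frames always attend to themselves; covisible frame pairs attend to each other.
    frame_mask = torch.eye(n_frames, dtype=torch.bool)
    for i, j in covisible_pairs:  # pairs would come from a retrieval model such as MegaLoc
        frame_mask[i, j] = frame_mask[j, i] = True
    # Expand frame-level visibility to token level. A real implementation would
    # keep this blockwise/sparse -- materializing the dense mask is itself O(n^2).
    return frame_mask.repeat_interleave(tokens_per_frame, dim=0) \
                     .repeat_interleave(tokens_per_frame, dim=1)

# Each frame attends to a bounded number of covisible neighbors, so attention
# cost grows linearly with the number of frames instead of quadratically.
mask = build_covisibility_mask([(0, 1), (1, 2)], n_frames=3, tokens_per_frame=4)
print(mask.shape)  # torch.Size([12, 12])
```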
The repo API itself is a one-line conversion:

```python
from src.vggt_sparse_attention import make_vggt_sparse

# Convert any VGGT to sparse in 1 line
sparse_vggt = make_vggt_sparse(regular_vggt, device="mps")

# Same usage, O(n) memory instead of O(n²)
output = sparse_vggt(images)  # Handles 1000+ images!
```

| Images | Regular | Sparse | Savings |
|--------|---------|--------|---------|
| 100    | O(10K)  | O(1K)  | 10x     |
| 500    | O(250K) | O(5K)  | 50x     |
| 1000   | O(1M)   | O(10K) | 100x    |
See full results: docs/SPARSE_ATTENTION_RESULTS.md
## MPS Optimizations

- **Device Detection**: Auto-detects MPS availability
- **Dtype Selection**: Uses float32 for optimal MPS performance
- **Autocast Handling**: CUDA autocast disabled for MPS
- **Memory Management**: Efficient tensor operations on Metal
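In code, the device and dtype policy above amounts to something like this minimal sketch (not the repo's exact implementation):

```python
import torch

def select_device_and_dtype():
    if torch.backends.mps.is_available():
        # float32 on MPS: half-precision support is limited, fp32 is the safe default
        return torch.device("mps"), torch.float32
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.bfloat16
    return torch.device("cpu"), torch.float32

device, dtype = select_device_and_dtype()
# CUDA-style autocast is skipped on MPS; inference runs in plain float32 there.
```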
## Model Details

- **Parameters**: 1B (5GB on disk)
- **Input**: Multi-view images
- **Output**: Depth, camera poses, 3D points
- **Resolution**: 518×518 (VGGT), up to 1024×1024 (input)
## Troubleshooting

MPS not available:

```bash
# Check PyTorch MPS support
python -c "import torch; print(torch.backends.mps.is_available())"
```

Model not found:

```bash
# Verify model file
ls -lh repo/vggt/vggt_model.pt
# Should show a ~5GB file
```

Out of memory:

- Reduce batch size
- Lower resolution
- Use CPU fallback
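Another knob worth trying is PyTorch's MPS high-watermark ratio. This is a general PyTorch environment variable, not a switch specific to this repo, so treat its usefulness here as an assumption:

```bash
# 0.0 disables the MPS memory cap entirely; use with care
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 vggt reconstruct --sparse data/*.jpg
```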
## Documentation

- Development Guide - Setting up your dev environment
- Publishing Guide - PyPI release process
- Contributing Guide - How to contribute
- API Documentation - Detailed API reference
- Examples - Code examples and demos
## Changelog

- Unified CLI with the `vggt` command
- Professional Python packaging (PyPI-ready)
- Gradio web interface
- Comprehensive test suite
- Modern tooling (UV, Makefile, GitHub Actions)
- Complete documentation overhaul

See the full changelog for details.
## Contributing

We follow a lightweight Git Flow:

- `main` holds the latest stable release and is protected.
- `develop` is the default integration branch for day-to-day work.

When contributing:

- Create your feature branch from `develop` (`git switch develop && git switch -c feature/my-change`).
- Keep commits focused and include tests or documentation updates when relevant.
- Open your pull request against `develop`; maintainers will promote changes to `main` during releases.
Please open issues for bugs or feature requests before starting large efforts. Full details, testing expectations, and the release process live in CONTRIBUTING.md.
## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- Facebook Research for VGGT
- Apple for Metal Performance Shaders
- The PyTorch team for the MPS backend

Made with love for Apple Silicon by the AI community.