DGX Spark - Multi-Model LLM Serving

Local LLM infrastructure for DGX Spark (GB10 Blackwell) with vLLM, web UI, and model management. Works with 1 or 2 DGX Sparks.

⭐ If you find this repo useful, please give it a star - that's all I ask. Thanks! :D

Quick Start

./start-all.sh

Then open the Dashboard at http://localhost:5173 and start a model.

Chat: http://localhost:5173/chat

To stop all services: ./start-all.sh --stop

Features

  • Web Dashboard - Start/stop models, GPU monitoring, chat interface
  • 7 Models - Code, vision, and reasoning models, plus a 235B model served across two nodes
  • Tool Calling - Web search + sandboxed code execution
  • OpenAI API - Compatible endpoints on ports 8100-8235 (example below)
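
For example, a minimal chat completion against the Qwen3-Coder endpoint on port 8104. vLLM serves the standard OpenAI routes; the model name in the payload is an assumption here - verify it against /v1/models first:

# Minimal chat completion (model name is an assumption - check /v1/models)
curl http://localhost:8104/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder-30B-AWQ", "messages": [{"role": "user", "content": "Hello"}]}'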

Screenshots

[Dashboard screenshot]

[Chat screenshot]

Models

Model                 Port   Best For
Qwen3-Coder-30B-AWQ   8104   Code + tools (recommended)
Qwen3-235B-AWQ        8235   Large tasks (2-node)
Qwen2-VL-7B           8101   Vision
Nemotron-3-Nano-30B   8105   Reasoning
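
Each endpoint reports the exact name of the model it serves through vLLM's standard /v1/models route, which is useful before wiring up clients:

# List the model served on a given port (8104 shown)
curl http://localhost:8104/v1/models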

Technical Reference

For Claude Code and developers

Services

Start all services: ./start-all.sh (recommended)

Service         Port   Manual Start
Web GUI         5173   cd web-gui && ./start-docker.sh
Model Manager   5175   cd model-manager && ./serve.sh
Tool Sandbox    5176   cd tool-call-sandbox && ./serve.sh
SearXNG         8080   cd searxng-docker && docker compose up -d

Key Files

  • models.yaml - All model configurations
  • shared/auth.py - API authentication (Bearer token via DGX_API_KEY; see the example below)
  • vllm-*/serve.sh - Model startup scripts
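
When DGX_API_KEY is set, clients must authenticate. A sketch, assuming the key itself is passed as the Bearer token:

# Authenticated request - assumes DGX_API_KEY doubles as the Bearer token
curl http://localhost:8104/v1/models \
  -H "Authorization: Bearer $DGX_API_KEY"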

Environment Variables

Variable         Purpose
DGX_API_KEY      Enable API authentication
DGX_RATE_LIMIT   Requests/min per IP (default: 60)
DGX_LOG_LEVEL    Log level: debug, info, warning, error (default: info)
HF_TOKEN         HuggingFace access token
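
A typical launch exports these before running the start script; a sketch with placeholder values:

export DGX_API_KEY="change-me"    # enable Bearer-token auth
export DGX_RATE_LIMIT=120         # raise the per-IP limit (default 60)
export DGX_LOG_LEVEL=debug        # verbose logging from the start
export HF_TOKEN="hf_xxx"          # needed to pull gated models
./start-all.sh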

Runtime Configuration

# Check current log level
curl http://localhost:5175/api/config/log-level

# Enable debug logging (no restart needed)
curl -X POST http://localhost:5175/api/config/log-level \
  -H "Content-Type: application/json" -d '{"level": "debug"}'

Architecture

  • Frontend: React + Vite (web-gui/)
  • APIs: FastAPI with shared auth middleware
  • Models: vLLM in Docker with CORS enabled
  • Sandbox: Seccomp + capabilities + non-root execution (illustrative flags below)
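
The sandbox hardening named above maps onto standard Docker flags. An illustrative docker run sketch - image name and profile path are placeholders, not the repository's actual invocation:

# Hypothetical invocation showing the hardening layers:
#   seccomp profile  - restricts the syscalls the sandbox can make
#   --cap-drop=ALL   - drops every Linux capability
#   --user           - runs the workload as a non-root UID:GID
docker run --rm \
  --security-opt seccomp=seccomp-profile.json \
  --cap-drop=ALL \
  --user 1000:1000 \
  tool-call-sandbox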