LLM inference, chat UI, voice agents, workflow automation, RAG, image generation, and privacy tools — all running on your hardware. No cloud. No subscriptions. No configuration.
New here? Read the Friendly Guide or listen to the audio version — a complete walkthrough of what Dream Server is, how it works, and how to make it your own. No technical background needed.
Platform Support — March 2026
| Platform | Status |
|---|---|
| Linux (NVIDIA + AMD) | Supported — install and run today |
| macOS (Apple Silicon) | Coming soon — target mid-March 2026 |
| Windows | Coming soon — target end of March 2026 |

macOS and Windows installers currently provide system diagnostics and preflight checks only. Full runtime support for both platforms is in active development. For a working setup today, use Linux. See the Support Matrix for details.
Setting up local AI usually means stitching together a dozen projects, debugging CUDA drivers, writing Docker configs, and hoping everything talks to each other. Dream Server replaces all of that with a single installer.
- Run one command — the installer detects your GPU, picks the right model for your hardware, generates secure credentials, and launches everything
- Chat in under 2 minutes — bootstrap mode starts a small model instantly while your full model downloads in the background
- 13 integrated services — chat, agents, voice, workflows, search, RAG, image generation, and more, all pre-wired and working together
- Fully moddable — drop in a folder, run `dream enable`, done. Every service is an extension
```shell
curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/dream-server/get-dream-server.sh | bash
```

Open http://localhost:3000 and start chatting.
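If you script the install, it can help to wait until the chat UI answers before opening it. The following is a minimal sketch and not part of Dream Server: `wait_until` and the `RETRY_DELAY` variable are hypothetical names, and the commented usage line assumes `curl` is installed and that the UI is served at http://localhost:3000.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Dream Server command): retry a command until it
# succeeds or the attempt budget runs out.
# Usage: wait_until <max_attempts> <command> [args...]
wait_until() {
  local max_attempts="$1"
  shift
  local attempt=1
  while :; do
    "$@" && return 0                                  # command succeeded
    [ "$attempt" -ge "$max_attempts" ] && return 1    # budget exhausted
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-2}"                         # pause between attempts (seconds)
  done
}

# Example (assumes curl is installed):
#   wait_until 60 curl -fsS -o /dev/null http://localhost:3000 && echo "Chat UI is up"
```

The helper takes the check as an arbitrary command, so the same loop can poll any other service endpoint.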
Manual install (Linux)
```shell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh
```

macOS / Windows (coming soon — not yet functional)
Full runtime support for macOS and Windows is on the roadmap (see platform table above). The installers below currently run preflight diagnostics only — they will check your system but will not produce a working AI stack yet.
macOS (Apple Silicon) — target mid-March 2026:
```shell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh   # Runs preflight checks; full runtime coming soon
```

Windows (PowerShell) — target end of March 2026:

```powershell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/install.ps1" -OutFile install.ps1
.\install.ps1   # Runs WSL2/Docker/GPU preflight checks; full runtime coming soon
```

For a working setup today, use Linux.
- Open WebUI — full-featured chat interface with conversation history, web search, and document upload
- llama-server — high-performance LLM inference with continuous batching, auto-selected for your GPU
- LiteLLM — API gateway supporting local/cloud/hybrid modes
- Whisper — speech-to-text
- Kokoro — text-to-speech
- OpenClaw — autonomous AI agent framework
- n8n — workflow automation with 400+ integrations (Slack, email, databases, APIs)
- Qdrant — vector database for retrieval-augmented generation (RAG)
- SearXNG — self-hosted web search (no tracking)
- Perplexica — deep research engine
- ComfyUI — node-based image generation
- Privacy Shield — PII scrubbing proxy for API calls
- Dashboard — real-time GPU metrics, service health, model management
The installer detects your GPU and picks the optimal model automatically. No manual configuration.
| VRAM | Model | Example GPUs |
|---|---|---|
| 8–11 GB | Qwen 2.5 7B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
| 12–20 GB | Qwen 2.5 14B (Q4_K_M) | RTX 3090, RTX 4080 |
| 20–40 GB | Qwen 2.5 32B (Q4_K_M) | RTX 4090, A6000 |
| 40+ GB | Qwen 2.5 72B (Q4_K_M) | A100, multi-GPU |
| 90+ GB | Qwen3 Coder Next 80B MoE | Multi-GPU A100/H100 |

AMD APUs with unified memory:

| Unified RAM | Model | Hardware |
|---|---|---|
| 64–89 GB | Qwen3 30B-A3B (30B MoE) | Ryzen AI MAX+ 395 (64GB) |
| 90+ GB | Qwen3 Coder Next (80B MoE) | Ryzen AI MAX+ 395 (96GB) |
Override tier selection: `./install.sh --tier 3`
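As an illustration of what automatic selection means for discrete GPUs, here is a hedged sketch that maps a VRAM figure to the model names in the first table. `pick_model` is a hypothetical name, not the installer's actual function. The sketch follows the VRAM column only, and where ranges touch or overlap (20 GB, and 40+ versus 90+) it assumes the highest matching row wins. It covers neither the unified-memory table nor cards below 8 GB, for which no tier is listed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of VRAM-based model selection for discrete GPUs.
# pick_model is an illustrative name; the real installer logic may differ.
# Usage: pick_model <vram_in_whole_GB>
pick_model() {
  local vram_gb="$1"
  if   [ "$vram_gb" -ge 90 ]; then echo "Qwen3 Coder Next 80B MoE"
  elif [ "$vram_gb" -ge 40 ]; then echo "Qwen 2.5 72B (Q4_K_M)"
  elif [ "$vram_gb" -ge 20 ]; then echo "Qwen 2.5 32B (Q4_K_M)"   # assumption: exactly 20 GB takes the higher row
  elif [ "$vram_gb" -ge 12 ]; then echo "Qwen 2.5 14B (Q4_K_M)"
  elif [ "$vram_gb" -ge 8 ];  then echo "Qwen 2.5 7B (Q4_K_M)"
  else
    echo "no tier listed for ${vram_gb} GB" >&2
    return 1
  fi
}

pick_model 16   # prints: Qwen 2.5 14B (Q4_K_M)
```

Checking the largest threshold first keeps each branch to a single comparison and makes the overlap rule explicit.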
No staring at download bars. Dream Server uses bootstrap mode by default:
- Downloads a tiny 1.5B model in under a minute
- You start chatting immediately
- The full model downloads in the background
- Hot-swap to the full model when it's ready — zero downtime
Skip bootstrap: `./install.sh --no-bootstrap`
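The bootstrap sequence is essentially "serve something small now, swap when the big download finishes". The sketch below shows that pattern with shell job control. Everything in it is a stand-in: `fetch_model`, `activate_model`, `bootstrap`, and the model names are hypothetical, and the "download" only creates a marker file. It is not how Dream Server implements the swap.

```shell
#!/usr/bin/env bash
# Sketch of the bootstrap pattern: use a small model while the full one downloads.
# All function and model names are hypothetical stand-ins, not Dream Server internals.

workdir=$(mktemp -d)
active_file="$workdir/active-model"

# Stand-in for a real download; it only creates a marker file.
fetch_model() {
  touch "$workdir/$1.gguf"
}

# Stand-in for pointing the inference server at a model.
activate_model() {
  echo "$1" > "$active_file"
}

bootstrap() {
  local small="$1" full="$2"
  fetch_model "$small"          # quick: small model first
  activate_model "$small"       # chat is available from here on
  fetch_model "$full" &         # full model downloads in the background
  local download_pid=$!
  # ... users chat against the small model here ...
  if wait "$download_pid"; then
    activate_model "$full"      # swap only after the download succeeded
  fi
}

bootstrap tiny-1.5b full-model
cat "$active_file"              # prints: full-model
```

Because the swap is gated on the exit status of `wait`, a failed download leaves the small model active.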
Dream Server is designed to be modded. Every service is an extension — a folder with a manifest.yaml and a compose.yaml. The dashboard, CLI, health checks, and compose stack all discover extensions automatically.
```
extensions/services/
  my-service/
    manifest.yaml   # Metadata: name, port, health endpoint, GPU backends
    compose.yaml    # Docker Compose fragment (auto-merged into the stack)
```

```shell
dream enable my-service    # Enable it
dream disable my-service   # Disable it
dream list                 # See everything
```

The installer itself is modular — 6 libraries and 13 phases, each in its own file. Want to add a hardware tier, swap a default model, or skip a phase? Edit one file.
Full extension guide | Installer architecture
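To make the folder layout concrete, here is a hedged scaffold script. `scaffold_extension` is a hypothetical helper, and the manifest keys (`name`, `port`, `health`, `gpu_backends`) are guesses derived from the fields described above, not the documented schema, so check the extension guide before relying on them. The Compose fragment uses standard Docker Compose syntax with a placeholder image and port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the two files an extension folder needs.
# Usage: scaffold_extension <extensions_root> <service_name>
scaffold_extension() {
  local root="$1" name="$2"
  local dir="$root/$name"
  mkdir -p "$dir" || return 1

  # Manifest key names are assumptions, not the documented schema.
  cat > "$dir/manifest.yaml" <<EOF
name: $name
port: 8080
health: /health
gpu_backends: [nvidia, amd]
EOF

  # Standard Docker Compose fragment; the image and port mapping are placeholders.
  cat > "$dir/compose.yaml" <<EOF
services:
  $name:
    image: nginx:alpine
    ports:
      - "8080:80"
EOF

  echo "Created $dir"
}

# Example:
#   scaffold_extension extensions/services my-service
#   dream enable my-service
```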
The dream CLI manages your entire stack:
```shell
dream status               # Health checks + GPU status
dream list                 # All services and their state
dream logs llm             # Tail logs (aliases: llm, stt, tts)
dream restart [service]    # Restart one or all services
dream start                # Start the stack
dream stop                 # Stop the stack
dream mode cloud           # Switch to cloud APIs via LiteLLM
dream mode local           # Switch back to local inference
dream mode hybrid          # Local primary, cloud fallback
dream model swap T3        # Switch to a different hardware tier
dream enable n8n           # Enable an extension
dream disable whisper      # Disable one
dream config show          # View .env (secrets masked)
dream preset save gaming   # Snapshot current config
dream preset load gaming   # Restore it
```

| | Dream Server | Ollama + Open WebUI | LocalAI |
|---|---|---|---|
| One-command full-stack install | LLM + agents + workflows + RAG + voice + images | LLM + chat only | LLM only |
| Hardware auto-detect + model selection | NVIDIA + AMD Strix Halo | No | No |
| AMD APU unified memory support | ROCm + llama-server | Partial (Vulkan) | No |
| Autonomous AI agents | OpenClaw | No | No |
| Workflow automation | n8n (400+ integrations) | No | No |
| Voice (STT + TTS) | Whisper + Kokoro | No | No |
| Image generation | ComfyUI | No | No |
| RAG pipeline | Qdrant + embeddings | No | No |
| Extension system | Manifest-based, hot-pluggable | No | No |
| Multi-GPU | Yes (NVIDIA) | Partial | Partial |

| Document | Description |
|---|---|
| Quickstart | Step-by-step install guide with troubleshooting |
| Hardware Guide | What to buy, tier recommendations |
| FAQ | Common questions and configuration |
| Extensions | How to add custom services |
| Installer Architecture | Modular installer deep dive |
| Changelog | Version history and release notes |
| Contributing | How to contribute |
Dream Server exists because of the incredible people, projects, and communities that make open-source AI possible. We are grateful to every contributor, maintainer, and tinkerer whose work powers this stack.
Thanks to kyuz0 for amd-strix-halo-toolboxes, the pre-built ROCm containers for Strix Halo that spared us the pain of building our own. And to lhl for strix-halo-testing, the foundational Strix Halo AI research and rocWMMA performance work that the broader community builds on.
- llama.cpp (ggerganov) — LLM inference engine
- Qwen (Alibaba Cloud) — Default language models
- Open WebUI — Chat interface
- ComfyUI — Image generation engine
- FLUX.1 (Black Forest Labs) — Image generation model
- AMD ROCm — GPU compute platform
- AMD Strix Halo Toolboxes (kyuz0) — Pre-built ROCm containers for AMD inference
- Strix Halo Testing (lhl) — Foundational Strix Halo AI research and rocWMMA optimizations
- n8n — Workflow automation
- Qdrant — Vector database
- SearXNG — Privacy-respecting search
- Perplexica — AI-powered search
- LiteLLM — LLM API gateway
- Kokoro FastAPI (remsky) — Text-to-speech
- Speaches — Speech-to-text
- Strix Halo Home Lab — Community knowledge base
If we missed anyone, open an issue. We want to get this right.
Apache 2.0 — Use it, modify it, ship it. See LICENSE.
Built by Light Heart Labs and The Collective


