This repository provides a local, free, self-hosted setup for LLM evaluation using Langfuse and Ollama.
The goal is to:
- Experiment with prompts
- Compare multiple LLMs
- Run evaluations (scores, LLM-as-judge, datasets)
- Keep everything local and offline
This setup uses the official Langfuse OSS (v3.148.0) with a local Ollama instance via its OpenAI-compatible API.
No Langfuse code changes, forks, or plugins are required.
This follows the recommended integration approach documented in: https://github.com/langfuse/langfuse
Prerequisites:
- Docker Desktop (with WSL2 enabled on Windows)
- 16 GB RAM recommended
- Optional: NVIDIA GPU (for faster inference)
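Optional checks before starting (nvidia-smi applies only if you plan to enable GPU support):
docker --version
docker compose version
nvidia-smi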
From the project root directory, run:
docker compose up -d
This starts:
- Langfuse (UI + workers)
- PostgreSQL, ClickHouse, Redis, MinIO
- Ollama (CPU-only by default)
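To verify that everything came up, list the services defined in the compose file; each one should report a running (or healthy) state. Exact service names depend on the docker-compose.yml in this repository:
docker compose ps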
Open your browser and go to the Langfuse UI (http://localhost:3000 with the default port mapping).
First-time setup:
- Create a user
- Create an organization
- Create a project
All Langfuse data is stored locally in Docker volumes.
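If you are unsure whether the UI is reachable, Langfuse exposes a health endpoint you can query from the host. The example below assumes the default port mapping of 3000:
curl http://localhost:3000/api/public/health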
Ollama runs inside Docker and exposes an OpenAI-compatible API.
You can interact with Ollama in TWO ways:
- Simple host command (recommended for most users)
- Explicit Docker command (useful if you also have native Ollama)
Both are shown below.
Simple command: ollama list
Docker-specific command: docker exec -it ollama ollama list
NOTE:
- If you have native Ollama installed AND Docker Ollama running, the docker command guarantees you are talking to the container.
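To confirm the containerized API is reachable from the host (assuming the compose file publishes port 11434), query the native tags endpoint, which lists the locally available models:
curl http://localhost:11434/api/tags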
Recommended models for evaluation:
Gemma 3 (4B) – primary eval / judge: docker exec -it ollama ollama pull gemma3:4b
Mistral 7B – comparison model: docker exec -it ollama ollama pull mistral
LLaMA 3.2 (3B) – reasoning comparison: docker exec -it ollama ollama pull llama3.2:3b
After pulling, verify with either command:
ollama list
docker exec -it ollama ollama list
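As a smoke test, you can run a one-off prompt against a pulled model inside the container (gemma3:4b is used here as an example; the first run is slower because the model is loaded into memory):
docker exec -it ollama ollama run gemma3:4b "Reply with OK"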
In Langfuse UI:
Settings → LLM Connections → Add LLM Connection
Fill in exactly:
LLM adapter: openai
Provider name: OpenAI
API Key: ollama
API Base URL: http://ollama:11434/v1
Custom models: Add each model name exactly as it appears in Ollama, for example gemma3:4b
Save the connection.
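Before relying on the connection in the UI, it can help to confirm that the OpenAI-compatible endpoint answers a chat completion for the exact model name you registered. This example runs from the host, assuming port 11434 is published and gemma3:4b has been pulled:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Say hello"}]}'
Langfuse itself reaches the same API at http://ollama:11434/v1 over the Docker network.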
You can now select Ollama models in:
- Prompt Playground
- Evaluations
- Datasets
- LLM-as-Judge
NOTE:
- Ollama must be running and reachable from Langfuse
- The model must already be pulled in Ollama
- Model names must match exactly (case-sensitive)
- This works for both CPU-only and GPU-enabled Ollama
- No external scripts or SDKs are required for UI usage
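Because model names are case-sensitive and must match exactly, a useful cross-check is to list the models Ollama exposes on its OpenAI-compatible endpoint and compare them with the Custom models you entered (again assuming port 11434 is published to the host):
curl http://localhost:11434/v1/models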
By default, Prompt Playground runs are NOT traced.
To enable tracing:
- Go to Settings → Tracing
- Enable "Trace Prompt Playground runs"
- Save
This allows:
- Viewing traces
- Running evaluations
- Adding runs to datasets
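Once tracing is enabled, playground runs can also be inspected outside the UI via the Langfuse public API. This is optional and not required for UI usage; the example assumes the default port 3000 and uses the project's public/secret key pair (created under the project's API keys settings) as basic-auth credentials:
curl -u <public-key>:<secret-key> http://localhost:3000/api/public/traces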
GPU support is DISABLED by default.
Requirements:
- NVIDIA GPU
- Latest NVIDIA drivers
- Docker Desktop (WSL2)
- NVIDIA Container Toolkit installed
Steps:
- Open docker-compose.yml
- Find the ollama service
- Uncomment the GPU block
- Restart Docker services:
docker compose down
docker compose up -d
Run:
docker exec -it ollama ollama ps
Note that ollama ps only lists models that are currently loaded in memory, so run a model first if the output is empty. If GPU is enabled, the PROCESSOR column shows GPU usage for loaded models; on a CPU-only setup the command still works and reports CPU instead.
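For example, load a model and then check the process list (gemma3:4b is used here as an example):
docker exec -it ollama ollama run gemma3:4b "hi"
docker exec -it ollama ollama ps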
Notes:
- CPU-only mode is sufficient for learning and evaluations
- GPU is recommended for larger models (7B+)
- Models are NOT auto-downloaded
- Each user chooses which models to pull
- No prompts or data leave your machine
- Infrastructure is versioned in Git
- Runtime data and models are local
- No secrets are committed
- Users have freedom to experiment
- This is a learning & evaluation environment, not production
Summary:
- Langfuse OSS: v3.148.0
- Ollama: OpenAI-compatible API
- Deployment: Local Docker
- Licensing: Fully open source