The ultimate open-source, AI-first MLOps platform for 2026, combining enterprise-grade ML lifecycle management with cutting-edge LLM, RAG, and agent capabilities. Built to surpass Vertex AI, SageMaker, Azure ML, Databricks, MLflow, W&B, and more.
```
┌───────────────────────────────────────────────────────────────────────┐
│                         React 19 Frontend UI                          │
│       Experiments │ Models │ Deployments │ Monitoring │ Agents        │
└───────────────────────────────────┬───────────────────────────────────┘
                                    │ REST API / WebSocket
┌───────────────────────────────────┴───────────────────────────────────┐
│                        FastAPI Backend (Async)                        │
│       Auth │ Projects │ Datasets │ Features │ Training │ Deploy       │
└───────────────────────────────────┬───────────────────────────────────┘
                                    ▼
┌───────────────────────────────────────────────────────────────────────┐
│ PostgreSQL │ Redis │ S3/MinIO │ Triton │ vLLM (GPU) │ Celery Worker   │
│ JupyterLab │ OTel Export │ Vector DB │ Feature Store                  │
└───────────────────────────────────────────────────────────────────────┘
```
| Feature | Zenith | Vertex AI | SageMaker | Azure ML | Databricks | MLflow | W&B |
|---|---|---|---|---|---|---|---|
| Open Source | ✅ | ❌ | ❌ | ❌ | Partial | ✅ | ❌ |
| LLM-Native | ✅ | ❌ | ❌ | ❌ | ❌ | Partial | ❌ |
| Agent Orchestration | ✅ | Partial | ❌ | Partial | ❌ | ❌ | ❌ |
| Prompt Playground | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| RAG Pipeline Builder | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| OpenTelemetry Native | ✅ | Partial | Partial | Partial | ❌ | ❌ | ❌ |
| Feature Store (Online) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Real-time Drift Detection | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | Partial |
| LLM-as-Judge Eval | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LoRA/QLoRA Fine-tuning | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Collaborative UI | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| Self-Hosted | ✅ | ❌ | ❌ | ❌ | Partial | ✅ | ✅ |
| Cost | Free | $$$ | $$$ | $$$ | $$$ | Free | $$ |
- Data Versioning: Immutable dataset snapshots with lineage tracking
- Feature Store: Online (Redis) + Offline (Parquet/Delta) with point-in-time joins
- Experiment Tracking: Parameters, metrics, artifacts, prompts, traces with real-time visualization
- Model Registry: Staging/production promotion with approval workflows and A/B testing
- Distributed Training: PyTorch FSDP/DDP, Hugging Face Accelerate, multi-GPU support
- Hyperparameter Optimization: Optuna-based Bayesian optimization plus prompt search (example after this list)
- Deployment: Batch/real-time/streaming with autoscaling and canary releases
- Monitoring: Drift detection (Evidently), performance metrics, cost tracking
- Explainability: SHAP values, attention visualization, feature importance
- Governance: Bias detection, PII scanning, audit logs, RBAC
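For instance, the sweep loop is plain Optuna. A minimal sketch with a synthetic objective standing in for a real train/eval step; in practice each trial's parameters and score would also be logged through the SDK's `log_params`/`log_metrics` calls shown in the quickstart further down:

```python
import math

import optuna

def objective(trial: optuna.Trial) -> float:
    # Optuna's TPE sampler proposes each candidate from the search space.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    # Stand-in for a real train/eval loop: the score peaks near lr = 1e-3.
    return -abs(math.log10(lr) + 3) - 0.001 * batch_size

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```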
- Prompt Playground: Interactive testing with multiple models, temperature control, few-shot examples
- RAG Pipeline Builder: Visual editor for embedding, retrieval, reranking, generation
- Agent Orchestration: LangGraph/CrewAI-style workflows with tool integration
- Chain Tracing: OpenTelemetry-based distributed traces for complex LLM chains
- LLM-as-Judge: Automated evaluation using GPT-4 or Claude for quality scoring (sketched below)
- Fine-tuning: LoRA, QLoRA with monitoring and automatic checkpoint management
- Vector Search: Integrated embedding storage and semantic search
- Hallucination Detection: Confidence scoring and fact verification
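To make the LLM-as-Judge idea concrete (this is an illustration, not Zenith's internal implementation), a judge model scores an answer against a rubric and returns a number. The sketch assumes the OpenAI client, `gpt-4o` as the judge model, and an `OPENAI_API_KEY` in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Rate the ANSWER for factual accuracy and relevance to the QUESTION "
    "on a 1-5 scale. Reply with the number only."
)

def judge(question: str, answer: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic scoring
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

print(judge("What is the capital of France?", "Paris."))
```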
- Modern UI: React 19 + Tailwind + shadcn/ui with dark mode
- Real-time Collaboration: Live experiment updates, shared notebooks
- Jupyter Integration: Embedded JupyterLab with SDK pre-installed
- REST + Python SDK: Comprehensive APIs for all operations
- OpenTelemetry Export: Send traces to Datadog, Grafana, Jaeger
- Plugin System: Custom evaluators, metrics, retrievers, and agents (illustrated after this list)
- One-command Deploy: Docker Compose or Kubernetes Helm
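A hypothetical example of the plugin surface; the import path, decorator name, and signature below are illustrative assumptions, not a documented API:

```python
# Hypothetical plugin registration; decorator and import path are illustrative.
from zenith.plugins import register_evaluator  # assumed import path

@register_evaluator(name="exact-match")
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized prediction equals the reference."""
    return float(prediction.strip().lower() == reference.strip().lower())
```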
- Docker 24+ & Docker Compose 2.20+
- 16GB RAM minimum (32GB recommended)
- NVIDIA GPU (optional, for LLM inference)
```bash
# Clone the repository and enter it
git clone https://github.com/yourusername/zenith-ml.git
cd zenith-ml

# Create your environment file from the template
cp .env.example .env

# Start the full stack in the background
docker compose up -d

# Follow the backend logs until startup completes
docker compose logs -f backend
```
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- JupyterLab: http://localhost:8888 (token: zenith)
- Triton Inference: http://localhost:8001
```python
from zenith import ZenithClient

# Connect to the local Zenith deployment.
client = ZenithClient("http://localhost:8000")

project = client.create_project(
    name="my-first-project",
    description="Testing Zenith capabilities",
)

experiment = client.start_experiment(
    project_id=project.id,
    name="baseline-model",
)

client.log_params({"learning_rate": 0.001, "batch_size": 32})
client.log_metrics({"accuracy": 0.95, "loss": 0.12})

# `model` is any trained object supported by the registry (here PyTorch).
client.log_model(model, name="my-model", framework="pytorch")
```
- MLflow-compatible API with superior UI (see the sketch below)
- Real-time metric streaming with WebSocket
- Side-by-side run comparison with diff views
- Nested runs for hyperparameter sweeps
- Artifact versioning with S3/MinIO backend
- Git integration for code versioning
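Because the tracking API is MLflow-compatible, existing MLflow scripts should only need their tracking URI repointed. The calls below are MLflow's real client API; that Zenith accepts them at the quickstart address is the compatibility claim above:

```python
import mlflow

# Point the stock MLflow client at Zenith instead of an MLflow server.
mlflow.set_tracking_uri("http://localhost:8000")
mlflow.set_experiment("baseline-model")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_metric("accuracy", 0.95, step=1)
```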
- Online serving with Redis (<10ms latency)
- Offline storage with Parquet/Delta Lake
- Point-in-time correct joins for time-series (example below)
- Feature transformation pipelines
- Schema evolution and validation
- Feature lineage and impact analysis
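Point-in-time correctness means every training row sees only feature values from at or before its own timestamp, never later ones. The offline behavior is equivalent to a backward as-of join, sketched here with plain pandas rather than the Zenith API:

```python
import pandas as pd

labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_time": pd.to_datetime(["2026-01-10", "2026-01-20"]),
    "label": [0, 1],
})
features = pd.DataFrame({
    "user_id": [1, 1],
    "feature_time": pd.to_datetime(["2026-01-05", "2026-01-15"]),
    "avg_spend_30d": [42.0, 55.0],
})

# Each label row picks the latest feature value at or before event_time,
# which prevents leakage from the future.
training_set = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
)
print(training_set[["user_id", "event_time", "avg_spend_30d", "label"]])
```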
- Triton Inference Server integration
- vLLM for high-throughput LLM serving (sketched after this list)
- FastAPI endpoints with automatic OpenAPI
- A/B testing and canary deployments
- Autoscaling based on latency/throughput
- Multi-model serving with routing
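On the vLLM path, throughput comes from continuous batching on the GPU. The snippet below uses vLLM's own offline Python API (the model name is just an example); Zenith's serving layer is assumed to wrap the same engine:

```python
from vllm import LLM, SamplingParams

# vLLM batches these prompts continuously on the GPU.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # example model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Summarize MLOps in one sentence.", "What is a feature store?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```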
- Data drift detection (Evidently AI; see the example below)
- Model performance degradation alerts
- LLM-specific metrics (hallucination rate, toxicity)
- OpenTelemetry traces for debugging
- Cost tracking per model/endpoint
- Real-time dashboards with Recharts
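The drift check itself follows Evidently's standard report pattern. A minimal sketch, assuming Evidently's 0.4-era `Report` API and two tiny stand-in DataFrames:

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# reference = training-time data, current = a sample of live traffic
reference = pd.DataFrame({"age": [25, 32, 40, 51], "income": [30e3, 45e3, 60e3, 80e3]})
current = pd.DataFrame({"age": [22, 29, 64, 70], "income": [28e3, 41e3, 95e3, 120e3]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # or report.as_dict() for alerting
```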
- Visual workflow builder for agent orchestration
- Pre-built RAG templates (Q&A, summarization, etc.)
- Multi-hop reasoning with chain-of-thought
- Tool calling with automatic schema generation (see the schema sketch after this list)
- Human-in-the-loop approvals
- Workflow versioning and rollback
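Automatic schema generation for tool calling can be derived from type hints alone; a self-contained sketch of the idea, independent of any particular agent framework:

```python
import inspect
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Derive a JSON-Schema-style tool description from a function's type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": {n: {"type": PY_TO_JSON[t]} for n, t in hints.items()},
            "required": [n for n in hints if params[n].default is inspect.Parameter.empty],
        },
    }

def get_weather(city: str, celsius: bool = True) -> str:
    """Return the current weather for a city."""
    ...

print(tool_schema(get_weather))
```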
```
zenith-ml/
├── backend/      # FastAPI application
├── frontend/     # React 19 UI
├── jupyter/      # JupyterLab configuration
├── inference/    # Triton models
├── kubernetes/   # Helm charts
├── scripts/      # Utility scripts
├── examples/     # End-to-end tutorials
├── tests/        # Test suite
└── docs/         # Documentation
```
- Backend: FastAPI, SQLAlchemy 2, asyncpg, Redis, Celery
- Frontend: React 19, Vite, TypeScript, Tailwind CSS, shadcn/ui, Zustand, TanStack Query
- ML: PyTorch, Transformers, Accelerate, PEFT, Optuna, Evidently
- Inference: Triton, vLLM, llama.cpp
- Observability: OpenTelemetry, Prometheus, Grafana
- Storage: PostgreSQL, Redis, S3/MinIO
- Orchestration: Kubernetes, Celery, RQ
- Tabular ML: XGBoost with feature store and drift monitoring
- Computer Vision: ResNet fine-tuning with distributed training
- LLM Fine-tuning: LoRA on Llama 3 for domain adaptation
- RAG Agent: Question-answering with retrieval and reranking
- Multi-modal: CLIP for image-text matching with monitoring
We welcome contributions! See CONTRIBUTING.md for guidelines.
Apache License 2.0 - see LICENSE file
⭐ Star us on GitHub to support the project!
- Documentation: https://zenith-ml.readthedocs.io
- Discord: https://discord.gg/zenith-ml
- Issues: https://github.com/yourusername/zenith-ml/issues
Built with ❤️ for the ML/AI community