README_PLUGINS
Version: 1.3.0
Release: December 2025
```bash
# Automatic setup (local clone, do not commit)
bash scripts/setup-llamacpp.sh

# Or manually (repository root)
git clone https://github.com/ggerganov/llama.cpp.git llama.cpp
```

```bash
# CPU-only build
cmake -B build -DTHEMIS_ENABLE_LLM=ON
cmake --build build
```
```bash
# With CUDA (NVIDIA GPU)
cmake -B build \
  -DTHEMIS_ENABLE_LLM=ON \
  -DTHEMIS_ENABLE_CUDA=ON
cmake --build build
```

```bash
# With Metal (Apple Silicon)
cmake -B build \
  -DTHEMIS_ENABLE_LLM=ON \
  -DTHEMIS_ENABLE_METAL=ON
cmake --build build
```

```bash
mkdir -p models

# Example: Mistral 7B Instruct (Q4 quantized, ~4 GB)
# Download from HuggingFace:
# https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF
```

```bash
cp config/llm_config.example.yaml config/llm_config.yaml

# Edit llm_config.yaml:
# - Set model.path to your downloaded model
# - Configure GPU layers (n_layers)
# - Optional: LoRA adapters
```

```bash
./build/themis_server --config config/llm_config.yaml
```

| Document | Description |
|---|---|
| LLM_PLUGIN_DEVELOPMENT_GUIDE.md | Complete developer guide for plugin development |
| LLAMA_CPP_INTEGRATION.md | llama.cpp integration details |
| AI_ECOSYSTEM_SHARDING_ARCHITECTURE.md | Distributed sharding architecture (roadmap) |
```
┌─────────────────────────────────────────────────────┐
│ ThemisDB LLM Plugin System │
├─────────────────────────────────────────────────────┤
│ │
│ ILLMPlugin Interface │
│ ↓ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ LlamaCpp │ │ vLLM │ │ Custom │ │
│ │ Plugin │ │ Plugin │ │ Plugin │ │
│ └──────────────┘ └──────────────┘ └──────────┘ │
│ ↓ ↓ ↓ │
│ ┌──────────────────────────────────────────────┐ │
│ │ LLMPluginManager │ │
│ │ - Plugin Discovery & Loading │ │
│ │ - Model Management │ │
│ │ - LoRA Coordination │ │
│ └──────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
```
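Every backend implements the same interface, which is what lets the manager treat LlamaCpp, vLLM, and custom plugins interchangeably. As a rough illustration, the interface might look like the sketch below; it reuses the request/response fields and the `loadLoRA`/`generate` calls shown in the examples later in this page, while `loadModel` and the exact signatures are assumptions, not copied from ThemisDB's headers:

```cpp
#include <string>

// Request/response shapes as used in the examples in this document.
struct InferenceRequest {
    std::string prompt;
    int max_tokens = 512;
    float temperature = 0.7f;
    std::string lora_adapter_id;  // empty = base model
};

struct InferenceResponse {
    std::string text;
    int tokens_generated = 0;
    double inference_time_ms = 0.0;
};

// Hypothetical sketch of the plugin interface; loadLoRA() and generate()
// mirror calls shown below, loadModel() is an assumption.
class ILLMPlugin {
public:
    virtual ~ILLMPlugin() = default;

    // Load a GGUF model from disk; return false on failure.
    virtual bool loadModel(const std::string& path) = 0;

    // Attach a LoRA adapter to the loaded base model.
    virtual bool loadLoRA(const std::string& id, const std::string& path,
                          float scale) = 0;

    // Run one inference request synchronously.
    virtual InferenceResponse generate(const InferenceRequest& request) = 0;
};
```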
#include "llm/model_loader.h"
// Lazy loader setup
LazyModelLoader::Config config;
config.max_models = 3; // Keep up to 3 models in memory
config.max_vram_mb = 24576; // 24 GB budget
config.model_ttl = std::chrono::seconds(1800); // 30 min TTL
LazyModelLoader loader(config);
// First request: Loads model lazily (~2-3 seconds)
auto* model = loader.getOrLoadModel(
"mistral-7b",
"/models/mistral-7b-instruct-q4.gguf"
);
// Subsequent requests: Instant (cache hit!)
auto* same_model = loader.getOrLoadModel("mistral-7b", "");
// ~0ms!
// Pin important models to prevent eviction
loader.pinModel("mistral-7b");
```
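The `max_models`/`model_ttl`/`pinModel` knobs suggest an LRU cache with a time-to-live in which pinned models are exempt from eviction. The following is a minimal sketch of that policy; all names and types are illustrative, not the actual `LazyModelLoader` internals:

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Illustrative cache entry; the real internals may differ.
struct CacheEntry {
    Clock::time_point last_used;
    bool pinned = false;
};

class ModelCacheSketch {
public:
    ModelCacheSketch(std::size_t max_models, std::chrono::seconds ttl)
        : max_models_(max_models), ttl_(ttl) {}

    // Record a cache hit.
    void touch(const std::string& id) { entries_[id].last_used = Clock::now(); }

    void pin(const std::string& id) { entries_[id].pinned = true; }

    // Drop expired entries, then evict least-recently-used unpinned
    // entries until the cache fits the budget again.
    void evictIfNeeded() {
        const auto now = Clock::now();
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (!it->second.pinned && now - it->second.last_used > ttl_)
                it = entries_.erase(it);
            else
                ++it;
        }
        while (entries_.size() > max_models_) {
            auto victim = entries_.end();
            for (auto it = entries_.begin(); it != entries_.end(); ++it) {
                if (it->second.pinned) continue;
                if (victim == entries_.end() ||
                    it->second.last_used < victim->second.last_used)
                    victim = it;
            }
            if (victim == entries_.end()) break;  // everything pinned
            entries_.erase(victim);
        }
    }

private:
    std::size_t max_models_;
    std::chrono::seconds ttl_;
    std::unordered_map<std::string, CacheEntry> entries_;
};
```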
```cpp
#include "llm/multi_lora_manager.h"

// Multi-LoRA setup
MultiLoRAManager::Config config;
config.max_lora_slots = 16; // Up to 16 LoRAs
config.max_lora_vram_mb = 2048; // 2 GB for LoRAs
config.enable_multi_lora_batch = true;
MultiLoRAManager lora_mgr(config);
// Load multiple LoRAs for same base model
lora_mgr.loadLoRA("legal-qa", "/loras/legal-qa-v1.bin", "mistral-7b");
lora_mgr.loadLoRA("medical-diag", "/loras/medical-v1.bin", "mistral-7b");
lora_mgr.loadLoRA("code-assist", "/loras/code-v1.bin", "mistral-7b");
// Use different LoRAs per request
InferenceRequest req1;
req1.prompt = "Legal question";
req1.lora_adapter_id = "legal-qa";
InferenceRequest req2;
req2.prompt = "Medical question";
req2.lora_adapter_id = "medical-diag";
// Fast LoRA switching (~5ms)
```
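The fast switch suggests that adapters stay resident in VRAM and each request's `lora_adapter_id` is merely resolved to an already-loaded slot before the forward pass. A minimal sketch of that lookup; the slot table and its fields are assumptions, not ThemisDB's actual bookkeeping:

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical slot table mapping adapter id -> resident GPU slot.
struct LoRASlotTable {
    std::unordered_map<std::string, int> slots;

    // Resolve a request's adapter id to a slot index.
    // An empty id or an unknown adapter falls back to the base model.
    std::optional<int> resolve(const std::string& adapter_id) const {
        if (adapter_id.empty()) return std::nullopt;
        auto it = slots.find(adapter_id);
        if (it == slots.end()) return std::nullopt;
        return it->second;  // adapter already resident: ~5ms switch
    }
};
```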
#include "llm/llamacpp_plugin.h"
// Plugin erstellen und konfigurieren
json config = {
{"model_path", "/models/mistral-7b-instruct-q4.gguf"},
{"n_gpu_layers", 32},
{"n_ctx", 4096}
};
createLlamaCppPlugin("llamacpp", config["model_path"], config);
```

```cpp
auto& manager = LLMPluginManager::instance();
InferenceRequest request;
request.prompt = "Was ist ThemisDB?";
request.max_tokens = 512;
request.temperature = 0.7f;
auto response = manager.generate(request);
std::cout << "Response: " << response.text << std::endl;
std::cout << "Tokens: " << response.tokens_generated << std::endl;
std::cout << "Time: " << response.inference_time_ms << "ms" << std::endl;// Dokumente aus ThemisDB abrufen (Vector Search)
RAGContext context;
context.query = "Legal aspects of data storage";
context.documents = {
    {.content = "Document 1 content...", .source = "doc1.pdf", .relevance_score = 0.95},
    {.content = "Document 2 content...", .source = "doc2.pdf", .relevance_score = 0.87}
};
InferenceRequest request;
request.prompt = context.query;
auto response = manager.generateRAG(context, request);
```
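`generateRAG` presumably folds the retrieved documents into the prompt before delegating to regular generation (context stuffing). A rough sketch of that step; the helper name and the prompt template are assumptions, not the actual implementation:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Mirrors the document fields used in RAGContext above.
struct RetrievedDoc {
    std::string content;
    std::string source;
    double relevance_score;
};

// Hypothetical helper: stuff retrieved documents into a single prompt.
std::string buildRagPrompt(const std::string& query,
                           const std::vector<RetrievedDoc>& docs) {
    std::ostringstream prompt;
    prompt << "Answer the question using only the context below.\n\n";
    for (const auto& doc : docs) {
        prompt << "[Source: " << doc.source << ", score "
               << doc.relevance_score << "]\n"
               << doc.content << "\n\n";
    }
    prompt << "Question: " << query << "\nAnswer:";
    return prompt.str();
}
```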
```cpp
auto* plugin = manager.getPlugin("llamacpp");

// Load a LoRA adapter
plugin->loadLoRA(
    "legal-qa-v1",
    "/loras/legal-qa-v1.bin",
    1.0f  // scale
);
// Run inference with the LoRA applied
InferenceRequest request;
request.prompt = "Legal question...";
request.lora_adapter_id = "legal-qa-v1";
auto response = plugin->generate(request);
```

- ✅ Plugin-based architecture
- ✅ llama.cpp Integration (Reference Implementation)
- ✅ Model Loading (GGUF Format)
- ✅ LoRA Adapter Support
- ✅ GPU Acceleration (CUDA, Metal, Vulkan, HIP)
- ✅ RAG Integration
- ✅ Memory Management & Statistics
- ✅ Multi-Plugin Support
- ✅ Ollama-style Lazy Loading (v1.3.0)
- ✅ vLLM-style Multi-LoRA (v1.3.0)
- 🚧 HTTP API Endpoints
- 🚧 Streaming Generation
- 🚧 Batch Inference
- 🚧 Distributed Sharding (etcd + gRPC)
- 🚧 Cross-Shard LoRA Transfer
- 🚧 Federated RAG Queries
- 🚧 vLLM Plugin Implementation
- 🚧 Model Replication (Raft Consensus)
| Option | Description | Default |
|---|---|---|
| `THEMIS_ENABLE_LLM` | Enable LLM plugin support | OFF |
| `THEMIS_ENABLE_CUDA` | CUDA GPU support | OFF |
| `THEMIS_ENABLE_METAL` | Metal GPU support (macOS) | OFF |
| `THEMIS_ENABLE_VULKAN` | Vulkan GPU support | OFF |
| `THEMIS_ENABLE_HIP` | AMD HIP/ROCm support | OFF |
| Operation | Latency | Throughput |
|---|---|---|
| Model Loading | ~2-3 seconds | - |
| Text Generation (512 tokens) | ~300ms | ~1700 tokens/s |
| RAG Query (10 docs) | ~320ms | - |
| LoRA Loading | ~50ms | - |
| LoRA Switch | ~5ms | - |
| Embedding (512 tokens) | ~5ms | - |
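As a sanity check, the generation throughput follows directly from the latency row (a back-of-the-envelope calculation, not a separate benchmark):

$$\frac{512~\text{tokens}}{\sim 0.3~\text{s}} \approx 1700~\text{tokens/s}$$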
| Model | Quantization | VRAM |
|---|---|---|
| Phi-3-Mini | Q4_K_M | ~2 GB |
| Mistral-7B | Q4_K_M | ~4 GB |
| Llama-2-13B | Q4_K_M | ~8 GB |
| Llama-3-70B | Q4_K_M | ~40 GB |
```bash
# Check the model format (must be GGUF)
file models/mistral-7b.gguf

# Check file permissions
ls -la models/

# Test with a smaller model
# Phi-3-Mini: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
```

```yaml
# Increase GPU layers in the config
n_layers: 32  # → 35 or more
```

```bash
# Check GPU utilization
nvidia-smi                               # (CUDA)
# or
sudo powermetrics --samplers gpu_power   # (Metal/macOS)
```

```yaml
# Reduce the context size if possible
n_ctx: 4096  # → 2048
```
```bash
# llama.cpp local clone missing?
ls -la ./llama.cpp

# If missing: create a local clone (do not commit it)
git clone https://github.com/ggerganov/llama.cpp.git llama.cpp

# CUDA not found?
export CUDA_PATH=/usr/local/cuda
cmake -B build -DTHEMIS_ENABLE_LLM=ON -DTHEMIS_ENABLE_CUDA=ON
```

```powershell
# Recommended: script for an MSVC release build with LLM support
powershell -File scripts/build-themis-server-llm.ps1

# Sanity check
./build-msvc/bin/themis_server.exe --help
```

Notes:
- The script uses Visual Studio 2022 (`-G "Visual Studio 17 2022"`) and the x64 architecture (`-A x64`).
- The vcpkg toolchain is included; `llama.cpp/` is a local clone and is excluded via `.gitignore`/`.dockerignore`.
- llama.cpp: https://github.com/ggerganov/llama.cpp
- GGUF Models: https://huggingface.co/models?library=gguf
- LoRA Fine-tuning: https://github.com/tloen/alpaca-lora
- ThemisDB: https://github.com/makr-code/ThemisDB
ThemisDB: MIT License
llama.cpp: MIT License
Version: 1.3.0
Last Updated: December 2025
Status: Production Ready (Reference Implementation)
Full documentation: https://makr-code.github.io/ThemisDB/