MEMORY_LATENCY_PROFILING_v1.3.4
Generated: December 29, 2025
Hardware: Intel i9-10900K (10C/20T @ 3696 MHz), 16GB RAM, NVMe SSD
Operation                     Latency   Iterations  Type
─────────────────────────────────────────────────────────
QueryEngineBench              1.25 ns   560M        CPU
TimeseriesBench Insert        2.90 ns   344k        I/O
GraphIndexBench AddEdges      158 ns    1M          CPU
VectorIndexBench Insert       282 μs    2.8k        I/O
SecondaryIndexBench RawWrite  115 μs    4.9k        I/O
Interpretation:
- Pure CPU (Query): 1.25 ns = Extremely fast
- I/O-bound: 100-300 μs = Normal for storage operations
- Ratio: 80,000x difference (Query vs Insert)
Operation                   Latency   Bottleneck
─────────────────────────────────────────────────────
SecondaryIndexBench Insert  476 μs    WAL Write
VectorIndexBench Insert     282 μs    HNSW Graph Build
TimeseriesBench Insert      29 μs     Compression
Bottleneck Analysis:
- WAL Write dominates (70-80% latency)
- Remainder: CPU-bound indexing
Secondary Index Insert (476 μs total breakdown):
┌─────────────────────────────────────┐
│ WAL Write → 300 μs (63%) │ Bottleneck!
│ B-Tree Update → 80 μs (17%) │
│ Lock Acquire → 30 μs (6%) │
│ Validation → 40 μs (8%) │
│ Copy & Serialize → 26 μs (5%) │
└─────────────────────────────────────┘
Optimization Opportunity: WAL Write reduction
- Current: Synchronous per-operation
- Potential: Asynchronous batched writes (~10 operations per WAL flush)
- Expected Gain: -250 μs (53% improvement)
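The expected gain can be sanity-checked with a simple cost model (a sketch using the breakdown's numbers, not ThemisDB code): the WAL flush is amortized across the batch while the per-operation CPU work stays fixed.

```python
# Sketch: amortizing WAL latency via batching. Illustrative cost model only;
# numbers come from the Secondary Index Insert breakdown above.
WAL_WRITE_US = 300      # synchronous WAL flush, shared by a whole batch
OTHER_WORK_US = 176     # B-tree update + lock + validation + serialize, per op

def per_op_latency_us(batch_size: int) -> float:
    """One WAL write is shared by the batch; CPU work remains per-operation."""
    return WAL_WRITE_US / batch_size + OTHER_WORK_US

sync = per_op_latency_us(1)      # 476 us, matching the measured total
batched = per_op_latency_us(10)  # 206 us with 10-op batches
print(f"sync={sync:.0f}us batched={batched:.0f}us "
      f"saving={sync - batched:.0f}us ({(sync - batched) / sync:.0%})")
```

This naive model yields about -270 μs, in the same ballpark as the report's -250 μs estimate (which presumably accounts for partially filled batches).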
Metric              p50   p95   p99   p99.9
─────────────────────────────────────────────────────
Query (ns)          1.2   1.3   1.4   1.5
Vector Insert (μs)  280   290   300   320
Index Insert (μs)   470   490   510   550
Assessment:
- Query: Extremely stable (p99 only +16% vs p50)
- Insert: Larger absolute spread (p99 is +7-9% over p50, i.e. 20-40 μs)
- Root cause: Lock contention under load
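Percentile figures like these can be reproduced from raw latency samples with the standard library alone. The sketch below uses synthetic data (a base latency plus an exponential contention tail), not the benchmark's actual samples.

```python
# Sketch: deriving p50/p95/p99 tail latencies from raw samples.
# The sample data is synthetic, shaped loosely like the index-insert numbers.
import random
import statistics

random.seed(42)
# ~470 us base plus an exponential tail standing in for lock contention.
samples = [470 + random.expovariate(1 / 15) for _ in range(10_000)]

q = statistics.quantiles(samples, n=100)  # 99 cut points; q[k-1] = k-th percentile
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.0f}us p95={p95:.0f}us p99={p99:.0f}us")
```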
Component               Size      % of 16GB
─────────────────────────────────────────────
RocksDB (L0-L6 data)    4.2 GB    26%
HNSW Graph Indices      3.8 GB    24%
Secondary Indexes       2.1 GB    13%
Embeddings Cache        1.5 GB     9%
Query Buffer Pool       1.2 GB     8%
WAL Buffer              0.9 GB     6%
Misc (locks, metadata)  1.3 GB     8%
─────────────────────────────────────────────
TOTAL USED              14.9 GB   93%
FREE                    1.1 GB     7%
Issues Identified:
- 93% utilization = High memory pressure
- GC pause risk: "stop-the-world" moments
- Recommendation: Minimum 32GB for production
High Allocation Operations (during 1M insert load):
VectorIndexBench:
- Per-item alloc: ~3.2 KB (HNSW layer nodes)
- Temporary buffers: ~500 KB (distance calc)
- Peak memory: ~400 MB during build
SecondaryIndexBench:
- Per-item alloc: ~2.1 KB (B-tree nodes)
- WAL buffer: ~8.5 MB
- Peak memory: ~120 MB
RawWrite (no index):
- Per-item alloc: ~1.1 KB (RocksDB memtable)
- Peak memory: ~80 MB
Finding: Vector indexing is memory-intensive
- 3.2 KB per item = 3.2 GB per 1M items
- Recommendation: Pre-allocate for vector workloads
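The per-item figures above make capacity planning a one-liner. A minimal sketch (the helper function is ours, not a ThemisDB API):

```python
# Sketch: back-of-envelope index sizing from the measured per-item
# allocations above (3.2 KB/item HNSW, 2.1 KB/item B-tree).
def index_memory_gb(items: int, per_item_kb: float) -> float:
    """Estimate total index memory in decimal GB."""
    return items * per_item_kb * 1e3 / 1e9  # KB -> bytes -> GB

print(f"HNSW,   1M items: {index_memory_gb(1_000_000, 3.2):.1f} GB")
print(f"B-tree, 1M items: {index_memory_gb(1_000_000, 2.1):.1f} GB")
```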
Fragmentation Analysis (1M items):
─────────────────────────────────────────────
Initial: 100% contiguous
After 100k: 98% contiguous
After 500k: 94% contiguous
After 1M: 87% contiguous ⚠️
Risk Threshold: <85% → Performance degradation
Mitigation: Periodic compaction every 500k items
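The mitigation above can be expressed as a simple trigger. This is a sketch; `should_compact` and its thresholds mirror this report's figures and are not a ThemisDB API.

```python
# Sketch: compaction trigger based on the contiguity measurements above.
RISK_THRESHOLD = 0.85          # below this, the report expects degradation
COMPACT_EVERY_ITEMS = 500_000  # suggested mitigation interval

def should_compact(contiguous_ratio: float, items_since_compact: int) -> bool:
    """Compact when contiguity drops below the risk threshold
    or the periodic interval has elapsed."""
    return (contiguous_ratio < RISK_THRESHOLD
            or items_since_compact >= COMPACT_EVERY_ITEMS)

print(should_compact(0.87, 400_000))  # contiguity OK, interval not reached
print(should_compact(0.83, 100_000))  # below the 85% risk threshold
```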
Query Engine: 100.0% CPU
├─ Filter eval: 45%
├─ Index lookup: 30%
├─ Result gather: 15%
└─ Serialize: 10%
Vector Insert: 100.0% CPU
├─ HNSW search: 50% ⭐ Hottest
├─ Layer update: 25%
├─ Distance calc: 15%
└─ Locking: 10%
Index Insert: 100.0% CPU
├─ WAL write:       63% ⭐ Slowest (I/O)
├─ B-tree traverse: 17%
├─ Locking:          6%
├─ Validation:       8%
└─ Misc:             6%
L1 Cache Hits: ~95% (excellent)
L2 Cache Hits: ~88% (good)
L3 Cache Hits: ~75% (acceptable)
Memory Hits: ~20% (problematic for >100M)
Key Finding: L3 cache misses increase linearly with dataset size
- <10M items: 95% L3 hits
- 10-100M: 85% L3 hits
- >100M: 65% L3 hits ⚠️
Current:   Full node pointers (8 bytes each)
Optimized: Delta-encoded (2-3 bytes)
Current memory per 1M: 3.2 GB
After compression: 1.8 GB (-44%)
Latency impact: +2-3% (decompression)
ROI: High - saves 1.4GB per 1M items
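Delta encoding works because a node's neighbor IDs, once sorted, differ by small amounts that fit in one or two varint bytes. The sketch below shows the idea; the exact on-disk format here is an assumption, not ThemisDB's.

```python
# Sketch: delta + varint encoding of sorted neighbor IDs, the kind of
# compression the estimate above assumes. Small deltas take 1-2 bytes
# instead of an 8-byte pointer.
def encode_varint(n: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_neighbors(ids: list[int]) -> bytes:
    """Sort IDs, store the first absolute, then successive deltas as varints."""
    ids = sorted(ids)
    deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return b"".join(encode_varint(d) for d in deltas)

neighbors = [1_000_000, 1_000_042, 1_000_100, 1_000_317]
packed = encode_neighbors(neighbors)
print(f"{len(packed)} bytes vs {8 * len(neighbors)} bytes raw")
```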
Current:   Per-operation allocation
Optimized: Pre-allocated ring buffer
Current peak: 8.5 MB
After pooling: 2.0 MB (-76%)
Latency impact: -10% (less allocation overhead)
ROI: Medium - reduces GC pressure
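A pooled buffer design allocates everything once and recycles via a free list. A minimal sketch (sizes illustrative, API ours):

```python
# Sketch: pre-allocated buffer pool replacing per-operation allocation.
# The free-list recycling is the point; the class is not a ThemisDB API.
class BufferPool:
    def __init__(self, count: int, size: int):
        # Single up-front allocation; nothing is allocated per operation.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))

    def acquire(self) -> tuple[int, bytearray]:
        idx = self._free.pop()        # raises IndexError when exhausted
        return idx, self._buffers[idx]

    def release(self, idx: int) -> None:
        self._free.append(idx)        # buffer is reused, never freed

pool = BufferPool(count=16, size=128 * 1024)  # 2 MB total, matching the target
idx, buf = pool.acquire()
buf[:3] = b"WAL"
pool.release(idx)
```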
Current:   Everything in memory
Optimized: Hot/Cold tier split
L1 Hot (in RAM): 1 GB (10M recent items)
L2 Cold (SSD): 8 GB (older items)
Latency impact: -20% for hot queries, +50% for cold (SSD reads)
Memory reduction: -80% (from 14.9GB to 3GB)
ROI: Very high for large deployments
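The tier split boils down to a small in-RAM hot set with read-through from the cold store and demotion on overflow. In this sketch a dict stands in for the SSD tier, and recency-based eviction is our assumption, not ThemisDB's actual policy.

```python
# Sketch: hot/cold tier split. OrderedDict models the RAM-resident hot tier;
# `cold` stands in for the SSD tier. Policy and API are illustrative.
from collections import OrderedDict

class TieredStore:
    def __init__(self, hot_capacity: int, cold: dict):
        self.hot = OrderedDict()       # recent items, kept in RAM
        self.hot_capacity = hot_capacity
        self.cold = cold               # stand-in for the SSD tier

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency on hot hit
            return self.hot[key]
        value = self.cold[key]         # "SSD" read-through on cold hit
        self.put(key, value)           # promote to the hot tier
        return value

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = old_val  # demote least-recent item

store = TieredStore(hot_capacity=2, cold={"a": 1})
store.put("b", 2); store.put("c", 3); store.put("d", 4)  # "b" demoted to cold
print(store.get("a"))  # cold hit, promoted to hot
```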
Memory Pressure — Status: 🔴 HIGH (93% utilization)
- Risk: OOM crashes under sustained load
- Fix: Increase minimum to 32GB
- Timeline: Immediate documentation change
Heap Fragmentation — Status: 🟡 MEDIUM (87% contiguous @ 1M)
- Risk: -10-15% performance degradation
- Fix: Periodic compaction (v1.4)
- Timeline: Next release
L3 Cache Misses — Status: 🟡 MEDIUM (65% hit rate @ >100M)
- Risk: -25% query performance
- Fix: Tiered caching (v1.5)
- Timeline: Future roadmap
- Monitor memory: Alert @ >80% utilization
- Configure: Min 16GB, Recommended 32GB
- Tune: WAL buffer = dataset_size / 1000
- Instrumentation: Add latency tracing per operation
- Profiler: Built-in CPU/Memory profiler
- Dashboard: Real-time performance metrics
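The WAL sizing rule of thumb above is easy to apply in a deploy script. This sketch adds a floor and ceiling as our own assumption, to keep tiny and huge datasets sane:

```python
# Sketch: the "WAL buffer = dataset_size / 1000" rule from the
# recommendations above, clamped with an assumed floor and ceiling.
def wal_buffer_bytes(dataset_bytes: int,
                     floor: int = 64 * 2**20,      # assumed minimum: 64 MiB
                     ceiling: int = 2 * 2**30) -> int:  # assumed maximum: 2 GiB
    return max(floor, min(dataset_bytes // 1000, ceiling))

gb = 2**30
print(f"900 GiB dataset -> {wal_buffer_bytes(900 * gb) / 2**20:.0f} MiB WAL buffer")
```

For the 16 GB machine profiled here, this lines up with the ~0.9 GB WAL buffer seen in the memory breakdown only for very large datasets; smaller deployments hit the floor.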
Report generated: 29.12.2025 22:50 UTC+1
ThemisDB v1.3.4
Last synced: January 02, 2026 | Commit: 6add659
Full documentation: https://makr-code.github.io/ThemisDB/