-
Notifications
You must be signed in to change notification settings - Fork 0
geo_architecture
Stand: 5. Dezember 2025
Version: 1.0.0
Kategorie: Geo
Erstellt: 17. November 2025
Status: Phase 0.1 implementiert (EWKB Storage + Sidecar)
Design: Geo als Cross-Cutting Infrastructure für alle 5 Datenbank-Modelle
Geo ist KEIN separates Datenbank-Modell, sondern eine optionale Cross-Cutting Capability für alle 5 Modelle:
┌─────────────────────────────────────────────────────────────┐
│ ThemisDB Multi-Model │
├─────────────────────────────────────────────────────────────┤
│ Relational │ Graph │ Vector │ Content │ Time-Series │
├─────────────────────────────────────────────────────────────┤
│ │
│ GEO CROSS-CUTTING CAPABILITY (Optional) │
│ - EWKB Storage (geometry_ field in BaseEntity) │
│ - R-Tree Spatial Index (table-agnostic) │
│ - ST_* Functions (universal AQL extensions) │
│ - Query Optimizer (spatial selectivity) │
│ │
├─────────────────────────────────────────────────────────────┤
│ RocksDB Blob Storage Layer │
└─────────────────────────────────────────────────────────────┘
Vorteile:
- ✅ Jedes Modell kann geo-enabled sein (optional
geometry_field) - ✅ Gemeinsamer R-Tree Index für alle Tabellen
- ✅ Universelle ST_* Functions in AQL
- ✅ Symbiotische Beziehungen zwischen Modellen
Standard: PostGIS-kompatibles EWKB/EWKBZ Format
// Geometry Types
enum class GeometryType : uint32_t {
Point = 1,
LineString = 2,
Polygon = 3,
// 3D variants (Z flag: type | 0x80000000)
PointZ = 0x80000001,
LineStringZ = 0x80000002,
PolygonZ = 0x80000003,
// MultiPoint, MultiLineString, etc.
};EWKB Binary Layout:
┌─────────────┬─────────────┬──────────┬────────────────┐
│ Byte Order │ Type (4B) │ SRID(4B) │ Coordinates... │
│ (1 byte) │ +Z flag │ optional │ (8 bytes/coord)│
└─────────────┴─────────────┴──────────┴────────────────┘
Beispiel: Point(13.4, 52.5) - Berlin
0x01 // Little Endian
0x01 0x00 0x00 0x00 // Type: Point
0xCD 0xCC 0xCC 0xCC 0xCC 0xCC 0x2A 0x40 // X: 13.4
0x00 0x00 0x00 0x00 0x00 0x40 0x4A 0x40 // Y: 52.5
Optional Geometry Field:
class BaseEntity {
// Existing fields
std::string primary_key_;
Blob blob_; // Relational/Graph/Vector/Content data
// GEO: Optional geometry field (cross-cutting)
std::optional<Blob> geometry_; // EWKB blob
std::optional<geo::GeoSidecar> geo_sidecar_; // Fast filtering metadata
};Design Rationale:
- ✅ Optional: Entities without geometry have zero overhead
- ✅ Separate: Geometry independent from main blob (different access patterns)
- ✅ Automatic: Sidecar computed on
setGeometry()
Metadata for Spatial Index:
struct GeoSidecar {
MBR mbr; // 2D bounding box (minx, miny, maxx, maxy)
Coordinate centroid; // Geometric center
double z_min = 0.0; // Min elevation (for 3D)
double z_max = 0.0; // Max elevation (for 3D)
};Purpose:
- Broadphase: R-Tree uses MBR for fast spatial queries
- 3D Filtering: Z-Range index for elevation queries
- Centroid: Distance calculations, visual representation
Storage:
RocksDB Column Families:
data:<table>:<pk> → BaseEntity blob
geo:<table>:<pk> → EWKB + GeoSidecar (separate CF for cache locality)
Design: Single R-Tree implementation works for ALL 5 models
class SpatialIndexManager {
public:
// Create spatial index for ANY table
Status createSpatialIndex(
std::string_view table, // "cities", "locations", "images", etc.
std::string_view geometry_column = "geometry",
const RTreeConfig& config = {}
);
// Query (returns PKs, model-agnostic)
std::vector<std::string> searchIntersects(
std::string_view table,
const MBR& query_bbox
);
};RocksDB Key Schema:
# Table-agnostic design:
spatial:<table>:<morton_code> → list<PK>
# Examples for different models:
spatial:cities:12345678 → ["cities/berlin", "cities/munich"] # Relational
spatial:locations:23456789 → ["locations/loc1", "locations/loc2"] # Graph nodes
spatial:images:34567890 → ["images/img1", "images/img2"] # Vector entities
spatial:documents:45678901 → ["content/doc1", "content/doc2"] # Content
spatial:sensors:56789012 → ["sensors/temp1", "sensors/temp2"] # Time-Series
Morton Codes (Z-Order):
- Interleave X/Y coordinates → single 64-bit integer
- Preserves spatial locality in 1D key space
- Efficient range queries in RocksDB
For 3D Geometries:
# Composite index on z_min/z_max:
zrange:<table>:<z_bucket> → list<PK>
# Example: Find entities between elevation 50-150m
zrange:locations:050 → ["loc1", "loc2", ...]
zrange:locations:100 → ["loc3", "loc4", ...]
zrange:locations:150 → ["loc5", ...]
17 Core Functions (alle Modelle):
Constructors:
ST_Point(lon DOUBLE, lat DOUBLE, z DOUBLE = NULL) -> GEOMETRY
ST_GeomFromGeoJSON(json STRING) -> GEOMETRY
ST_GeomFromText(wkt STRING) -> GEOMETRYConverters:
ST_AsGeoJSON(geom GEOMETRY) -> STRING
ST_AsText(geom GEOMETRY) -> STRING
ST_Envelope(geom GEOMETRY) -> GEOMETRY -- Returns MBR as polygonPredicates (2D + 3D):
ST_Intersects(geom1 GEOMETRY, geom2 GEOMETRY) -> BOOL
ST_Within(geom1 GEOMETRY, geom2 GEOMETRY) -> BOOL
ST_Contains(geom1 GEOMETRY, geom2 GEOMETRY) -> BOOLDistance (Haversine for geodetic):
ST_Distance(geom1 GEOMETRY, geom2 GEOMETRY) -> DOUBLE
ST_DWithin(geom1 GEOMETRY, geom2 GEOMETRY, distance DOUBLE) -> BOOL
ST_3DDistance(geom1 GEOMETRY, geom2 GEOMETRY) -> DOUBLE3D Helpers:
ST_HasZ(geom GEOMETRY) -> BOOL
ST_Z(geom GEOMETRY) -> DOUBLE
ST_ZMin(geom GEOMETRY) -> DOUBLE
ST_ZMax(geom GEOMETRY) -> DOUBLE
ST_Force2D(geom GEOMETRY) -> GEOMETRY
ST_Force3D(geom GEOMETRY, z DOUBLE = 0.0) -> GEOMETRY
ST_ZBetween(geom GEOMETRY, z_min DOUBLE, z_max DOUBLE) -> BOOLModel-Agnostic Execution:
// Step 1: Parse spatial predicate
ST_Intersects(entity.geometry, @viewport)
// Step 2: Extract MBR from @viewport
MBR query_bbox = extractMBR(@viewport);
// Step 3: Broadphase - R-Tree scan
std::vector<std::string> candidates = spatial_index.searchIntersects(table, query_bbox);
// Step 4: Z-Filter (optional for 3D)
if (query.has_z_constraint) {
candidates = z_index.filterByRange(candidates, z_min, z_max);
}
// Step 5: Load entities
std::vector<BaseEntity> entities = loadEntities(candidates);
// Step 6: Exact check (Boost.Geometry)
for (auto& entity : entities) {
if (exact_intersects(entity.geometry, viewport)) {
results.push_back(entity);
}
}
// Step 7: Apply additional filters (population, type, etc.)Cost-Based Decision:
struct SpatialSelectivity {
double area_ratio; // query_bbox_area / total_area
double density; // avg entities per unit area
int estimated_hits; // from R-Tree stats
};
// Decision logic (universal for all models):
if (spatial_selectivity < 0.01) {
plan = SPATIAL_FIRST; // Geo filter -> other filters
} else {
plan = FILTER_FIRST; // Other filters -> geo filter
}Example:
-- Query with high spatial selectivity (small area)
FOR city IN cities
FILTER city.population > 100000 -- Low selectivity (many cities)
AND ST_Intersects(city.boundary, @viewport) -- High selectivity (small area)
RETURN city
-- Optimizer chooses: SPATIAL_FIRST
-- 1. R-Tree scan (viewport) -> 10 candidates
-- 2. Filter by population -> 7 results| Modell | Geo-Enabled Use Cases | Geo profitiert von | Geo enhances |
|---|---|---|---|
| Relational | Spatial joins, WHERE + ST_Intersects | Secondary Indexes (country, type) | Attribute + Location queries |
| Graph | Spatial graph traversal (road networks) | Edge connectivity, routing | Geo-constrained path finding |
| Vector | Spatial-filtered ANN (location + similarity) | HNSW whitelist/mask | Pre-filtered vector search |
| Content | Geo-tagged documents/chunks | Fulltext search | Location-based RAG |
| Time-Series | Geo-temporal queries (trajectories) | Timestamp evolution | Sensor location tracking |
Use Case: Spatial Join
FOR city IN cities
FILTER city.population > 100000
AND city.country == 'Germany'
AND ST_Intersects(city.boundary, @viewport)
RETURN cityBenefits:
- Combine attribute filters (population, country) with spatial filters
- Secondary indexes for attributes + R-Tree for geometry
- Hybrid query optimization
Use Case: Spatial Graph Traversal
FOR v IN 1..5 OUTBOUND 'locations/berlin' GRAPH 'roads'
FILTER ST_DWithin(v.location, @center, 5000) -- Within 5km
RETURN vBenefits:
- Geo-constrained BFS (prune frontier by location)
- Road network routing with spatial bounds
- Graph connectivity + spatial proximity
Use Case: Spatial-Filtered ANN
FOR img IN images
FILTER ST_Within(img.location, @region)
SORT SIMILARITY(img.embedding, @query) DESC
LIMIT 10
RETURN imgBenefits:
- Pre-filter HNSW with spatial whitelist (Roaring bitmap)
- Location + Visual Similarity hybrid search
- Efficient for location-based image search
Use Case: Location-Based RAG
FOR doc IN documents
FILTER FULLTEXT(doc.text, "hotel")
AND ST_DWithin(doc.location, @myLocation, 2000) -- 2km radius
RETURN docBenefits:
- Geo-tagged documents/chunks
- Fulltext + Location hybrid ranking
- "Find hotels near me" queries
Use Case: Geo-Temporal Queries
FOR reading IN sensor_data
FILTER reading.timestamp > @start
AND ST_Contains(@area, reading.sensor_location)
RETURN readingBenefits:
- Sensor trajectories, IoT device tracking
- Timestamp + Location evolution
- Geo-fencing for time-series data
Files:
-
include/utils/geo/ewkb.h(200 lines) -
src/utils/geo/ewkb.cpp(400 lines) -
include/storage/base_entity.h(extended with geometry fields) -
src/storage/base_entity.cpp(geo methods) -
tests/geo/test_geo_ewkb.cpp(18 unit tests)
Features:
- ✅ EWKB/EWKBZ Parser/Serializer
- ✅ GeometryType: Point, LineString, Polygon (2D/3D)
- ✅ MBR computation (2D + Z-Range)
- ✅ Centroid computation
- ✅ GeoSidecar auto-computation
- ✅ GeoJSON/WKT parsing (partial)
- ✅ BaseEntity integration (optional
geometry_,geo_sidecar_)
Tests:
- ✅ Point 2D/3D parsing
- ✅ LineString parsing
- ✅ Polygon parsing
- ✅ MBR computation (2D + 3D)
- ✅ Centroid computation
- ✅ Sidecar computation
- ✅ MBR intersection/contains
- ✅ GeoJSON parsing
- ✅ Round-trip (serialize → parse)
- ✅ EWKB validation
Files:
-
include/index/spatial_index.h(200+ lines) -
src/index/spatial_index.cpp(600+ lines) -
tests/geo/test_spatial_index.cpp(350+ lines, 13 tests)
Features:
- ✅ Table-Agnostic Design: Funktioniert für alle 5 Modelle (Relational, Graph, Vector, Content, Time-Series)
- ✅ Morton Code Z-Order: Spatial locality preservation in 1D RocksDB key space
- ✅ SpatialIndexManager API:
-
createSpatialIndex(table, column, config)- Für beliebige Tabelle -
searchIntersects(table, bbox)- MBR intersection queries -
searchWithin(table, bbox, z_min, z_max)- Containment + 3D filtering -
searchContains(table, x, y, z)- Point-in-polygon broadphase -
searchNearby(table, x, y, max_distance)- Distance-based (Haversine) -
searchKNN(table, x, y, k)- K-nearest neighbors (spatial only)
-
- ✅ RocksDB Key Schema:
spatial:<table>:<morton_code> → list<PK + Sidecar> - ✅ Multi-Table Support: Separate indexes for cities, locations, images, documents
- ✅ CRUD Operations: Insert, Update, Remove with Morton bucket management
- ✅ Index Statistics: Entry count, Morton buckets, average MBR area
Tests:
- ✅ Create/Drop spatial index
- ✅ Insert and search single point
- ✅ Multiple points in same region
- ✅ Search within (strict containment)
- ✅ Search contains point
- ✅ Search nearby (distance-based, sorted by distance)
- ✅ Multi-table support (cities, locations, images, documents isolated)
- ✅ Update location (remove from old bucket, insert to new)
- ✅ Remove entity
- ✅ Index statistics
- ✅ Morton encoder (encode/decode, spatial locality verification)
Performance:
- Insert: O(log N) - RocksDB write + JSON list append
- Query: O(log N + K) - RocksDB range scan + MBR filtering
- Memory: ~150 bytes per entry (PK + Sidecar JSON)
- Spatial locality: Morton codes cluster nearby points in same buckets
Next Steps:
- Extend AQL parser with GEOMETRY type
- 17 ST_* functions parsing
- Function registry for spatial functions
- Type checking (GEOMETRY vs. other types)
- Tests: Parser + AST validation
Next Steps:
- Extend AQL parser with ST_* functions
- 17 core functions implementation
- Function registry
- Type checking (GEOMETRY type)
- Tests: Parser + execution
Next Steps:
- Spatial execution plan
- Query optimizer (spatial selectivity)
- Cost estimation
- Metrics (spatial.index_hits, etc.)
- Tests: End-to-end spatial queries
Alternatives considered:
- ❌ GeoJSON text: Human-readable but inefficient storage (3-5x larger)
- ❌ Custom binary: Not interoperable with other systems
- ✅ EWKB: Industry standard, compact, PostGIS-compatible
Alternatives considered:
- ❌ Separate Geo Model: Would duplicate functionality, increase complexity
- ❌ Only Relational: Would exclude Graph/Vector/Content use cases
- ✅ Cross-Cutting: Enables all 5 models, shared infrastructure, symbiotic benefits
Alternatives considered:
- ❌ Always present: Wastes space for non-geo entities (majority)
- ❌ Separate table: Breaks entity model, complicates queries
- ✅ Optional field: Zero overhead when unused, simple API
Alternatives considered:
- ❌ GEOS: GPL license (incompatible with commercial use)
- ❌ Custom implementation: High complexity, error-prone
- ✅ Boost.Geometry: Permissive license (BSL-1.0), header-only, already in project
Without Geometry:
- BaseEntity: ~200 bytes (typical)
- Geo overhead: 0 bytes ✅
With Geometry (Point):
- EWKB: ~21 bytes (1 byte order + 4 type + 16 coords)
- GeoSidecar: ~72 bytes (MBR + centroid + z-range)
- Total overhead: ~93 bytes per entity
With Geometry (Polygon, 100 points):
- EWKB: ~1621 bytes
- GeoSidecar: ~72 bytes (fixed size)
- Total overhead: ~1693 bytes
R-Tree Spatial Index:
- Insert: O(log N)
- Query: O(log N + K) where K = result size
- Memory: ~100 bytes per entry (R-Tree node)
Morton Code Z-Order:
- Encoding: O(1) (bitwise operations)
- Range query: O(log N) RocksDB scan
- Better cache locality than naive 2D indexing
Spatial Query Pipeline:
1. Parse : ~10 μs (AQL → AST)
2. R-Tree scan : ~100 μs (broadphase, 1M entities)
3. Z-Filter : ~50 μs (optional, 3D)
4. Load entities : ~500 μs (RocksDB, 100 candidates)
5. Exact check : ~200 μs (Boost.Geometry, 100 checks)
6. Total : ~860 μs (for typical query)
Optimization Opportunities:
- ⚡ SIMD (Phase 5.1): 3-5x speedup on exact checks
- ⚡ Morton + Roaring (Phase 5.1): Better set algebra
- ⚡ GPU Backend (Phase 5.3): 10-100x for batch queries
Google Highway (Apache-2.0):
-
pointInPolygon_simd(): 3-5x faster -
bboxOverlap_simd(): 10x faster (batch processing) -
haversineDistance_simd(): 4x faster
Shapefile → Relational Table:
POST /api/import/shapefile
{
"file": "cities.shp",
"table": "cities",
"attributes": ["name", "population", "country"]
}GeoTIFF → Tiles:
POST /api/import/geotiff
{
"file": "elevation.tif",
"table": "elevation_tiles",
"tile_size": 256
}Compute Shaders (DX12/Vulkan):
- Batch ST_Intersects (10k+ geometries): 100x faster
- SoA layout, prefix sum, stream compaction
Geo als Cross-Cutting Capability:
- ✅ Universal: Funktioniert für alle 5 Datenbank-Modelle
- ✅ Optional: Zero overhead wenn nicht verwendet
- ✅ Standard: EWKB/EWKBZ PostGIS-kompatibel
- ✅ Efficient: R-Tree + Morton codes für schnelle Queries
- ✅ Symbiotic: Jedes Modell profitiert von Geo, Geo profitiert von jedem Modell
Status: Phase 0.1 abgeschlossen (EWKB Storage + Sidecar)
Next: Phase 0.2 (R-Tree Spatial Index)
ThemisDB v1.3.4 | GitHub | Documentation | Discussions | License
Last synced: January 02, 2026 | Commit: 6add659
Version: 1.3.0 | Stand: Dezember 2025
- Übersicht
- Home
- Dokumentations-Index
- Quick Reference
- Sachstandsbericht 2025
- Features
- Roadmap
- Ecosystem Overview
- Strategische Übersicht
- Geo/Relational Storage
- RocksDB Storage
- MVCC Design
- Transaktionen
- Time-Series
- Memory Tuning
- Chain of Thought Storage
- Query Engine & AQL
- AQL Syntax
- Explain & Profile
- Rekursive Pfadabfragen
- Temporale Graphen
- Zeitbereichs-Abfragen
- Semantischer Cache
- Hybrid Queries (Phase 1.5)
- AQL Hybrid Queries
- Hybrid Queries README
- Hybrid Query Benchmarks
- Subquery Quick Reference
- Subquery Implementation
- Content Pipeline
- Architektur-Details
- Ingestion
- JSON Ingestion Spec
- Enterprise Ingestion Interface
- Geo-Processor Design
- Image-Processor Design
- Hybrid Search Design
- Fulltext API
- Hybrid Fusion API
- Stemming
- Performance Tuning
- Migration Guide
- Future Work
- Pagination Benchmarks
- Enterprise README
- Scalability Features
- HTTP Client Pool
- Build Guide
- Implementation Status
- Final Report
- Integration Analysis
- Enterprise Strategy
- Verschlüsselungsstrategie
- Verschlüsselungsdeployment
- Spaltenverschlüsselung
- Encryption Next Steps
- Multi-Party Encryption
- Key Rotation Strategy
- Security Encryption Gap Analysis
- Audit Logging
- Audit & Retention
- Compliance Audit
- Compliance
- Extended Compliance Features
- Governance-Strategie
- Compliance-Integration
- Governance Usage
- Security/Compliance Review
- Threat Model
- Security Hardening Guide
- Security Audit Checklist
- Security Audit Report
- Security Implementation
- Development README
- Code Quality Pipeline
- Developers Guide
- Cost Models
- Todo Liste
- Tool Todo
- Core Feature Todo
- Priorities
- Implementation Status
- Roadmap
- Future Work
- Next Steps Analysis
- AQL LET Implementation
- Development Audit
- Sprint Summary (2025-11-17)
- WAL Archiving
- Search Gap Analysis
- Source Documentation Plan
- Changefeed README
- Changefeed CMake Patch
- Changefeed OpenAPI
- Changefeed OpenAPI Auth
- Changefeed SSE Examples
- Changefeed Test Harness
- Changefeed Tests
- Dokumentations-Inventar
- Documentation Summary
- Documentation TODO
- Documentation Gap Analysis
- Documentation Consolidation
- Documentation Final Status
- Documentation Phase 3
- Documentation Cleanup Validation
- API
- Authentication
- Cache
- CDC
- Content
- Geo
- Governance
- Index
- LLM
- Query
- Security
- Server
- Storage
- Time Series
- Transaction
- Utils
Vollständige Dokumentation: https://makr-code.github.io/ThemisDB/