geo_integration
As of: December 5, 2025
Version: 1.0.0
Category: Geo
This document describes the geo MVP implementation that connects blob ingestion with spatial indexing and provides CPU-based exact geometry checks using Boost.Geometry.
The geo MVP consists of four main components:
- Geo Index Hooks (src/api/geo_index_hooks.cpp)
  - Integrates spatial index updates into entity lifecycle (PUT/DELETE)
  - Parses geometry from entity blobs (GeoJSON or EWKB)
  - Computes sidecar metadata (MBR, centroid, z-range)
  - Updates spatial index via SpatialIndexManager
- Boost.Geometry CPU Backend (src/geo/boost_cpu_exact_backend.cpp)
  - Provides actual exact geometry intersection checks
  - Uses Boost.Geometry library for computational geometry
  - Supports Point, LineString, and Polygon types
  - Falls back to MBR checks for unsupported types
- Exact Geometry Check in searchIntersects (src/index/spatial_index.cpp)
  - Phase 1: MBR intersection (fast candidate filter)
  - Phase 2: Load entity blobs and perform exact geometry check
  - Filters out MBR false positives using Boost.Geometry
  - Falls back to MBR-only if exact backend not available
- Per-PK Storage Optimization (src/index/spatial_index.cpp)
  - Stores sidecar per primary key in addition to bucket JSON
  - Allows updating/deleting individual entities without rewriting entire Morton bucket
  - Backward compatible with existing bucket-based storage
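As a rough illustration of the sidecar metadata listed above (MBR, centroid, z-range), the following Python sketch derives these values from a list of coordinates in a single pass. The function name `compute_sidecar`, the dictionary layout, and the use of a plain vertex average as the centroid are assumptions made for illustration; they are not taken from the ThemisDB sources.

```python
from typing import Sequence


def compute_sidecar(coords: Sequence[Sequence[float]]) -> dict:
    """Compute MBR, centroid and optional z-range from (x, y[, z]) tuples.

    Hypothetical helper: the real sidecar layout may differ.
    """
    if not coords:
        raise ValueError("geometry has no coordinates")
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    zs = [c[2] for c in coords if len(c) > 2]
    return {
        "mbr": {"minx": min(xs), "miny": min(ys), "maxx": max(xs), "maxy": max(ys)},
        # Vertex average: a cheap stand-in, not an area-weighted centroid.
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        # None when the geometry carries no z values.
        "z_range": (min(zs), max(zs)) if zs else None,
    }
```

The work is linear in the number of coordinates, which is why MBR computation is cheap enough to run on every entity write.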
The geo hooks are integrated into the HTTP API entity handlers:
- PUT /entities/:key - After successful entity write, calls GeoIndexHooks::onEntityPut()
- DELETE /entities/:key - Before entity deletion, calls GeoIndexHooks::onEntityDelete()
Entity blobs can contain geometry in several formats:
- GeoJSON (recommended):
{
  "id": "entity1",
  "geometry": {
    "type": "Point",
    "coordinates": [10.5, 50.5]
  }
}
- Hex-encoded EWKB:
{
  "id": "entity1",
  "geometry": "010100000000000000000024400000000000804940"
}
- Binary EWKB array:
{
  "id": "entity1",
  "geom_blob": [1, 1, 0, 0, 0, ...]
}
IMPORTANT: In the MVP implementation, spatial index updates are not atomic with entity writes.
- Entity write and spatial index update happen in separate operations
- Parse/index errors do not abort the entity write (logged only)
- Future versions should integrate into RocksDB transactions or use saga pattern
The hooks are designed to be robust:
- Geometry parse errors → logged as warnings, entity write succeeds
- Spatial index failures → logged as warnings, entity write succeeds
- Missing geometry field → silently skipped (not an error)
- Invalid JSON → logged, entity write succeeds
This ensures that geo functionality is additive and doesn't break existing functionality.
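The behaviour described above (three accepted geometry encodings, and failures that are logged rather than propagated) can be sketched in Python as follows. This is an illustrative model only: `parse_ewkb_point`, `extract_point`, `on_entity_put`, and the dictionary used as an index are hypothetical names, and the sketch handles Point geometries only, whereas the real hooks are implemented in C++ in src/api/geo_index_hooks.cpp.

```python
import json
import logging
import struct

log = logging.getLogger("geo_hooks")


def parse_ewkb_point(data: bytes) -> tuple:
    """Decode the x/y of a point from (E)WKB bytes; other types are out of scope here."""
    order = "<" if data[0] == 1 else ">"           # byte 0: 1 = little endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    if geom_type & 0x20000000:                     # EWKB SRID flag: skip 4-byte SRID
        offset += 4
    if geom_type & 0xFF != 1:
        raise ValueError("only Point is handled in this sketch")
    return struct.unpack_from(order + "dd", data, offset)


def extract_point(blob: str):
    """Return (x, y) from any of the three encodings, or None if there is no geometry."""
    doc = json.loads(blob)
    geom = doc.get("geometry")
    if isinstance(geom, dict):                     # GeoJSON object
        if geom.get("type") != "Point":
            raise ValueError("only Point is handled in this sketch")
        x, y = geom["coordinates"][:2]
        return float(x), float(y)
    if isinstance(geom, str):                      # hex-encoded EWKB
        return parse_ewkb_point(bytes.fromhex(geom))
    if "geom_blob" in doc:                         # EWKB as an array of byte values
        return parse_ewkb_point(bytes(doc["geom_blob"]))
    return None


def on_entity_put(index: dict, pk: str, blob: str) -> None:
    """Never raises: geo failures are logged and the entity write stands."""
    try:
        point = extract_point(blob)
        if point is None:
            return                                 # missing geometry: silently skipped
        index[pk] = point
    except Exception as exc:                       # invalid JSON, bad EWKB, index error
        log.warning("geo index update skipped for %s: %s", pk, exc)
```

Catching every exception at the hook boundary is what keeps the geo path additive: the caller has already committed the entity and does not need to know whether indexing succeeded.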
The SpatialIndexManager::searchIntersects() method now performs a two-phase query:
Phase 1: MBR Filtering (Fast)
- Uses Morton-encoded spatial index to find candidates
- Checks if entity MBR intersects query MBR
- Reduces search space by ~95% for typical queries
Phase 2: Exact Geometry Check (Accurate)
- Loads entity blob from RocksDB
- Parses geometry using EWKBParser
- Creates query geometry from bbox
- Uses Boost.Geometry to perform exact intersects() check
- Filters out false positives from MBR-only filtering
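Phase 1 depends on Morton (Z-order) encoding, which maps a 2D position to a single integer so that nearby cells tend to share key prefixes. The Python sketch below shows one common way to do this, by quantising to a 16-bit grid per axis and interleaving the bits. The grid resolution, the bit order, and the function names are assumptions; the layout actually used by SpatialIndexManager is not described in this document.

```python
def part1by1(n: int) -> int:
    """Spread the lower 16 bits of n so that a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_encode(x: float, y: float,
                  bounds=(-180.0, -90.0, 180.0, 90.0)) -> int:
    """Quantise (x, y) to a 65536 x 65536 grid over bounds and interleave the cell indices."""
    minx, miny, maxx, maxy = bounds
    cells = (1 << 16) - 1
    cx = int((x - minx) / (maxx - minx) * cells)
    cy = int((y - miny) / (maxy - miny) * cells)
    cx = min(max(cx, 0), cells)                    # clamp points outside the bounds
    cy = min(max(cy, 0), cells)
    return part1by1(cx) | (part1by1(cy) << 1)      # x in even bits, y in odd bits
```

The encoding itself is a fixed number of bit operations per point, independent of the data set size.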
User Query (bbox)
↓
Morton Range Calculation
↓
RocksDB Range Scan (get candidates by MBR)
↓
FOR EACH candidate:
├─ MBR.intersects(query_bbox)? → NO: skip
├─ Load entity blob from RocksDB
├─ Parse geometry (GeoJSON → GeometryInfo)
├─ Boost.Geometry exactIntersects(entity_geom, query_geom)? → NO: skip
└─ YES: add to results
↓
Return filtered results (exact matches only)
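The flow above can be modelled in a few lines of Python. In this sketch the entities are single line segments held in memory, the exact test is a Liang-Barsky clip of the segment against the query box rather than a Boost.Geometry call, and passing `exact=None` stands in for a build without the exact backend. All names are hypothetical and the sketch omits the Morton range scan and the blob load from RocksDB.

```python
from typing import Callable, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]           # (minx, miny, maxx, maxy)
Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def mbr_intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def segment_mbr(seg: Segment) -> BBox:
    (x0, y0), (x1, y1) = seg
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def segment_intersects_bbox(seg: Segment, box: BBox) -> bool:
    """Exact test: clip the segment against each box edge (Liang-Barsky)."""
    (x0, y0), (x1, y1) = seg
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - box[0]), (dx, box[2] - x0),
                 (-dy, y0 - box[1]), (dy, box[3] - y0)):
        if p == 0:
            if q < 0:
                return False                       # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)                        # segment enters the half-plane
        else:
            t1 = min(t1, t)                        # segment leaves the half-plane
        if t0 > t1:
            return False
    return True


def search_intersects(entities: Dict[str, Segment], query: BBox,
                      exact: Optional[Callable[[Segment, BBox], bool]]) -> List[str]:
    hits = []
    for pk, seg in entities.items():
        if not mbr_intersects(segment_mbr(seg), query):      # phase 1: MBR filter
            continue
        if exact is not None and not exact(seg, query):      # phase 2: exact check
            continue
        hits.append(pk)
    return hits
```

A diagonal segment from (0, 0) to (10, 10) shows why phase 2 matters: its MBR covers the whole square, so it passes phase 1 for a query box in the lower-right corner even though the segment never enters that box.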
- Without exact backend: Returns MBR candidates (may include false positives)
- With exact backend: Returns only true geometric intersections
- Overhead: ~1-5ms per candidate for exact check (depends on geometry complexity)
- Typical case: 10-100 candidates → 10-500ms additional latency for exact checks
- Benefit: Eliminates false positives, especially important for complex polygons
To enable the Boost.Geometry exact backend, ensure Boost is available:
# vcpkg.json already includes boost dependencies
# The backend is conditionally compiled with THEMIS_GEO_BOOST_BACKEND flag
Build with geo support:
cmake -DTHEMIS_GEO=ON -DTHEMIS_GEO_BOOST_BACKEND=ON ..
If Boost.Geometry is not available:
- The build will still succeed
- getBoostCpuBackend() returns nullptr
- Queries fall back to MBR-only filtering (no exact checks)
Create a spatial index:
curl -X POST http://localhost:8080/api/spatial/index \
  -H "Content-Type: application/json" \
  -d '{
    "table": "places",
    "geometry_column": "geometry",
    "config": {
      "total_bounds": {"minx": -180, "miny": -90, "maxx": 180, "maxy": 90}
    }
  }'
Insert an entity with geometry:
curl -X PUT http://localhost:8080/api/entities/places:berlin \
  -H "Content-Type: application/json" \
  -d '{
    "key": "places:berlin",
    "blob": "{\"id\":\"berlin\",\"name\":\"Berlin\",\"geometry\":{\"type\":\"Point\",\"coordinates\":[13.4,52.5]}}"
  }'
The spatial index is automatically updated.
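In the PUT request above, the blob field is itself a JSON document serialised into a string, which is easy to get wrong when escaping by hand. The Python sketch below builds the same payload programmatically. The helper name `build_put_payload` is hypothetical, and the request object is only constructed, not sent, so the example runs without a server.

```python
import json
import urllib.request


def build_put_payload(key: str, entity: dict) -> bytes:
    """Wrap an entity as {"key": ..., "blob": "<JSON string>"}, matching the curl body."""
    body = {"key": key, "blob": json.dumps(entity, separators=(",", ":"))}
    return json.dumps(body).encode("utf-8")


payload = build_put_payload(
    "places:berlin",
    {"id": "berlin", "name": "Berlin",
     "geometry": {"type": "Point", "coordinates": [13.4, 52.5]}},
)

request = urllib.request.Request(
    "http://localhost:8080/api/entities/places:berlin",
    data=payload,
    method="PUT",
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Serialising the inner entity first and the outer envelope second lets the JSON library produce the escaped quotes that appear in the curl command.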
Search by bounding box:
curl -X POST http://localhost:8080/api/spatial/search \
  -H "Content-Type: application/json" \
  -d '{
    "table": "places",
    "bbox": {"minx": 13.0, "miny": 52.0, "maxx": 14.0, "maxy": 53.0}
  }'
Returns entities whose MBR intersects the query bbox. With Boost backend enabled, exact geometry checks are performed.
Run the integration tests:
cd build
ctest -R test_geo_index_integration -V
Tests verify:
- Entity PUT triggers spatial index insert
- searchIntersects returns correct results
- Entity DELETE removes from index
- Error handling (missing geometry, invalid JSON)
- Null spatial manager handling
Planned future enhancements:
- Transactional Integration
  - Integrate hooks into RocksDB WriteBatch
  - Or use saga pattern for multi-step transactions
  - Ensure atomicity between entity write and index update
- Exact Geometry in Query Engine
  - Wire Boost backend into SpatialIndexManager::searchIntersects()
  - Load entity blobs, parse geometries, call exact checks
  - Filter out MBR false positives
- Additional Backends
  - SIMD-optimized CPU kernels for batch operations
  - GPU compute shaders for large-scale queries
  - GEOS prepared geometries plugin
- Storage Optimization
  - Migrate fully to per-PK keys
  - Remove bucket JSON format (breaking change)
  - Compact binary sidecar format (not JSON)
Security considerations:
- Geometry parsing uses exception handling to prevent crashes
- No user input is directly executed (only parsed as JSON/EWKB)
- Spatial index updates are logged for audit trails
- No SQL injection risk (key-value storage only)
Performance characteristics:
- MBR computation: O(n) where n = number of coordinates
- Morton encoding: O(1)
- Bucket read/write: O(k) where k = entities per bucket
- Per-PK write: O(1) additional overhead per insert/delete
- Exact checks: Depends on geometry complexity (typically fast for simple polygons)
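The difference between the O(k) bucket write and the O(1) per-PK write can be made concrete with a toy key-value store. In the Python sketch below a plain dictionary stands in for RocksDB, and the key prefixes `bucket:` and `pk:` as well as the function names are invented for illustration; the real key layout is not specified in this document.

```python
import json


def bucket_put(kv: dict, bucket: int, pk: str, sidecar: dict) -> int:
    """Bucket layout: read-modify-write of the whole bucket. Returns characters written."""
    key = f"bucket:{bucket}"
    entries = json.loads(kv.get(key, "{}"))
    entries[pk] = sidecar
    kv[key] = json.dumps(entries)                  # cost grows with entities per bucket
    return len(kv[key])


def per_pk_put(kv: dict, pk: str, sidecar: dict) -> int:
    """Per-PK layout: one small value per entity. Returns characters written."""
    key = f"pk:{pk}"
    kv[key] = json.dumps(sidecar)                  # cost independent of bucket size
    return len(kv[key])
```

Inserting many entities into the same bucket makes each successive `bucket_put` write a larger value, while every `per_pk_put` writes a value of the same size.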
References:
- Geo Execution Plan: docs/geo_execution_plan_over_blob.md
- Feature Tiering: docs/geo_feature_tiering.md
- EWKB Spec: PostGIS Extended Well-Known Binary format
- Boost.Geometry: https://www.boost.org/doc/libs/release/libs/geometry/
Last synced: January 02, 2026 | Commit: 6add659
Full documentation: https://makr-code.github.io/ThemisDB/