The High-Performance Data Engine (HPDE) is a low-latency, multi-threaded data processing system designed to interface directly with low-level hardware. It is built with a focus on achieving sub-microsecond end-to-end latency, lock-free concurrency, deterministic memory behavior, and minimal OS interference.
- C++20 compatible compiler (`clang++` preferred, or `g++`)
- CMake (version 3.16 or later)
- Clone the repository:

      git clone <repository-url>
      cd Data\ Engine

- Create a build directory and navigate to it:

      mkdir build && cd build

- Configure the project with CMake:

      cmake -S .. -B . -DCMAKE_BUILD_TYPE=Release

- Build the project:

      cmake --build . --config Release

- Run the benchmark:

      ./hpde_benchmark --iterations 200000 --work 128 --workers 2 --no_drop
Data Engine/
|-- src/
| |-- core.cpp
| |-- core.h
| |-- lock_free_queue.h
| |-- custom_allocator.cpp
| |-- custom_allocator.h
| |-- assembly.S
| |-- assembly.h
| |-- assembly.asm
| `-- benchmark.cpp
|-- python/
| `-- run_benchmark.py
|-- CMakeLists.txt
`-- README.md
- C++ Core Engine:
  - Lock-free data structures
  - Custom memory allocator with preallocated pool
  - Fixed worker pool with per-worker queues
  - CPU affinity for worker threads (Windows/Linux)
- x86-64 Assembly:
  - Memory barriers (`lfence`, `sfence`, `mfence`)
  - Atomic primitive (`cmpxchg`)
  - Timestamp counter (`rdtsc`)
- Python Control Layer:
  - Benchmark orchestration script
  - Configuration via CLI flags
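
To illustrate the kind of lock-free structure a per-worker queue could use, here is a minimal single-producer/single-consumer ring buffer. It is a sketch only, not the contents of `lock_free_queue.h`: the name `SpscQueue`, its `try_push`/`try_pop` interface, the power-of-two capacity, and the 64-byte cache-line size are all assumptions made for the example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer (illustrative; not the project's queue).
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // full
        slots_[tail & (Capacity - 1)] = value;
        // Release: the slot write above becomes visible before the new tail.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = slots_[head & (Capacity - 1)];
        // Release: the slot read above completes before the slot is reused.
        head_.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    // Assumed 64-byte cache lines: keep the two indices apart so the
    // producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

With one such queue per worker, the submitting thread is the only producer and the worker the only consumer, so no compare-and-swap loop is needed on the hot path. A multi-producer design would need a different algorithm.
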
- Sub-microsecond end-to-end latency
- Lock-free concurrency
- Deterministic memory behavior
- Minimal OS interference
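
Deterministic memory behavior usually means that no allocation on the hot path reaches the general-purpose heap or the OS. The sketch below shows one common way to get that: a fixed-size block pool reserved up front. It is an illustration under stated assumptions, not the project's `custom_allocator.h`. The class name `FixedPool`, its interface, and the single-threaded (one pool per worker) usage are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical fixed-size block pool (illustrative only).
// Not thread-safe: intended as one pool per worker thread.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size < sizeof(Node) ? sizeof(Node)
                                                         : block_size)),
          block_count_(block_count),
          storage_(new std::byte[block_size_ * block_count_]) {
        // Thread every block onto an intrusive free list.
        for (std::size_t i = 0; i < block_count_; ++i) {
            std::byte* block = storage_.get() + i * block_size_;
            free_list_ = new (block) Node{free_list_};
        }
    }

    // O(1). Returns nullptr when exhausted; never falls back to the heap.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // O(1). The pointer must have come from allocate() on this pool.
    void deallocate(void* ptr) {
        auto* p = static_cast<std::byte*>(ptr);
        assert(p >= storage_.get() &&
               p < storage_.get() + block_size_ * block_count_);
        free_list_ = new (p) Node{free_list_};
    }

private:
    struct Node { Node* next; };

    // Round up so every block is aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        constexpr std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::size_t block_count_;
    std::unique_ptr<std::byte[]> storage_;
    Node* free_list_ = nullptr;
};
```

Because both operations are a couple of pointer moves, their cost does not depend on heap state, which is the property the goal above is after.
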
- Use microbenchmarks pinned to isolated cores.
- Measure p50, p99, and p99.9 latencies.
- Use hardware counters to track cache misses and false sharing.
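
For the percentile measurements above, one straightforward approach is the nearest-rank method over the recorded samples. The function below is a minimal sketch, not necessarily how `hpde_benchmark` computes its statistics. The name `percentile` and the choice to express the quantile in per-mille (so p99.9 is `999`) are assumptions, the latter made to keep the rank arithmetic in exact integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over latency samples (for example, nanoseconds).
// per_mille is the quantile in thousandths: 500 = p50, 990 = p99, 999 = p99.9.
// The vector is taken by value because nth_element reorders it.
std::uint64_t percentile(std::vector<std::uint64_t> samples,
                         std::uint32_t per_mille) {
    assert(!samples.empty() && per_mille >= 1 && per_mille <= 1000);
    const std::size_t n = samples.size();
    // rank = ceil(per_mille * n / 1000), computed without floating point.
    const std::size_t rank =
        (static_cast<std::size_t>(per_mille) * n + 999) / 1000;
    const std::size_t index = rank - 1;  // rank is always in [1, n]
    // Partial sort: places the rank-th smallest sample at `index`.
    std::nth_element(samples.begin(),
                     samples.begin() + static_cast<std::ptrdiff_t>(index),
                     samples.end());
    return samples[index];
}
```

Tail percentiles need enough samples to be meaningful: with fewer than 1000 samples, p99.9 under this definition is simply the maximum.
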
hpde_benchmark [--iterations N] [--duration_ms N] [--work N] [--warmup N] [--workers N] [--no_drop] [--csv]
The benchmark prints latency statistics (avg, p50, p99, p99.9, max), the submitted, processed, and dropped counts, and throughput in ops/sec.
Pass `--csv` to print a header line followed by a single row of CSV output.
- Disable hyper-threading (if latency sensitive).
- Disable CPU power-saving states.
- Lock memory (`mlockall`).
- Run with real-time scheduling if permitted.
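
The process-level items in this list (memory locking, real-time scheduling, and the core pinning they are usually combined with) can be requested from inside the program. The following is a Linux-only sketch using standard system calls. The helper names are hypothetical and are not taken from the HPDE sources, and the Windows equivalents are not covered. Each call can fail without sufficient privileges, so the helpers report failure instead of aborting.

```cpp
#include <sched.h>     // sched_setaffinity, sched_setscheduler (Linux)
#include <sys/mman.h>  // mlockall

// Hypothetical helpers (illustrative only). Each returns true on success
// and false on failure, typically a missing capability or resource limit
// such as CAP_IPC_LOCK, CAP_SYS_NICE, or RLIMIT_MEMLOCK.

// Lock current and future pages into RAM so the hot path does not take
// major page faults.
bool lock_all_memory() {
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}

// Pin the calling thread to one CPU (pid 0 means "the calling thread").
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Request the SCHED_FIFO real-time policy for the calling thread.
// Valid priorities on Linux are 1 (lowest) through 99 (highest).
bool enable_realtime(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}
```

A worker would typically call these once at startup, before entering its processing loop, and log any `false` result, since a silent failure leaves the thread running with default scheduling. Disabling hyper-threading and CPU power-saving states is done in firmware or through OS configuration and is outside what this code can do.
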