Spike: Evaluate Potential Gains from Adopting mimalloc
Background
We are considering mimalloc as an alternative memory allocator that may improve the performance and stability characteristics of our C++ service. This issue is a spike: the goal is to measure and document the impact (if any) rather than commit to a production rollout.
Goals
- Quantify whether mimalloc provides measurable improvements in:
- Tail latency (p99 / p999 / p9999) under representative workloads
- Throughput (QPS / ops/sec)
- CPU efficiency (cycles/op, context switches)
- Memory behavior (RSS, fragmentation, allocator-related stalls)
- Operational stability (allocator-related crashes, OOM behavior, regressions)
- Identify compatibility risks and integration cost (build, runtime flags, tooling).
Non-Goals
- No production rollout in this spike.
- No allocator “winner” decision unless data is clearly conclusive.
- No deep refactor of memory ownership or data structures.
Scope / Experiment Plan
1) Integration Options to Test
- Dynamic preload: LD_PRELOAD=... where applicable (quick validation); see the sanity-check sketch below.
- Link-time replacement: link mimalloc into the binary and enable override.
- (Optional) Evaluate “baseline allocator” variants already in use (e.g., glibc malloc, tcmalloc/jemalloc if relevant) for apples-to-apples comparison.
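Before trusting any benchmark numbers, it is worth sanity-checking that the override is actually in effect. A minimal sketch is below; it assumes a link-time build with mimalloc on the include and library path, and the file name and build command are illustrative rather than part of an existing harness. For the preload variant, the equivalent check is running the unmodified binary with LD_PRELOAD pointing at the mimalloc shared library, optionally with MIMALLOC_VERBOSE=1 to confirm at startup that mimalloc is active.

```cpp
// allocator_check.cpp -- confirm that mimalloc is linked and serving allocations.
// Illustrative build command: g++ -O2 allocator_check.cpp -lmimalloc -o allocator_check
#include <cstdio>
#include <cstdlib>
#include <mimalloc.h>

int main() {
    // mi_version() resolves only when libmimalloc is linked; returns the version as an integer.
    std::printf("mimalloc version: %d\n", mi_version());

    // With the malloc override enabled, plain malloc/free are routed to mimalloc.
    void* p = std::malloc(1 << 20);
    std::free(p);

    // Dump mimalloc's internal statistics (also available via MIMALLOC_SHOW_STATS=1).
    mi_stats_print(nullptr);
    return 0;
}
```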
2) Workloads
- Microbenchmarks (allocator-heavy patterns relevant to our code; see the churn sketch after this list):
- Small object allocation/free churn
- Mixed-size allocations
- Multi-thread contention scenarios
- Service-level benchmark:
- Representative traffic mix
- Concurrency levels reflecting production (low/medium/high load)
- Focus on tail behavior during bursts
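As a concrete starting point for the small-object churn and contention microbenchmarks, a minimal sketch is below. Thread counts, allocation sizes, and iteration counts are placeholders to be tuned to our actual allocation profile; this is not the final harness.

```cpp
// alloc_churn_bench.cpp -- small-object allocation/free churn across threads.
// Illustrative only; sizes and iteration counts should mirror production traces.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

static void churn(std::size_t iters, std::size_t max_size) {
    std::vector<void*> live;
    live.reserve(1024);
    std::size_t size = 16;
    for (std::size_t i = 0; i < iters; ++i) {
        // Mixed-size allocations: cycle through power-of-two sizes up to max_size.
        size = (size * 2 > max_size) ? 16 : size * 2;
        live.push_back(std::malloc(size));
        if (live.size() == 1024) {          // periodic free burst
            for (void* p : live) std::free(p);
            live.clear();
        }
    }
    for (void* p : live) std::free(p);
}

int main(int argc, char** argv) {
    const unsigned threads = (argc > 1) ? std::atoi(argv[1]) : 8;  // contention level
    const std::size_t iters = 5'000'000;                           // per-thread churn
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back(churn, iters, std::size_t{4096});
    for (auto& th : pool) th.join();

    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                        std::chrono::steady_clock::now() - start).count();
    std::printf("%u threads, %zu allocs/thread: %lld ms\n",
                threads, iters, static_cast<long long>(ms));
    return 0;
}
```

Run the same binary once per allocator variant (baseline, preload, link-time) under identical settings so the only difference between runs is the allocator.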
3) Metrics to Collect
- Latency: p50/p95/p99/p999/p9999 (and max)
- Throughput: QPS / ops/sec
- CPU: user/sys CPU, cycles/op (if available), context switches
- Memory: RSS, page faults, fragmentation indicators, heap growth patterns
- Allocator stats:
- mimalloc internal stats (if enabled; see the snapshot sketch after this list)
- malloc-related stalls/locks (if observable)
- Stability signals:
- Crash rate, OOM events, latency spikes correlated with allocation
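One way to capture the memory-side counters and allocator stats at run boundaries is a small snapshot helper like the sketch below. It is Linux-only, built on getrusage plus mimalloc's stats dump; it assumes a mimalloc-enabled build, and the helper name is hypothetical.

```cpp
// metrics_snapshot.cpp -- capture process-level memory/CPU counters around a run.
// Linux-only sketch; ru_maxrss is reported in kilobytes on Linux.
#include <cstdio>
#include <sys/resource.h>
#include <mimalloc.h>   // only needed for the mimalloc stats dump

void print_snapshot(const char* label) {
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    std::printf("[%s] peak RSS: %ld KiB, faults: %ld minor / %ld major, "
                "ctx switches: %ld voluntary / %ld involuntary\n",
                label, ru.ru_maxrss, ru.ru_minflt, ru.ru_majflt,
                ru.ru_nvcsw, ru.ru_nivcsw);

    // mimalloc's own view of heap usage (meaningful only in mimalloc builds).
    mi_stats_print(nullptr);
}

int main() {
    print_snapshot("baseline");
    // ... run the workload under test here ...
    print_snapshot("after workload");
    return 0;
}
```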
4) Test Controls / Methodology
- Same hardware / kernel / compiler settings across runs
- Fixed test duration and warm-up period
- Repeat runs (e.g., N≥5) and report mean + variance (see the aggregation sketch below)
- Keep logs of allocator configuration flags used in each run
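The mean + variance reporting is simple but worth standardizing in the benchmark runner so every metric is summarized the same way across runs. A minimal sketch with placeholder numbers (not real measurements):

```cpp
// report_stats.cpp -- aggregate repeated run measurements into mean and variance.
#include <cmath>
#include <cstdio>
#include <vector>

struct Summary { double mean; double variance; };

// Sample mean and unbiased sample variance over N >= 2 repeated runs.
Summary summarize(const std::vector<double>& samples) {
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / samples.size();

    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    return {mean, sq / (samples.size() - 1)};
}

int main() {
    // Placeholder values; real inputs would be per-run p999 latencies, QPS, etc.
    const std::vector<double> p999_ms = {12.1, 11.8, 12.4, 12.0, 12.3};
    const Summary s = summarize(p999_ms);
    std::printf("p999 latency: mean=%.2f ms, stddev=%.2f ms over %zu runs\n",
                s.mean, std::sqrt(s.variance), p999_ms.size());
    return 0;
}
```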
Deliverables
- A short report containing:
- Benchmark setup and allocator configuration
- Results tables/plots for the key metrics
- Observed risks or regressions
- Recommendation: adopt / do not adopt / needs further investigation
Acceptance Criteria
- We have a reproducible benchmark harness and documented configs.
- We can answer, with data:
- Does mimalloc improve p999/p9999 latency meaningfully?
- Does it impact QPS or CPU cost?
- Any memory regressions (RSS growth, fragmentation) or stability concerns?
- Clear next-step recommendation based on results.
Risks / Considerations
- Compatibility with sanitizers / profiling tooling
- Behavior under memory pressure and fragmentation-heavy workloads
- Differences in allocation patterns between benchmark and production
- Build and deployment complexity (static vs dynamic, override behavior)
Tasks
- Integrate mimalloc using preload and link-time options
- Add allocator selection/config to benchmark runner
- Run microbenchmarks and capture allocator stats
- Run service-level benchmark across load profiles
- Analyze results and produce summary report
- Document recommendation and follow-up actions (if any)
References
- mimalloc repository/docs (add links as needed)
- Internal benchmark runbook / dashboards (add links as needed)