[Feature]: Spike: Evaluate Potential Gains from Adopting mimalloc #291

@MalikHou

Description

Spike: Evaluate Potential Gains from Adopting mimalloc

Background

We are considering mimalloc as an alternative memory allocator that may improve the performance and stability characteristics of our C++ service. This issue is a spike: the goal is to measure and document the impact (if any), not to commit to a production rollout.

Goals

  • Quantify whether mimalloc provides measurable improvements in:
    • Tail latency (p99 / p999 / p9999) under representative workloads
    • Throughput (QPS / ops/sec)
    • CPU efficiency (cycles/op, context switches)
    • Memory behavior (RSS, fragmentation, allocator-related stalls)
    • Operational stability (allocator-related crashes, OOM behavior, regressions)
  • Identify compatibility risks and integration cost (build, runtime flags, tooling).

Non-Goals

  • No production rollout in this spike.
  • No allocator “winner” decision unless data is clearly conclusive.
  • No deep refactor of memory ownership or data structures.

Scope / Experiment Plan

1) Integration Options to Test

  • Dynamic preload: LD_PRELOAD=... where applicable (quick validation).
  • Link-time replacement: link mimalloc into the binary and enable override.
  • (Optional) Evaluate “baseline allocator” variants already in use (e.g., glibc malloc, tcmalloc/jemalloc if relevant) for apples-to-apples comparison.

2) Workloads

  • Microbenchmarks (allocator-heavy patterns relevant to our code):
    • Small object allocation/free churn
    • Mixed-size allocations
    • Multi-thread contention scenarios
  • Service-level benchmark:
    • Representative traffic mix
    • Concurrency levels reflecting production (low/medium/high load)
    • Focus on tail behavior during bursts
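The microbenchmark patterns above (small-object churn, mixed sizes, multi-thread contention) can be sketched as a small harness. This is a sketch, not our actual benchmark runner: the function names and the size/burst constants are illustrative. It goes through `::operator new`/`::operator delete`, so whichever allocator override is active is what gets exercised.

```cpp
// Hypothetical microbenchmark sketch: allocator churn under contention.
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Allocate/free `iters` small objects of mixed sizes; returns elapsed ns.
static long long churn(int iters) {
    auto start = std::chrono::steady_clock::now();
    std::vector<void*> live;
    live.reserve(64);
    for (int i = 0; i < iters; ++i) {
        live.push_back(::operator new(16 + (i % 7) * 24));  // mixed small sizes
        if (live.size() == 64) {                            // periodic free burst
            for (void* p : live) ::operator delete(p);
            live.clear();
        }
    }
    for (void* p : live) ::operator delete(p);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Run the churn loop on `nthreads` threads (the contention scenario).
static long long churn_mt(int nthreads, int iters) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([iters] { churn(iters); });
    for (auto& t : ts) t.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

Running the same binary once per allocator variant (baseline, preload, link-time) keeps the comparison apples-to-apples.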

3) Metrics to Collect

  • Latency: p50/p95/p99/p999/p9999 (and max)
  • Throughput: QPS / ops/sec
  • CPU: user/sys CPU, cycles/op (if available), context switches
  • Memory: RSS, page faults, fragmentation indicators, heap growth patterns
  • Allocator stats:
    • mimalloc internal stats (if enabled)
    • malloc-related stalls/locks (if observable)
  • Stability signals:
    • Crash rate, OOM events, latency spikes correlated with allocation
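For the latency percentiles above, the report should fix one percentile definition and use it everywhere, since p999/p9999 are sensitive to the estimator. A minimal sketch using a simple rank-based definition (our assumption; any consistent definition works):

```cpp
// Sketch: rank-based percentile over raw latency samples.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// p in [0, 100]; samples are copied and sorted. For p999 pass 99.9, etc.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    std::size_t rank = static_cast<std::size_t>((p / 100.0) * samples.size());
    if (rank >= samples.size()) rank = samples.size() - 1;  // clamp p == 100
    return samples[rank];
}
```

With enough samples (p9999 needs well over 10,000 per run to be meaningful), the choice of estimator matters less than keeping it identical across allocator variants.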

4) Test Controls / Methodology

  • Same hardware / kernel / compiler settings across runs
  • Fixed test duration and warm-up period
  • Repeat runs (e.g., N≥5) and report mean + variance
  • Keep logs of allocator configuration flags used in each run

Deliverables

  • A short report containing:
    • Benchmark setup and allocator configuration
    • Results tables/plots for the key metrics
    • Observed risks or regressions
    • Recommendation: adopt / do not adopt / needs further investigation

Acceptance Criteria

  • We have a reproducible benchmark harness and documented configs.
  • We can answer, with data:
    • Does mimalloc improve p999/p9999 latency meaningfully?
    • Does it impact QPS or CPU cost?
    • Any memory regressions (RSS growth, fragmentation) or stability concerns?
  • Clear next-step recommendation based on results.

Risks / Considerations

  • Compatibility with sanitizers / profiling tooling
  • Behavior under memory pressure and fragmentation-heavy workloads
  • Differences in allocation patterns between benchmark and production
  • Build and deployment complexity (static vs dynamic, override behavior)

Tasks

  • Integrate mimalloc using preload and link-time options
  • Add allocator selection/config to benchmark runner
  • Run microbenchmarks and capture allocator stats
  • Run service-level benchmark across load profiles
  • Analyze results and produce summary report
  • Document recommendation and follow-up actions (if any)

References

  • mimalloc repository/docs (add links as needed)
  • Internal benchmark runbook / dashboards (add links as needed)
