Advanced Cache

AdvCache is a high-performance, production-ready in-memory HTTP cache and reverse proxy designed for latency-sensitive workloads. Built with Rust using tokio and axum, it delivers exceptional throughput (165-300k RPS) with minimal memory overhead and zero-allocation hot paths.

🚀 Key Features

Performance & Scalability

High Throughput: 165k RPS MacBook M2 Pro MAX, up to 300k RPS on 24-core bare-metal servers (see performance screenshots)
Memory Efficient: up to 64 bytes overhead per cache key
Sharded Storage: 1024 shards with per-shard LRU for optimal lock contention reduction

Advanced Caching

Realtime Cache Invalidation: Implements through API endpoint for direct usage and by the background worker.
TinyLFU Admission Control: Intelligent cache admission using Count-Min Sketch and Doorkeeper
Background Refresh: Automatic TTL-based cache refresh without blocking requests
Flexible Cache Keys: Configurable query parameters and headers for precise cache control
Selective Header Forwarding: Fine-grained control over cached and forwarded headers

Production-Ready Features

Runtime Control Plane: Dynamic toggles for admission, eviction, refresh, compression, and tracing
Observability: Prometheus/VictoriaMetrics metrics and OpenTelemetry tracing with minimal overhead
Kubernetes Integration: Health probes, ConfigMap support, and Docker images
Graceful Shutdown: Safe resource cleanup and connection draining

Developer Experience

Comprehensive API: RESTful endpoints for cache management and monitoring
Rich Configuration: YAML-based configuration with inline documentation
Extensive Testing: Unit tests, integration tests, and end-to-end test coverage
OpenAPI Documentation: Complete API specification via Swagger/OpenAPI

🏃 Quick Start

Click to expand Quick Start

Prerequisites

Rust 1.82+ (for building from source)
Docker (for containerized deployment)
YAML configuration file

Installation

Using Docker

# Build the Docker image
docker build -t advcache .

# Run with custom configuration
docker run --rm -p 8020:8020 \
  -v "$PWD/cfg:/app/cfg" \
  -v "$PWD/public/dump:/app/public/dump" \
  advcache -cfg /app/cfg/advcache.cfg.yaml

Building from Source

# Clone the repository
git clone https://github.com/Borislavv/adv-cache.git
cd adv-cache

# Build in release mode
cargo build --release

# Run with configuration
./target/release/advcache -cfg ./cfg/advcache.cfg.yaml

Configuration

Create a cfg/advcache.cfg.yaml file with the following setup:

Click to expand configuration example

cache:
  env: "prod"                    # Runtime environment label (e.g., dev/stage/prod). Used for logs/metrics tagging.
  enabled: true                  # Master switch: enables the cache service.

  logs:
    level: "info"                # Log level: debug|info|warn|error. Prefer "info" in prod, "debug" for short bursts.

  runtime:
    num_cpus: 0                  # 0 = auto max available cores. Set explicit N to cap CPU usage.

  api:
    name: "adv_cache"            # Human-readable service name exposed in API/metrics.
    port: "8020"                 # HTTP port for the admin/API endpoints.

  upstream:
    backend:
      id: "mock_upstream"
      enabled: true
      policy: "await"
      host: "localhost:8021"
      scheme: "http"
      rate: 1500                  # Per-backend RPS cap (token bucket or equivalent).
      concurrency: 4096           # Max simultaneous requests.
      timeout: "10s"              # Base timeout for upstream requests.
      max_timeout: "1m"           # Hard cap if “slow path” header allows extending timeouts.
      use_max_timeout_header: ""  # If non-empty, presence of this header lifts timeout to max_timeout.
      healthcheck: "/healthz"     # Liveness probe path; 2xx = healthy.

  # Compression
  # - Supported levels:
  #   CompressNoCompression      = 0
  #   CompressBestSpeed          = 1
  #   CompressBestCompression    = 9
  #   CompressDefaultCompression = 6
  #   CompressHuffmanOnly        = -2
  compression:
    enabled: false
    level: 1

  data:
    dump:
      enabled: false              # Enable periodic dump to disk for warm restarts / backup.
      dump_dir: "public/dump"     # Directory to store dump files.
      dump_name: "cache.dump"     # Base filename (rotations will append indices/timestamps).
      crc32_control_sum: true     # Validate dump integrity via CRC32 on load.
      max_versions: 3             # Keep up to N rotated versions; older are deleted.
      gzip: false                 # Compress dumps with gzip (smaller disk, more CPU).
    mock:
      enabled: false              # If true, prefill cache with mock data (for local testing).
      length: 1000000             # Number of mock entries to generate.

  storage:
    mode: listing                 # Implementation of LRU algo through per-shard lists or Redis style sampling (values=sampling/listing).
    size: 10737418240             # Max memory budget for storage (bytes). Here: 50 GiB.

  admission:
    enabled: true
    capacity: 2000000               # global TinyLFU window size (how many recent events are tracked)
    sample_multiplier: 4            # multiplies the window for aging (larger = slower aging)
    shards: 256                     # number of independent shards (reduces contention)
    min_table_len_per_shard: 65536  # minimum counter slots per shard (larger = fewer collisions)
    door_bits_per_counter: 12       # bits per counter (2^bits-1 max count; larger = less saturation)

  eviction:
    enabled: true                 # Run background evictor to keep memory under configured thresholds.
    replicas: 32                  # Number of evictor workers (>=1). Increase for large heaps / high churn.
    soft_limit: 0.8               # At storage.size × soft_limit start gentle eviction + tighten admission.
    hard_limit: 0.99              # At storage.size × hard_limit trigger minimal hot-path eviction; also set debug memory limit.
    check_interval: "100ms"       # Defines how often the main evictor loop will check the memory limit.

  lifetime:
    enabled: true                 # Enable background refresh/remove for eligible entries.
    ttl: "2h"                     # Default TTL for successful (200) responses unless overridden by rules.
    on_ttl: refresh               # Values: 'remove' or 'refresh'. Defines behavior on TTL overcome.
    beta: 0.35                    # Jitter factor (0..1) to spread refreshes/removes and prevent thundering herd.
    rate: 1000                    # Global refresh/remove QPS cap to upstreams (safety valve).
    replicas: 32                  # Number of workers (>=1).
    coefficient: 0.25             # Start refresh/remove attempts at TTL × coefficient (e.g., 0.5 = at 50% of TTL).

  traces:
    enabled: true
    service_name: "adv_cache"
    service_version: "prod"
    exporter: "http"              # "stdout" | "grpc" (OTLP/4317) | "http" (OTLP/4318)
    endpoint: "localhost:4318"    # ignores when exporter=stdout
    insecure: true                # should use TLS?
    sampling_mode: "always"       # off | always | ratio
    sampling_rate: 1.0            # 10% head-sampling (используется при mode=ratio)
    export_batch_size: 512
    export_batch_timeout: "3s"
    export_max_queue: 1024

  metrics:
    enabled: true                 # Expose Prometheus-style metrics (and/or internal stats if logs.stats=true).

  k8s:
    probe:
      timeout: "5s"               # Liveness/readiness probe timeout for the service endpoints.

  rules:
    /api/v1/user:                 # OnTTL: Will inherit global refresh unless overridden here.
      cache_key:
        query:                    # Include query params by prefix into the cache key (order-insensitive).
          - user[id]
          - timezone
        headers:                  # Include these request headers into the cache key (exact match).
          - Accept-Encoding
      cache_value:
        headers:                  # Response headers to store/forward with cached value.
          - Vary
          - Server
          - Content-Type
          - Content-Length
          - Content-Encoding
          - Cache-Control
          - X-Error-Reason

    /api/v1/client:
      cache_key:
        query:
          - user[id]
          - domain
          - timezone
        headers:
          - Accept-Encoding
      cache_value:
        headers:
          - Vary
          - Server
          - Content-Type
          - Content-Length
          - Content-Encoding
          - Cache-Control
          - X-Error-Reason

🏗️ Architecture

Click to expand Architecture

Core Components

Storage Layer

Sharded Map: 1024 shards for distributed lock contention
LRU Implementation: Doubly-linked list with raw pointers for O(1) operations OR Redis-style LRU sampling (can be changed through Config)

Admission Control

TinyLFU Algorithm: Frequency-based admission using Count-Min Sketch
Doorkeeper: Short-term frequency filter to prevent one-hit wonders
Configurable Sharding: 256 shards with per-shard frequency tables

Background Workers

Eviction Worker: Soft and hard memory limit enforcement with configurable intervals
Lifetime Manager: TTL-based refresh and expiration with beta distribution for load spreading

📡 API Reference

Click to expand API Reference

Main Endpoints

Endpoint	Method	Description
`/*`	GET	Main cache/proxy route - serves cached content or proxies to upstream
`/k8s/probe`	GET	Kubernetes health probe endpoint
`/metrics`	GET	Prometheus/VictoriaMetrics metrics endpoint

Cache Control Endpoints

Endpoint	Method	Description
`/advcache/bypass`	GET	Get bypass status
`/advcache/bypass/on`	GET	Enable cache bypass (all requests go to upstream)
`/advcache/bypass/off`	GET	Disable cache bypass
`/advcache/clear`	GET	Two-step cache clear (returns token)
`/advcache/clear?token={token}`	GET	Execute cache clear with token
`/advcache/invalidate?_path={path}&{queries}`	GET	Invalidate cache entries matching path and queries
`/advcache/invalidate?_path={path}&_remove=true`	GET	Remove cache entries (instead of marking outdated)
`/advcache/entry?key={uint64}`	GET	Get cache entry by key

Worker Management Endpoints

Endpoint	Method	Description
`/advcache/eviction`	GET	Get eviction worker status
`/advcache/eviction/on`	GET	Enable eviction worker
`/advcache/eviction/off`	GET	Disable eviction worker
`/advcache/eviction/scale?to={n}`	GET	Scale eviction worker replicas
`/advcache/lifetime-manager`	GET	Get lifetime manager status
`/advcache/lifetime-manager/on`	GET	Enable lifetime manager
`/advcache/lifetime-manager/off`	GET	Disable lifetime manager
`/advcache/lifetime-manager/rate?to={n}`	GET	Set refresh rate
`/advcache/lifetime-manager/scale?to={n}`	GET	Scale lifetime manager replicas
`/advcache/lifetime-manager/policy`	GET	Get refresh policy
`/advcache/lifetime-manager/policy/remove`	GET	Set policy to remove on TTL
`/advcache/lifetime-manager/policy/refresh`	GET	Set policy to refresh on TTL

Configuration & Monitoring

Endpoint	Method	Description
`/advcache/config`	GET	Dump current configuration
`/advcache/admission`	GET	Get admission control status
`/advcache/admission/on`	GET	Enable admission control
`/advcache/admission/off`	GET	Disable admission control
`/advcache/upstream/policy`	GET	Get upstream policy (await/deny)
`/advcache/upstream/policy/await`	GET	Set upstream policy to await (back-pressure)
`/advcache/upstream/policy/deny`	GET	Set upstream policy to deny (fail-fast)
`/advcache/http/compression`	GET	Get compression status
`/advcache/http/compression/on`	GET	Enable response compression
`/advcache/http/compression/off`	GET	Disable response compression
`/advcache/traces`	GET	Get tracing status
`/advcache/traces/on`	GET	Enable OpenTelemetry tracing
`/advcache/traces/off`	GET	Disable OpenTelemetry tracing

OpenAPI Documentation

Complete API documentation is available via Swagger/OpenAPI specification at api/swagger.yaml.

📊 Monitoring & Observability

Click to expand Monitoring & Observability

Prometheus Metrics

Metrics are exposed at /metrics endpoint in Prometheus format:

Cache Metrics: Hits, misses, hit ratio, cache size, memory usage
Request Metrics: Request count, latency, status codes
Worker Metrics: Eviction counts, refresh counts, worker status
Upstream Metrics: Upstream requests, errors, timeouts

OpenTelemetry Tracing

Tracing provides distributed tracing with minimal overhead:

Spans: ingress (server), upstream (client on miss/proxy), refresh (background)
Exporters: stdout (sync), OTLP HTTP/gRPC (batch)
Sampling: Configurable ratio-based sampling (default: 0.1)
Fast No-Op: When disabled, uses atomic toggle only (zero overhead)

Health Checks

Endpoint: GET /k8s/probe
Response: 200 OK when healthy, 503 when unhealthy
Timeout: Configurable via k8s.probe.timeout (default: 5s)

Logging

Structured logging with configurable levels:

Format: JSON (production) or ANSI (development)
Levels: trace, debug, info, warn, error
Components: Component-based filtering for focused debugging

🛠️ Development

Click to expand Development

Building from Source

# Clone repository
git clone https://github.com/Borislavv/adv-cache.git
cd adv-cache

# Build in release mode
cargo build --release

# Run with local configuration
./target/release/advcache -cfg ./cfg/advcache.cfg.yaml

Running Tests

# Run all tests
cargo test

# Run library tests only
cargo test --lib

# Run integration tests
cargo test --test e2e

Benchmark Results

Local (4-6 CPU, 1-16KB docs, 20-25GB store): 165k RPS steady
Bare-metal (24 CPU, 50GB store, production traffic): ~300k RPS sustained
Memory Overhead: ~64 bytes per key.

Note: Performance depends on workload characteristics (document size, cache hit ratio, request patterns).

Performance Charts

Click to expand Performance Charts

MacBook M2 Pro Max (12 cores, 24GB RAM)

Small requests: 256 bytes up to 1kb; hit rate: 100%
Medium requests: 1kb up to 8kb; hit rate: 100%
Large requests: 1kb up to 16kb; hit rate: 100%
Backpreasure: 1kb up to 8kb reqeusts with active eviction; hit rate: ~92%
Backpreasure: 1kb up to 8kb reqeusts with active eviction; hit rate: ~58%
Proxy: req. weight 4kb
Proxy: req. weight 32kb

🎯 Performance Tuning

Click to expand Performance Tuning

Throughput Optimization

CPU Cores: Set runtime.num_cpus: 0 to use all available cores
Sharding: Default 1024 shards provide optimal lock contention distribution
Admission Control: Enable TinyLFU to protect hot cache set
Worker Scaling: Adjust eviction.replicas and lifetime.replicas based on load

Memory Optimization

Storage Size: Set storage.size based on available memory (recommended: 50-80% of total)
Object Pooling: Enabled by default, no configuration needed
Eviction Limits: Configure soft_limit (0.8) and hard_limit (0.99) for memory pressure handling

Latency Optimization

Compression: Enable with level: 1 for minimal CPU overhead
Upstream Timeout: Set appropriate timeout and max_timeout values
Connection Pooling: Configure concurrency for upstream connections
Tracing: Use sampling (sampling_rate: 0.1) to reduce overhead

📄 License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

👤 Maintainer

Borislav Glazunov

Email: glazunov2142@gmail.com
Telegram: @gl_c137

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📚 Additional Resources

Built with ❤️ using Rust

Name		Name	Last commit message	Last commit date
Latest commit History 162 Commits
.github/workflows		.github/workflows
api		api
cfg		cfg
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
CNAME		CNAME
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
GRAFANA_SETUP.md		GRAFANA_SETUP.md
LICENSE		LICENSE
METRICS.md		METRICS.md
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml

License

Borislavv/rs-adv-cache

Folders and files

Latest commit

History

Repository files navigation

Advanced Cache

🚀 Key Features

Performance & Scalability

Advanced Caching

Production-Ready Features

Developer Experience

📋 Table of Contents

🏃 Quick Start

Prerequisites

Installation

Using Docker

Building from Source

Configuration

🏗️ Architecture

Core Components

Storage Layer

Admission Control

Background Workers

📡 API Reference

Main Endpoints

Cache Control Endpoints

Worker Management Endpoints

Configuration & Monitoring

OpenAPI Documentation

📊 Monitoring & Observability

Prometheus Metrics

OpenTelemetry Tracing

Health Checks

Logging

🛠️ Development

Building from Source

Running Tests

Benchmark Results

Performance Charts

MacBook M2 Pro Max (12 cores, 24GB RAM)

🎯 Performance Tuning

Throughput Optimization

Memory Optimization

Latency Optimization

📄 License

👤 Maintainer

🤝 Contributing

📚 Additional Resources

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages