Torch Batcher

Serve batched requests using Redis as the queue; throughput scales linearly by increasing the number of workers per device and across devices.
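
The core of the design is a worker loop that drains pending requests from a Redis list, runs them through the model as a single batch, and pushes each result back on a per-request key. A minimal sketch of that pattern, assuming illustrative names (a "requests" queue, model.pt, pickle serialization) rather than the repo's actual code:

    import pickle

    import redis
    import torch

    REQUEST_QUEUE = "requests"  # hypothetical queue name, not taken from the repo
    BATCH_SIZE = 32

    r = redis.Redis()
    model = torch.jit.load("model.pt").cuda().eval()  # assumed model artifact

    def serve_forever():
        while True:
            # Block until one request arrives, then opportunistically drain
            # up to BATCH_SIZE - 1 more without waiting.
            items = [r.blpop(REQUEST_QUEUE)[1]]
            while len(items) < BATCH_SIZE:
                nxt = r.lpop(REQUEST_QUEUE)
                if nxt is None:
                    break
                items.append(nxt)

            ids, tensors = zip(*(pickle.loads(b) for b in items))
            batch = torch.stack(tensors).cuda()

            with torch.no_grad():
                out = model(batch).cpu()

            # Hand each result back on a per-request key the client blocks on.
            for req_id, result in zip(ids, out):
                r.rpush(f"result:{req_id}", pickle.dumps(result))

Because each worker only ever sees whole batches, adding workers per GPU (under MPS) or adding GPUs multiplies throughput without changing the client protocol.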

Dependencies
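
The usage below implies at least PyTorch, a running Redis server together with the redis Python client, and Supervisor (for supervisord); check the repository for the exact list.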

Usage

  • For linear scaling, start nvidia-cuda-mps-control; see Section 2.1.1, GPU Utilization, of the NVIDIA MPS documentation for details.

    nvidia-cuda-mps-control -d # Start the MPS control daemon
    
    # To exit MPS after stopping the server:
    nvidia-cuda-mps-control # Enters the interactive command prompt
    quit # Type this at the prompt to quit
  • Start Redis

    redis-server --save "" --appendonly no # Run without disk persistence
  • Start batch serving (a hypothetical sketch of what supervisor.conf sets up appears after this list)

    supervisord -c supervisor.conf # Start 3 workers on a single GPU
  • Run the batch benchmark (a minimal client sketch of the round trip it presumably measures follows this list)

    python3 bench_batched.py
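
supervisor.conf is part of the repository; a hypothetical stand-in showing how three identical workers on one GPU could be configured (the program name and worker script below are assumptions, not the repo's actual contents):

    ; Hypothetical stand-in for supervisor.conf: three identical workers
    ; pinned to GPU 0 (script name and options are assumptions).
    [supervisord]
    nodaemon=true

    [program:batch_worker]
    ; assumed worker entry point
    command=python3 serve_batched.py
    ; three worker processes sharing one GPU, cooperating via MPS
    numprocs=3
    process_name=%(program_name)s_%(process_num)02d
    environment=CUDA_VISIBLE_DEVICES="0"
    autorestart=true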
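
bench_batched.py is the repository's benchmark; the round trip it presumably measures looks like this minimal client sketch, reusing the illustrative queue names and serialization from the worker sketch above:

    import pickle
    import time
    import uuid

    import redis
    import torch

    r = redis.Redis()

    def infer(tensor, timeout=5):
        # Enqueue one request and block on its dedicated result key.
        req_id = uuid.uuid4().hex
        r.rpush("requests", pickle.dumps((req_id, tensor)))  # queue name is an assumption
        reply = r.blpop(f"result:{req_id}", timeout=timeout)
        return None if reply is None else pickle.loads(reply[1])

    start = time.time()
    out = infer(torch.randn(3, 224, 224))  # example input shape, not from the repo
    print(f"round-trip latency: {time.time() - start:.4f}s")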
