This repo contains example code that uses faiss to search for nearest neighbors in a dense vector dataset that does not fit in RAM (see the blog post).
To run the example, on a machine with Docker installed, run:

```shell
docker build -t nnsearch:latest .
docker run --name nn -d nnsearch:latest
docker exec -it nn bash
cd workspace
```

and then get and inflate the 1M GIST vectors (a benchmark dataset for vector nearest-neighbors search) with:
```shell
wget ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz
tar -xzvf gist.tar.gz
```

To perform nearest-neighbors search with numpy (this can fail on machines with less than 8 GB of RAM available to the process), run:
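The extracted `.fvecs` files use the texmex format: each vector is stored as a little-endian `int32` dimension `d` followed by `d` `float32` components. A minimal reader sketch (the helper name `read_fvecs` is not from this repo):

```python
import numpy as np

def read_fvecs(path):
    """Read a texmex .fvecs file: each vector is an int32 dimension d
    followed by d little-endian float32 components."""
    raw = np.fromfile(path, dtype=np.float32)
    dim = raw[:1].view(np.int32)[0]         # dimension taken from the first record
    return raw.reshape(-1, dim + 1)[:, 1:]  # drop the per-vector dimension field
```

For the GIST1M base set, `read_fvecs("gist/gist_base.fvecs")` should yield an array of shape `(1000000, 960)`.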
```shell
cd src
python numpy_inference.py
```

To perform the same search with faiss (meant to scale to large numbers of vectors), run:
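The numpy approach is exhaustive brute-force search, which is what makes it memory-hungry. A minimal sketch on synthetic data (the array sizes here are stand-ins, not the real GIST dimensions):

```python
import numpy as np

# Small synthetic stand-in for the 1M x 960 GIST base set.
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 32)).astype(np.float32)
# Queries are slightly perturbed copies of the first five base vectors.
queries = base[:5] + 0.01 * rng.standard_normal((5, 32)).astype(np.float32)

# Exhaustive L2 search: broadcasting materialises an
# (n_queries, n_base, dim) intermediate, which is why the real
# 1M-vector case needs several GB of RAM for the process.
d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
nearest = d2.argmin(axis=1)
print(nearest)  # each perturbed query should map back to its source vector
```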
```shell
python faiss_training.py
python faiss_inference.py
```

When done with the runs, `make clean` in the root folder cleans up all files created along the way.
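The reason faiss scales is that, instead of comparing each query against every base vector, an inverted-file (IVF) index clusters the base set at training time and, at search time, visits only the few closest clusters. A numpy sketch of that idea (synthetic data; real faiss runs k-means for the centroids, whereas this sketch just samples them, and the actual index configuration lives in `faiss_training.py`):

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((2000, 16)).astype(np.float32)

# "Training": pick coarse centroids (faiss would run k-means here).
n_cells = 20
centroids = base[rng.choice(len(base), n_cells, replace=False)]

# Assign every base vector to its nearest centroid: the inverted lists.
assign = ((base[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = [np.flatnonzero(assign == c) for c in range(n_cells)]

def ivf_search(q, nprobe=4):
    # Visit only the nprobe closest cells instead of the whole base set.
    cell_d = ((centroids - q) ** 2).sum(-1)
    cand = np.concatenate([lists[c] for c in cell_d.argsort()[:nprobe]])
    d = ((base[cand] - q) ** 2).sum(-1)
    return cand[d.argmin()]

# A query near base vector 123 should be routed back to it.
q = base[123] + 0.01 * rng.standard_normal(16).astype(np.float32)
print(ivf_search(q))
```

Raising `nprobe` trades speed for recall: with `nprobe = n_cells` the search degenerates back to the exhaustive scan.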
To monitor memory usage during script execution, one can use memory_profiler:

```shell
# requires having run python faiss_training.py first
mprof run faiss_inference.py
# generate a plot of memory usage vs. time
mprof plot -o faiss_inference
```