@@ -25,43 +25,38 @@ ZenANN will be implemented in C++ for high performance and exposes an intuitive
 ### Index Hierarchy
 There will be an abstract base index, which provides a unified interface for different index classes.
 1. **Base Index Class**
-   - `indexBase`: Defines the common API for all indexing methods (eg. `add()`, `search()`, `train()`, `reorder_layout()`)
-
-2. **Derived Index Classes**
-   - `indexHNSW`: A graph-based structure for accurate and efficient ANN
-   - `indexIVF`: A cluster-based structure for large dataset
-3. **Hybrid Index Classes**
-   - `indexIVF_HNSW` / `indexHNSW_IVF`: For fast-coveraging larger datasets
-4. (Optional) **Quantization Index Classes**
-   - `indexPQ`: Combined with product quantization for memory-limited scenarios
+   - `indexBase`: Defines the common API for all indexing methods (eg. `add()`, `search()`, `train()`)
+2. **KD-tree Index Class**
+   - `KDTreeIndex`: To serve as a baseline for approximate search algorithms, KD-tree is used to perform exact search.
+3. **IVF Index Class**
+   - `IVFIndex`: A cluster-based structure for large dataset
+4. **HNSW Index Class**
+   - `HNSWIndex`: A graph-based structure for accurate and efficient ANN
 
 Note: Actual implementation detail of HNSW may be built on Faiss's interface according to development progress
 
 ### Processing Flow
 1. Initialize an index (e.g., `indexBase`, `indexHNSW`)
-2. Build an index
-   2-1. Add the given vector data using `add()` to a specific index instance.
-   2-2. Train index with `train()` if needed
-   2-3. Optimize the index data layout with `reorder_layout()` to improve cache locality.
+2. Build an index with `add()`
+   - Add the given vector data to a specific index instance.
+   - Train index with `train()` if needed(for IVF-based Index)
+   - Optimize the index data layout with reorder_layout in Faiss submodule to improve cache locality.
 4. Perform a query on the specified index instance using `search()`.
-5. Evaluate accuracy using the `get_statistics()` API.
+5. Return result set with top-k id & estimated distance for each query.
 
 ## API Description
 There is a simple python examples for understanding the API design
 ```
 import zenann
 
-# Initialize an HNSW index
-index = zenann.HNSWIndex(dimension=128, ef_construction=200, M=16)
+# Initialize an index for ANN search
+index = zenann.HNSWIndex(dim=128, M=16, efConstruction=200)
 
-# Add vectors to the index and conduct reordering
+# Add vectors to the index and conduct training / reordering
 index.add(data_vectors)
-index.train()
-index.reorder_layout()
 
 # Perform a search
 results = index.search(query_vector, k=5, efSearch=100)
-recall = get_statistics(results, ground_truth)
 ```
 
6762## Engineering Infrastructure
@@ -71,10 +66,10 @@ recall = get_statistics(results, ground_truth)
 - Git
 - Github
 ### Testing Framework
-- C++: Google Test
 - Python: pytest
 ### Documentation
 - Markdown
+- Mermaid
 ### Continuous Integration
 - Github Actions
 
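As a reading aid for the revised hierarchy and processing flow in this diff, here is a minimal Python sketch of how a shared base API with `add()`, `train()`, and `search()` could fit together. It is not ZenANN's implementation: the class names `IndexBase` and `ExactIndex`, the constructor argument, and the `(id, distance)` return shape are assumptions made for illustration, and a brute-force scan stands in for the exact-search baseline role that the diff assigns to `KDTreeIndex`.

```python
import heapq
import math
from abc import ABC, abstractmethod


class IndexBase(ABC):
    """Hypothetical stand-in for `indexBase`: the API shared by every index."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, data):
        """Store vectors, rejecting any whose length differs from `dim`."""
        for vec in data:
            if len(vec) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
            self.vectors.append(list(vec))

    def train(self):
        """No-op by default; a cluster-based index such as IVF would override it."""

    @abstractmethod
    def search(self, query, k):
        """Return up to k (id, distance) pairs, nearest first."""


class ExactIndex(IndexBase):
    """Brute-force exact search (assumed baseline, not a real KD-tree)."""

    def search(self, query, k):
        # Pair each Euclidean distance with the vector id, then keep the k smallest.
        scored = ((math.dist(query, vec), i) for i, vec in enumerate(self.vectors))
        return [(i, d) for d, i in heapq.nsmallest(k, scored)]


# Processing flow: initialize, add, train if needed, search, read top-k results.
index = ExactIndex(dim=2)
index.add([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
index.train()
results = index.search([0.9, 0.0], k=2)
```

Keeping `train()` as a no-op on the base class lets callers run the same sequence of calls against every index type, which matches the diff's note that training is only needed for IVF-based indexes.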