Reduce the Capacity template for IVF-Flat search#1681
lowener wants to merge 4 commits into rapidsai:main
Conversation
  rmm::cuda_stream_view stream)
{
-  const int capacity = raft::bound_by_power_of_two(k);
+  const int capacity = bound_by_power_of_four(k);
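The `bound_by_power_of_four` helper is introduced by this PR, so as a point of reference here is a minimal sketch of what such a pair of bounds might look like, assuming the power-of-four version rounds `k` up to the next power of four the same way `raft::bound_by_power_of_two` rounds up to the next power of two (the `_sketch` names are hypothetical, not the real raft/cuvs API):

```cpp
// Sketch only, not the actual raft/cuvs helpers.
// Round k up to the next power of two (1, 2, 4, 8, ...).
constexpr int bound_by_power_of_two_sketch(int k)
{
  int r = 1;
  while (r < k) { r <<= 1; }
  return r;
}

// Round k up to the next power of four (1, 4, 16, 64, 256):
// take the power-of-two bound and promote odd powers of two
// (2, 8, 32, ...) one step further.
constexpr int bound_by_power_of_four_sketch(int k)
{
  const int r = bound_by_power_of_two_sketch(k);
  // 0x55555555 has bits set at even positions only, i.e. exactly
  // at the powers of four; otherwise double to reach the next one.
  return (r & 0x55555555) ? r : r * 2;
}
```

Restricting the bound this way halves the set of `Capacity` values that ever need to be instantiated, which is where the binary-size saving comes from.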
Have you compared the binary size reduction of this approach vs converting capacity to a run-time parameter?
Maybe for the first step, we may try to just estimate the potential size reduction without worrying too much about performance or even correctness.
In the kernel function (https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L810)
Capacity is mainly used in two places.
https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L828
=> Just replacing constexpr with const will be sufficient for an initial best-case estimate.
https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L852
=> Looks more involved (need to dig into the internals of block_sort_t), but for the initial estimate, we may just set Capacity here to an arbitrary value (e.g. 4) to quickly get an idea of the upper limit on the binary size reduction.
If the size you get with this approach is significantly smaller, then it might be worth further investigation. If the size reduction is comparable or even less, yeah, better not bother.
By fixing the Capacity template to 0, the size of libcuvs.so drops to 133 Mb:
Power-of-two: libcuvs.so 157 Mb (baseline)
Power-of-four: libcuvs.so 146 Mb (7% reduction)
Capacity=0: libcuvs.so 133 Mb (18% reduction)
I just noticed that the same recursive pattern for Capacity appears in IVF-PQ (from 1 to 128, though the number of template-parameter combinations matters less there): https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_pq/ivf_pq_compute_similarity_impl.cuh#L533-L543
So if we decide to switch block_sort_t to dynamic parameters, both index types would benefit.
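The recursive pattern in question can be modeled as a small host-side toy (assumed shape only, not the actual cuVS code): each recursion level the compiler reaches instantiates one more `Capacity` specialization, which is what inflates the binary.

```cpp
// Toy model of the recursive Capacity dispatch shared by IVF-Flat and
// IVF-PQ (assumed shape, not the actual cuVS code). Each distinct
// Capacity value reached here becomes a separate kernel instantiation
// in the real library.
template <int Capacity>
int select_capacity(int k)
{
  if constexpr (Capacity > 1) {
    // A tighter power-of-two capacity still fits k: keep recursing.
    if (k * 2 <= Capacity) { return select_capacity<Capacity / 2>(k); }
  }
  return Capacity;  // the real code would launch the kernel specialized on Capacity
}
```

For example, `select_capacity<256>(10)` resolves to 16, instantiating the function for 256, 128, 64, 32, and 16 along the way; making the parameter dynamic collapses all of those into one instantiation.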
@lowener is this per architecture? Any idea what the savings are for the binary when all architectures are compiled?
Signed-off-by: Mickael Ide <mide@nvidia.com>
Force-pushed 02ff8bf to 95075bf
Yes, this seems to be per architecture. Reduction of 79 Mb (or 7%) with the following architectures compiled:
    }
  }
  if constexpr (Capacity > 1) {
    if (k_max * 2 <= Capacity) {
Have you verified that this check (k_max * 2 <= Capacity) will still remain the same?
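One way to reason about it: the guard exists to guarantee that the smaller capacity the recursion steps down to still covers k_max. If the recursion divides Capacity by 4 instead of 2, that suggests the guard becomes k_max * 4 <= Capacity. A sketch of that assumption (to be verified against the PR, not its actual code):

```cpp
// Hypothetical power-of-four variant of the dispatch (an assumption,
// not the PR's final code): Capacity walks 256 -> 64 -> 16 -> 4 -> 1,
// and the guard ensures Capacity / 4 is still >= k_max before stepping.
template <int Capacity>
int select_capacity_pow4(int k_max)
{
  if constexpr (Capacity > 1) {
    if (k_max * 4 <= Capacity) { return select_capacity_pow4<Capacity / 4>(k_max); }
  }
  return Capacity;
}
```

Under this sketch, k_max = 17 resolves to Capacity = 64 rather than 32, which is consistent with the power-of-four bound but means the old `k_max * 2` check would no longer be the right guard.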
Currently the Capacity template goes from 1 to 256 by powers of 2. By changing it to powers of 4 from 1 to 256, we can reduce the size of libcuvs from 157 Mb to 146 Mb (11 Mb, or a 7% reduction).
After some tests on mnist-784-euclidean, across multiple topk values and an nprobe of 1 or 5, the impact on throughput is around 4%. The measurements are noisy, as the power-of-4 version is sometimes faster than the base version. The benchmarks are reproducible by running the script in the first commit of the PR.