C++ Disjoint sampling implementation by ChuckHastings · Pull Request #5414 · rapidsai/cugraph

ChuckHastings · 2026-01-29T22:18:06Z

This PR adds the disjoint sampling feature to sampling in C++. C++ tests exist for homogeneous uniform and biased sampling, both for SG and MG.

This should get us started on the disjoint feature. We can add tests for the heterogeneous and temporal variations as well; I plan to do that in a follow-on activity. We also need to test this at the C API level, which I will add in a later pull request.

copy-pr-bot · 2026-01-29T22:18:11Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

seunghwak

Review part 1.

seunghwak · 2026-01-29T22:50:17Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+           rmm::device_uvector<vertex_t>,
+           std::vector<arithmetic_device_uvector_t>,
+           std::optional<rmm::device_uvector<int32_t>>>
+gather_one_hop_edgelist_with_visited(


I am not sure about this name. This function finds all one-hop edges and filters out the edges with visited destinations, but the name somewhat implies that it keeps the edges with visited destinations. Can we rename it so that it is easy to tell what the function does?

What about something like gather_edgelist_to_unvisited_neighbors or gather_one_hop_edgelist_to_unvisited_neighbors?

Updated to gather_one_hop_edgelist_to_unvisited_neighbors in next push.

seunghwak · 2026-01-29T22:53:18Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  std::optional<rmm::device_uvector<vertex_t>>& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,


Better to take visited_vertices and visited_vertex_labels as R-value references and return the new ones, to be more functional.

Will change in the next push.
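For reference, a minimal host-side sketch of the "take by R-value reference, return the new ones" shape suggested above. This is not the PR code: buffer_t (a std::vector<int>) is a hypothetical stand-in for rmm::device_uvector, and append_visited is an illustrative name.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for rmm::device_uvector<T>: an owning, movable buffer.
using buffer_t = std::vector<int>;

// Consumes the current visited buffers and returns the updated ones.
// The caller has to std::move its buffers in, so ownership transfer is
// visible at the call site and nothing is mutated through a reference.
std::tuple<buffer_t, buffer_t> append_visited(buffer_t&& visited_vertices,
                                              buffer_t&& visited_vertex_labels,
                                              buffer_t const& new_vertices,
                                              buffer_t const& new_labels)
{
  assert(visited_vertices.size() == visited_vertex_labels.size());
  assert(new_vertices.size() == new_labels.size());

  buffer_t vertices = std::move(visited_vertices);  // steal the storage
  buffer_t labels   = std::move(visited_vertex_labels);
  vertices.insert(vertices.end(), new_vertices.begin(), new_vertices.end());
  labels.insert(labels.end(), new_labels.begin(), new_labels.end());
  return std::make_tuple(std::move(vertices), std::move(labels));
}
```

A caller would then write something like std::tie(v, l) = append_visited(std::move(v), std::move(l), new_v, new_l);.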

seunghwak · 2026-01-29T22:54:05Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,
+  bool do_expensive_check)
+{
+  CUGRAPH_EXPECTS(visited_vertices, "Visited vertices must be provided");


In this case, why are we taking std::optional? It is better to detect this at compile time than at run time.

Yeah, I started with this as a change to gather_one_hop_vertices, and it would be optional... if specified it added the new logic. Then I realized I needed to restructure the main flow, seemed too complex, so I made it a separate function.

I'll change it in the next push.
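A small sketch of the difference being discussed, under simplifying assumptions: std::vector<int> stands in for rmm::device_uvector, assert stands in for CUGRAPH_EXPECTS, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <type_traits>
#include <vector>

// Run-time checked: a call that passes std::nullopt compiles and only
// fails when the check executes.
std::size_t count_visited_checked(std::optional<std::vector<int>> const& visited)
{
  assert(visited.has_value() && "Visited vertices must be provided");
  return visited->size();
}

// Compile-time enforced: the parameter is not optional, so a caller
// cannot omit it in the first place.
std::size_t count_visited(std::vector<int> const& visited) { return visited.size(); }

// Passing std::nullopt is accepted by the optional version ...
static_assert(std::is_invocable_v<decltype(&count_visited_checked), std::nullopt_t>);
// ... and rejected by the compiler for the non-optional version.
static_assert(!std::is_invocable_v<decltype(&count_visited), std::nullopt_t>);
```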

seunghwak · 2026-01-29T23:05:33Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  std::optional<edge_property_view_t<edge_t, int32_t const*>> edge_type_view,
+  raft::device_span<vertex_t const> active_majors,
+  std::optional<raft::device_span<int32_t const>> active_major_labels,
+  std::optional<raft::device_span<uint8_t const>> gather_flags,


I know we are using uint8_t in gather_one_hop_edgelist as well, but would it be better to use bool instead of uint8_t here?

seunghwak · 2026-01-29T23:28:39Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+             visited_vertices      = visited_vertices->data(),
+             visited_labels        = visited_vertex_labels->data(),


Any assumptions about how visited_vertices and visited_vertex_labels will be partitioned in multi-GPU? Will this code work in multi-GPU? (especially in extreme-scale?)

This is managed by update_dst_visited_vertices_and_labels. They are replicated (allgatherv) across the minor communicator. This is necessary for the logic to work.

Size should be reasonably manageable. Number of hops * fanout / p_row (or something like that) would be the expected number of entries per GPU.

seunghwak · 2026-01-30T00:02:00Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+      keep_count);
+  } else {
+    rmm::device_uvector<vertex_t> remove_srcs(result_srcs.size(), handle.get_stream());
+    rmm::device_uvector<vertex_t> remove_dsts(result_dsts.size(), handle.get_stream());


Why do we need srcs here? Aren't we just removing destination vertices that appear more than once?

And this code won't work across multiple GPUs, will it?

I use srcs in the sort so that I select the edge with the lowest source as the one that is selected. Perhaps not important in the gather_one_hop path, but the other path needs to break the ties in a way that guarantees that at least one source is fully sampled in each iteration of the loop. I kept that in place here more for consistency, but I can drop it if you think we shouldn't worry about that consistency.

The multi-GPU issue is a defect, I'll fix that in the next push. I have an extra shuffle and check in the sample edges path that I should replicate after this to check for duplicates across GPUs.

Can't we break ties with positions?

but the other path needs to break the ties in a way that guarantees that at least one source is fully sampled in each iteration of the loop.

I am having a hard time interpreting this. Do you mean that, in the sampling path, you want each active major to have at least one sampled edge (so, if there are multiple (different active_major, same minor) pairs, you prefer to select the one with no currently sampled edges)?
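To make the tie-breaking in this thread concrete, here is a host-side sketch (not the PR code) that sorts edges by (dst, src) and keeps the first edge per destination, so the edge with the lowest source wins. The edge struct and the function name are hypothetical, and the sketch ignores labels and the multi-GPU shuffle.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical edge record for illustration only.
struct edge {
  int src;
  int dst;
};

// Keeps exactly one edge per destination; ties go to the lowest source.
std::vector<edge> keep_one_edge_per_dst(std::vector<edge> edges)
{
  // Sort by destination first, then by source as the tie-breaker.
  std::sort(edges.begin(), edges.end(), [](edge const& a, edge const& b) {
    return std::make_pair(a.dst, a.src) < std::make_pair(b.dst, b.src);
  });
  // std::unique keeps the first element of each run of equal destinations,
  // which after the sort is the edge with the lowest source.
  auto last = std::unique(edges.begin(), edges.end(),
                          [](edge const& a, edge const& b) { return a.dst == b.dst; });
  edges.erase(last, edges.end());

  // Postcondition: no destination appears twice.
  assert(std::adjacent_find(edges.begin(), edges.end(), [](edge const& a, edge const& b) {
           return a.dst == b.dst;
         }) == edges.end());
  return edges;
}
```

Breaking ties by position instead, as asked above, would amount to a stable sort on dst alone followed by the same unique step.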

seunghwak · 2026-01-30T00:04:49Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+    positions = detail::keep_marked_entries(
+      handle,
+      std::move(positions),
+      raft::device_span<uint32_t const>{keep_flags.data(), keep_flags.size()},
+      keep_count);


I assume this code is the same for both the if and else cases? Do we need to duplicate it?

I could pull it out, but right now the scope of keep_flags and keep_count is inside the code block and they are automatically freed when we exit.

If I pull it out I'll need to define them explicitly and then resize and shrink to fit. Not sure which is better.

Oh, I see, yeah... those are comparable... so not worth the additional work.

seunghwak · 2026-01-30T00:07:47Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  rmm::device_uvector<vertex_t> new_visited_vertices(visited_vertices->size() + result_dsts.size(),
+                                                     handle.get_stream());


Better sort result_dsts first and call thrust::merge. Sorting the entire array again and again is expensive.

This should be calling the update_dst_visited_vertices_and_labels function which handles the MG support as well.

Now your thrust::merge comment might be relevant there.

Next push will call the function instead.
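A host-side sketch of the sort-then-merge idea from this thread, using std::merge as a stand-in for thrust::merge and std::vector<int> as a stand-in for the device buffers. The function name is hypothetical, and the sketch assumes the existing visited list is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Merges newly sampled destinations into an already sorted visited list.
// Only the new entries are sorted; the existing list is merged in linear
// time instead of being re-sorted on every hop.
std::vector<int> merge_into_visited(std::vector<int> const& visited, std::vector<int> new_dsts)
{
  assert(std::is_sorted(visited.begin(), visited.end()));

  std::sort(new_dsts.begin(), new_dsts.end());  // cost depends on the new entries only
  std::vector<int> merged(visited.size() + new_dsts.size());
  std::merge(visited.begin(), visited.end(), new_dsts.begin(), new_dsts.end(), merged.begin());
  return merged;
}
```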

seunghwak · 2026-01-30T00:16:01Z

cpp/src/sampling/detail/sample_edges.cuh

+  raft::device_span<vertex_t const> visited_vertices{};
+  cuda::std::optional<raft::device_span<int32_t const>> visited_vertex_labels{};


Any assumption about partitioning here? Will this follow edge partitioning or vertex partitioning? If this follows edge partitioning and is for edge destinations, it would be better to use minors instead of vertices in the naming.

visited_vertices/visited_vertex_labels is an allgatherv across the minor communicator, so all elements are replicated across the minor communicator. This allows any GPU that might include a vertex as a destination to have that information.

Then, should we name these visited_minors & visited_minor_labels instead?

seunghwak

Review part 2

seunghwak · 2026-01-30T00:43:24Z

cpp/src/sampling/detail/sample_edges.cuh

+    }
+
+    // Check for duplicates in the sampled minor vertices
+    rmm::device_uvector<vertex_t> local_majors(sampled_majors.size(), handle.get_stream());


I assume this code to remove duplicates in the sampled minor vertices is the same for both this function and the gather-one-hop-edgelist function. Can't we merge the two into a single utility function?

Will look at this as I fix the MG portion of gather-one-hop-edgelist

seunghwak · 2026-01-30T00:44:23Z

cpp/src/sampling/detail/sample_edges.cuh

+           rmm::device_uvector<vertex_t>,
+           std::vector<arithmetic_device_uvector_t>,
+           std::optional<rmm::device_uvector<int32_t>>>
+sample_edges_with_visited(


Similar to the gather one-hop edge list function, should we rename this function?

Renamed to sample_edges_to_unvisited_neighbors in the latest push.

seunghwak · 2026-01-30T00:44:41Z

cpp/src/sampling/detail/sample_edges.cuh

+  std::optional<rmm::device_uvector<vertex_t>>& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,


Similar here, better take R-value references.

Fixed in next push

seunghwak · 2026-01-30T01:00:53Z

cpp/src/sampling/detail/update_visited_utils.cu

+           std::optional<rmm::device_uvector<int32_t>>>
+update_dst_visited_vertices_and_labels(
+  raft::handle_t const& handle,
+  graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,


Any assumptions about MG partitioning here?

If sampled_vertices follows the vertex partitioning and visited_vertices/visited_vertex_labels follow edge partitioning (for minors), we should rename accordingly.

Suggestions on names?

Yes, sampled_vertices are partitioned by vertex partitioning,
visited_vertices/visited_vertex_labels are replicated (allgatherv) across the minor communicator.

visited_minors & visited_minor_labels (if in the detail namespace) or visited_dsts and visited_dst_labels (if in the public namespace)?

seunghwak · 2026-01-30T01:01:14Z

cpp/src/sampling/detail/update_visited_utils.cu

+  raft::device_span<vertex_t const> sampled_vertices,
+  std::optional<raft::device_span<int32_t const>> sampled_vertex_labels)
+{
+  CUGRAPH_EXPECTS(visited_vertices.has_value(), "Invalid input: visited_vertices must be provided");


Then, why are we taking std::optional here?

Fixed in next push.

seunghwak · 2026-01-30T01:02:25Z

cpp/src/sampling/detail/update_visited_utils.cu

+
+  if constexpr (multi_gpu) {
+    std::tie(new_samples, props) = cugraph::shuffle_int_vertices(
+      handle, std::move(new_samples), std::move(props), graph_view.vertex_partition_range_lasts());


What about labels here? No need to shuffle labels as well?

Labels are in props. I reorganized this in the next push: the props.push_back above and the std::move below are now all in the multi_gpu block, which makes it clear what's happening.

cpp/tests/sampling/detail/nbr_sampling_validate.cu

ChuckHastings

Will address many of these in my next push. A few comments/questions that aren't corrected yet. A few things not commented on I'll follow up with later.

ChuckHastings · 2026-01-30T20:26:49Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+           rmm::device_uvector<vertex_t>,
+           std::vector<arithmetic_device_uvector_t>,
+           std::optional<rmm::device_uvector<int32_t>>>
+gather_one_hop_edgelist_with_visited(


Updated to gather_one_hop_edgelist_to_unvisited_neighbors in next push.

ChuckHastings · 2026-01-30T20:51:55Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,
+  bool do_expensive_check)
+{
+  CUGRAPH_EXPECTS(visited_vertices, "Visited vertices must be provided");


Yeah, I started with this as a change to gather_one_hop_vertices, and it would be optional... if specified it added the new logic. Then I realized I needed to restructure the main flow, seemed too complex, so I made it a separate function.

I'll change it in the next push.

ChuckHastings · 2026-01-30T20:52:07Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  std::optional<rmm::device_uvector<vertex_t>>& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,


Will change in the next push.

ChuckHastings · 2026-02-05T16:06:52Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+             visited_vertices      = visited_vertices->data(),
+             visited_labels        = visited_vertex_labels->data(),


This is managed by update_dst_visited_vertices_and_labels. They are replicated (allgatherv) across the minor communicator. This is necessary for the logic to work.

Size should be reasonably manageable. Number of hops * fanout / p_row (or something like that) would be the expected number of entries per GPU.

ChuckHastings · 2026-02-05T21:37:02Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+    positions = detail::keep_marked_entries(
+      handle,
+      std::move(positions),
+      raft::device_span<uint32_t const>{keep_flags.data(), keep_flags.size()},
+      keep_count);


I could pull it out, but right now the scope of keep_flags and keep_count is inside the code block and they are automatically freed when we exit.

If I pull it out I'll need to define them explicitly and then resize and shrink to fit. Not sure which is better.

ChuckHastings · 2026-02-06T03:43:38Z

cpp/src/sampling/detail/sample_edges.cuh

+  std::optional<rmm::device_uvector<vertex_t>>& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>& visited_vertex_labels,


Fixed in next push

ChuckHastings · 2026-02-06T03:44:34Z

cpp/src/sampling/detail/update_visited_utils.cu

+  raft::device_span<vertex_t const> sampled_vertices,
+  std::optional<raft::device_span<int32_t const>> sampled_vertex_labels)
+{
+  CUGRAPH_EXPECTS(visited_vertices.has_value(), "Invalid input: visited_vertices must be provided");


Fixed in next push.

ChuckHastings · 2026-02-06T03:51:19Z

cpp/src/sampling/detail/update_visited_utils.cu

+           std::optional<rmm::device_uvector<int32_t>>>
+update_dst_visited_vertices_and_labels(
+  raft::handle_t const& handle,
+  graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,


Suggestions on names?

Yes, sampled_vertices are partitioned by vertex partitioning,
visited_vertices/visited_vertex_labels are replicated (allgatherv) across the minor communicator.

ChuckHastings · 2026-02-06T03:55:14Z

cpp/src/sampling/detail/update_visited_utils.cu

+
+  if constexpr (multi_gpu) {
+    std::tie(new_samples, props) = cugraph::shuffle_int_vertices(
+      handle, std::move(new_samples), std::move(props), graph_view.vertex_partition_range_lasts());


Labels are in props. I reorganized this in the next push: the props.push_back above and the std::move below are now all in the multi_gpu block, which makes it clear what's happening.

cpp/tests/sampling/detail/nbr_sampling_validate.cu

seunghwak · 2026-02-06T18:47:50Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  // Implement this, should be a little easier than sample_edges_to_unvisited_neighbors, since we
+  // don't need to compute the probability of sampling for an edge based on the label/tag.  We can
+  // just extract everything and then filter the results based on the visited vertices and vertex
+  // labels.


"Implement this,"=>Isn't this now an outdated comment? You already implemented this.

seunghwak · 2026-02-06T19:11:26Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  raft::device_span<vertex_t const> active_majors,
+  std::optional<raft::device_span<int32_t const>> active_major_labels,
+  std::optional<raft::device_span<uint8_t const>> gather_flags,
+  rmm::device_uvector<vertex_t>&& visited_vertices,


In multi-GPU, is this visited_minors? If this just stores the visited (local) vertices, this code won't work.

seunghwak · 2026-02-06T19:13:04Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+  // don't need to compute the probability of sampling for an edge based on the label/tag.  We can
+  // just extract everything and then filter the results based on the visited vertices and vertex
+  // labels.
+  auto [result_srcs, result_dsts, result_properties, result_labels] =


And we may consistently use majors & minors in the detail namespace (even though here, store_transposed == false, so minors are always destinations).

seunghwak · 2026-02-06T19:15:40Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+    positions = detail::keep_marked_entries(
+      handle,
+      std::move(positions),
+      raft::device_span<uint32_t const>{keep_flags.data(), keep_flags.size()},
+      keep_count);


Oh, I see, yeah... those are comparable... so not worth the additional work.

seunghwak · 2026-02-06T19:23:39Z

cpp/src/sampling/detail/gather_one_hop_impl.cuh

+      keep_count);
+  } else {
+    rmm::device_uvector<vertex_t> remove_srcs(result_srcs.size(), handle.get_stream());
+    rmm::device_uvector<vertex_t> remove_dsts(result_dsts.size(), handle.get_stream());


Can't we break ties with positions?

but the other path needs to break the ties in a way that guarantees that at least one source is fully sampled in each iteration of the loop.

I am having a hard time interpreting this. Do you mean that, in the sampling path, you want each active major to have at least one sampled edge (so, if there are multiple (different active_major, same minor) pairs, you prefer to select the one with no currently sampled edges)?

seunghwak · 2026-02-06T19:30:20Z

cpp/src/sampling/detail/sample_edges.cuh

+  raft::device_span<vertex_t const> visited_vertices{};
+  cuda::std::optional<raft::device_span<int32_t const>> visited_vertex_labels{};


Then, should we name these visited_minors & visited_minor_labels instead?

seunghwak · 2026-02-06T20:09:24Z

cpp/src/sampling/detail/sample_edges.cuh

+  std::optional<edge_arithmetic_property_view_t<edge_t>> edge_bias_view,
+  cugraph::vertex_frontier_t<vertex_t, tag_t, multi_gpu, false>& vertex_frontier,
+  rmm::device_uvector<vertex_t>&& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>&& visited_vertex_labels,


Should these be visited_minors & visited_minor_labels (or visited_dsts and visited_dst_labels if this function is in the public namespace)?

seunghwak · 2026-02-06T20:11:48Z

cpp/src/sampling/detail/sample_edges.cuh

+  std::optional<raft::device_span<int32_t const>> active_major_labels,
+  raft::host_span<size_t const> Ks,
+  rmm::device_uvector<vertex_t>&& visited_vertices,
+  std::optional<rmm::device_uvector<int32_t>>&& visited_vertex_labels,


visited_minors & visited_minor_labels (or visited_dsts & visited_dst_labels if in the public namespace)?

seunghwak · 2026-02-06T20:13:15Z

cpp/src/sampling/detail/update_visited_utils.cu

+           std::optional<rmm::device_uvector<int32_t>>>
+update_dst_visited_vertices_and_labels(
+  raft::handle_t const& handle,
+  graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,


visited_minors & visited_minor_labels (if in the detail namespace) or visited_dsts and visited_dst_labels (if in the public namespace)?

cpp/tests/sampling/detail/nbr_sampling_validate.cu

ChuckHastings added 3 commits January 29, 2026 11:40

MG testing passes

ad5bd00

Merge branch 'main' into disjoint_sampling_implementation

54e336e

cleanup some minor compile errors after merge

7687a6e

ChuckHastings self-assigned this Jan 29, 2026

ChuckHastings added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 29, 2026

ChuckHastings marked this pull request as ready for review January 29, 2026 22:18

ChuckHastings requested review from a team as code owners January 29, 2026 22:18

sample_an_append needs to be set differently between SG and MG

55a4bea

seunghwak reviewed Jan 30, 2026

ChuckHastings added 2 commits February 5, 2026 13:20

Fix disjoint logic SG paths that were broken during MG testing

aba46fe

Merge branch 'main' into disjoint_sampling_implementation

977f603

ChuckHastings commented Feb 6, 2026

address may PR comments

18a1190

seunghwak reviewed Feb 6, 2026
