Conversation
rmm::device_uvector<vertex_t> temporary_ranks(local_vtx_partitoin_size, handle.get_stream());
thrust::copy(handle.get_thrust_policy(), ranks.begin(), ranks.end(), temporary_ranks.begin());
vertex_t isolated_v_start = multi_gpu ? segment_offsets->data()[4] : segment_offsets->data()[3];
hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.
So, this code is inaccurate.
vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);
The last segment is always a 0-degree segment, so this code should work.
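For context, a hedged sketch of the (*segment_offsets) layout this relies on (the hypersparse segment exists only when hypersparse_degree_threshold > 1, which is why a hard-coded index of 3 or 4 is fragile):

// Vertices are renumbered in decreasing degree order, so (*segment_offsets) is
//   [0, high_end, mid_end, low_end, (hypersparse_end,) local_vertex_partition_range_size()]
// and the final segment always holds the 0-degree vertices. Hence:
vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);  // i.e. num_local_nzd_vertices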
And there is no guarantee that segment_offsets.has_value() is true. If renumber is set to false, segment_offsets.has_value() is false.
In this case, you need to compute the degrees and scan the vertex list based on the degrees.
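A hedged sketch of that fallback, assuming remaining_vertices was sized to local_vertex_partition_range_size() beforehand (compute_out_degrees() is an existing graph_view member; the rest is plain Thrust):

auto degrees = graph_view.compute_out_degrees(handle);  // rmm::device_uvector<edge_t>
auto vertex_first =
  thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first());
auto vertex_last = vertex_first + graph_view.local_vertex_partition_range_size();

// Keep only the non-zero-degree vertices in remaining_vertices ...
auto remaining_last = thrust::copy_if(handle.get_thrust_policy(),
                                      vertex_first,
                                      vertex_last,
                                      degrees.begin(),
                                      remaining_vertices.begin(),
                                      [] __device__(auto degree) { return degree > edge_t{0}; });
remaining_vertices.resize(thrust::distance(remaining_vertices.begin(), remaining_last),
                          handle.get_stream());

// ... and set ranks to the vertex ID for non-zero-degree vertices and to
// std::numeric_limits<vertex_t>::max() for isolated ones.
thrust::transform(handle.get_thrust_policy(),
                  vertex_first,
                  vertex_last,
                  degrees.begin(),
                  ranks.begin(),
                  [] __device__(auto v, auto degree) {
                    return degree > edge_t{0} ? v : std::numeric_limits<vertex_t>::max();
                  });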
isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.
hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.
Thanks for catching this. After analyzing the datasets used in the paper, I realized none of them have hypersparse regions, so I wasn't pre-filling the priority values of the isolated vertices (because I was always starting from local_vertex_partition_range_last()). After applying the suggested change, I get a small speedup of ~2%.
isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.
Right, but the former indicates a position in a range and the latter a count. Maybe local_nzd_vertices_begin?
rmm::device_uvector<vertex_t> remaining_vertices(isolated_v_start, handle.get_stream());

// Select a random set of candidate vertices
thrust::for_each(
This is thrust::transform. Better to use a more specific algorithm than a generic for_each if possible.
And what you are doing is basically setting ranks to the vertex ID if the degree is non-zero and to std::numeric_limits<vertex_t>::max() if isolated.
And copying into remaining_vertices.
if (segment_offsets) {
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               remaining_vertices.begin());
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               ranks.begin());
  thrust::fill(handle.get_thrust_policy(),
               ranks.begin() + *(segment_offsets->rbegin() + 1),
               ranks.end(),
               std::numeric_limits<vertex_t>::max());
} else {
  // compute degrees and set ranks / remaining_vertices based on the degrees
}
  num_buckets);

size_t loop_counter = 0;
vertex_t nr_remaining_vertices_to_check = remaining_vertices.size();
This holds an invalid value if # GPUs > 1.
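A hedged sketch of the fix, assuming the usual cugraph::host_scalar_allreduce utility for reducing a host scalar across ranks:

vertex_t nr_remaining_vertices_to_check = static_cast<vertex_t>(remaining_vertices.size());
if constexpr (multi_gpu) {
  // remaining_vertices.size() is only the local count; allreduce it to get the global count.
  nr_remaining_vertices_to_check = cugraph::host_scalar_allreduce(handle.get_comms(),
                                                                  nr_remaining_vertices_to_check,
                                                                  raft::comms::op_t::SUM,
                                                                  handle.get_stream());
}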
  }
});
while (true) {
  loop_counter++;
This is a minor point, but it is more natural to increment loop_counter at the end of the loop. We are starting loop 0, yet loop_counter is already 1, which is conceptually a bit misleading.
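i.e. something like (a minimal sketch; the exit condition is a placeholder for whatever the loop actually checks):

size_t loop_counter = 0;
while (true) {
  // ... body of iteration number loop_counter ...
  if (nr_remaining_vertices_to_check == 0) { break; }  // placeholder exit condition
  loop_counter++;
}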
// FIXME: Since we know that the property being updated are either
// std::numeric_limits<vertex_t>::max() or std::numeric_limits<vertex_t>::min(),
// explore 'fill_edge_dst_property' which is faster
Yes, in case of update, you need to communicate both vertices and values. In case of fill, you need to communicate only vertices (and if # vertices is large compared to the local vertex partition size, we use bitmaps to cut communication volume).
So, really, the trade-off here is two cheaper calls vs. one more expensive call.
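For reference, a hedged sketch of the fill-based form (the vertex ranges and dst_rank_cache are placeholders for whatever the surrounding code uses, and the exact fill_edge_dst_property overloads may differ across cuGraph versions):

// Two cheap fills (one per known value); each communicates only vertex IDs (or a bitmap),
// not (vertex, value) pairs.
cugraph::fill_edge_dst_property(handle,
                                graph_view,
                                max_valued_vertex_first,        // placeholder: vertices set to max()
                                max_valued_vertex_last,         // placeholder
                                dst_rank_cache.mutable_view(),  // placeholder edge_dst_property_t
                                std::numeric_limits<vertex_t>::max());
cugraph::fill_edge_dst_property(handle,
                                graph_view,
                                min_valued_vertex_first,        // placeholder: vertices set to min()
                                min_valued_vertex_last,         // placeholder
                                dst_rank_cache.mutable_view(),
                                std::numeric_limits<vertex_t>::min());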
// Only update the property of endpoints that had their ranks modified
rmm::device_uvector<vertex_t> processed_ranks(num_processed_vertices, handle.get_stream());

auto pair_idx_processed_vertex_first = thrust::make_zip_iterator(
  thrust::make_counting_iterator<size_t>(0),
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check);

thrust::for_each(
  handle.get_thrust_policy(),
  pair_idx_processed_vertex_first,
  pair_idx_processed_vertex_first + num_processed_vertices,
  [processed_ranks = raft::device_span<vertex_t>(processed_ranks.data(), processed_ranks.size()),
   ranks           = raft::device_span<vertex_t>(ranks.data(), ranks.size()),
   v_first = graph_view.local_vertex_partition_range_first()] __device__(auto pair_idx_v) {
    auto idx      = thrust::get<0>(pair_idx_v);
    auto v        = thrust::get<1>(pair_idx_v);
    auto v_offset = v - v_first;

    processed_ranks[idx] = ranks[v_offset];
  });
Isn't this thrust::gather?
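i.e. (a hedged sketch, reusing the variables from the diff above and a transform iterator to map vertex IDs to local offsets):

// processed_ranks[i] = ranks[remaining_vertices[nr_remaining_local_vertices_to_check + i] - v_first]
auto v_first   = graph_view.local_vertex_partition_range_first();
auto map_first = thrust::make_transform_iterator(
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check,
  thrust::placeholders::_1 - v_first);
thrust::gather(handle.get_thrust_policy(),
               map_first,
               map_first + num_processed_vertices,
               ranks.begin(),
               processed_ranks.begin());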