Conversation
rmm::device_uvector<vertex_t> temporary_ranks(local_vtx_partitoin_size, handle.get_stream());
thrust::copy(handle.get_thrust_policy(), ranks.begin(), ranks.end(), temporary_ranks.begin());
vertex_t isolated_v_start = multi_gpu ? segment_offsets->data()[4] : segment_offsets->data()[3];
hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.
So, this code is inaccurate.
vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);
The last segment is always a 0-degree segment, so this code should work.
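For context, a hedged sketch of the (*segment_offsets) layout this relies on (the hypersparse segment exists only when hypersparse_degree_threshold > 1, which is why a hard-coded index of 3 or 4 is fragile):

// Vertices are renumbered in decreasing degree order, so (*segment_offsets) is
//   [0, high_end, mid_end, low_end, (hypersparse_end,) local_vertex_partition_range_size()]
// and the final segment always holds the 0-degree vertices. Hence:
vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);  // i.e. num_local_nzd_vertices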
And there is no guarantee that segment_offsets.has_value() is true. If renumber is set to false, segment_offsets.has_value() is false.
In this case, you need to compute the degrees and scan the vertex list based on the degrees.
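A hedged sketch of that fallback, assuming remaining_vertices was sized to local_vertex_partition_range_size() beforehand (compute_out_degrees() is an existing graph_view member; the rest is plain Thrust):

auto degrees = graph_view.compute_out_degrees(handle);  // rmm::device_uvector<edge_t>
auto vertex_first =
  thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first());
auto vertex_last = vertex_first + graph_view.local_vertex_partition_range_size();

// Keep only the non-zero-degree vertices in remaining_vertices ...
auto remaining_last = thrust::copy_if(handle.get_thrust_policy(),
                                      vertex_first,
                                      vertex_last,
                                      degrees.begin(),
                                      remaining_vertices.begin(),
                                      [] __device__(auto degree) { return degree > edge_t{0}; });
remaining_vertices.resize(thrust::distance(remaining_vertices.begin(), remaining_last),
                          handle.get_stream());

// ... and set ranks to the vertex ID for non-zero-degree vertices and to
// std::numeric_limits<vertex_t>::max() for isolated ones.
thrust::transform(handle.get_thrust_policy(),
                  vertex_first,
                  vertex_last,
                  degrees.begin(),
                  ranks.begin(),
                  [] __device__(auto v, auto degree) {
                    return degree > edge_t{0} ? v : std::numeric_limits<vertex_t>::max();
                  });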
isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.
hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.
Thanks for catching this. After analyzing the datasets used in the paper, I realized none of them have hypersparse regions, so I wasn't pre-filling the priority values of the isolated vertices (because I was always starting from local_vertex_partition_range_last()). After applying the suggested change, I get a small speedup of ~2%.
isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.
Right, but the former indicates a position in a range and the latter a count. Maybe local_nzd_vertices_begin?
rmm::device_uvector<vertex_t> remaining_vertices(isolated_v_start, handle.get_stream());

// Select a random set of candidate vertices
thrust::for_each(
This is thrust::transform. Better to use a more specific algorithm than a generic for_each if possible.
And what you are doing is basically setting ranks to the vertex ID if the degree is non-zero and to std::numeric_limits<vertex_t>::max() if isolated.
And copying into remaining_vertices.
if (segment_offsets) {
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               remaining_vertices.begin());
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               ranks.begin());
  thrust::fill(handle.get_thrust_policy(),
               ranks.begin() + *(segment_offsets->rbegin() + 1),
               ranks.end(),
               std::numeric_limits<vertex_t>::max());
} else {
  // compute degrees and set ranks / remaining_vertices based on the degrees
}
  num_buckets);

size_t loop_counter = 0;
vertex_t nr_remaining_vertices_to_check = remaining_vertices.size();
This holds an invalid value if # GPUs > 1.
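A hedged sketch of the fix, assuming the usual cugraph::host_scalar_allreduce utility for reducing a host scalar across ranks:

vertex_t nr_remaining_vertices_to_check = static_cast<vertex_t>(remaining_vertices.size());
if constexpr (multi_gpu) {
  // remaining_vertices.size() is only the local count; allreduce it to get the global count.
  nr_remaining_vertices_to_check = cugraph::host_scalar_allreduce(handle.get_comms(),
                                                                  nr_remaining_vertices_to_check,
                                                                  raft::comms::op_t::SUM,
                                                                  handle.get_stream());
}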
  }
});
while (true) {
  loop_counter++;
This is a minor point, but it is more natural to increment loop_counter at the end of the loop. We are starting loop 0, yet loop_counter is already 1, which is conceptually a bit misleading.
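i.e. something like (a minimal sketch; the exit condition is a placeholder for whatever the loop actually checks):

size_t loop_counter = 0;
while (true) {
  // ... body of iteration number loop_counter ...
  if (nr_remaining_vertices_to_check == 0) { break; }  // placeholder exit condition
  loop_counter++;
}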
// FIXME: Since we know that the property being updated are either
// std::numeric_limits<vertex_t>::max() or std::numeric_limits<vertex_t>::min(),
// explore 'fill_edge_dst_property' which is faster
Yes, in case of update, you need to communicate both vertices and values. In case of fill, you need to communicate only vertices (and if # vertices is large compared to the local vertex partition size, we use bitmaps to cut communication volume).
So, really, the trade-off here is two cheaper calls vs. one more expensive call.
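For reference, a hedged sketch of the fill-based form (the vertex ranges and dst_rank_cache are placeholders for whatever the surrounding code uses, and the exact fill_edge_dst_property overloads may differ across cuGraph versions):

// Two cheap fills (one per known value); each communicates only vertex IDs (or a bitmap),
// not (vertex, value) pairs.
cugraph::fill_edge_dst_property(handle,
                                graph_view,
                                max_valued_vertex_first,        // placeholder: vertices set to max()
                                max_valued_vertex_last,         // placeholder
                                dst_rank_cache.mutable_view(),  // placeholder edge_dst_property_t
                                std::numeric_limits<vertex_t>::max());
cugraph::fill_edge_dst_property(handle,
                                graph_view,
                                min_valued_vertex_first,        // placeholder: vertices set to min()
                                min_valued_vertex_last,         // placeholder
                                dst_rank_cache.mutable_view(),
                                std::numeric_limits<vertex_t>::min());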
// Only update the property of endpoints that had their ranks modified
rmm::device_uvector<vertex_t> processed_ranks(num_processed_vertices, handle.get_stream());

auto pair_idx_processed_vertex_first = thrust::make_zip_iterator(
  thrust::make_counting_iterator<size_t>(0),
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check);

thrust::for_each(
  handle.get_thrust_policy(),
  pair_idx_processed_vertex_first,
  pair_idx_processed_vertex_first + num_processed_vertices,
  [processed_ranks = raft::device_span<vertex_t>(processed_ranks.data(), processed_ranks.size()),
   ranks           = raft::device_span<vertex_t>(ranks.data(), ranks.size()),
   v_first = graph_view.local_vertex_partition_range_first()] __device__(auto pair_idx_v) {
    auto idx      = thrust::get<0>(pair_idx_v);
    auto v        = thrust::get<1>(pair_idx_v);
    auto v_offset = v - v_first;

    processed_ranks[idx] = ranks[v_offset];
  });
Isn't this thrust::gather?
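i.e. (a hedged sketch, reusing the variables from the diff above and a transform iterator to map vertex IDs to local offsets):

// processed_ranks[i] = ranks[remaining_vertices[nr_remaining_local_vertices_to_check + i] - v_first]
auto v_first   = graph_view.local_vertex_partition_range_first();
auto map_first = thrust::make_transform_iterator(
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check,
  thrust::placeholders::_1 - v_first);
thrust::gather(handle.get_thrust_policy(),
               map_first,
               map_first + num_processed_vertices,
               ranks.begin(),
               processed_ranks.begin());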