Util to get packed table buffer size #21322
Util to get packed table buffer size #21322nirandaperera wants to merge 13 commits intorapidsai:mainfrom
Conversation
Signed-off-by: niranda perera <niranda.perera@gmail.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
cpp/src/copying/contiguous_split.cu
Outdated
| // Handle empty table cases | ||
| if (input.num_columns() == 0 || input.num_rows() == 0) { return 0; } | ||
|
|
||
| auto const num_partitions = std::size_t{1}; // no splits, just computing size for pack | ||
| auto const num_src_bufs = count_src_bufs(input.begin(), input.end()); | ||
| auto const num_bufs = static_cast<std::size_t>(num_src_bufs) * num_partitions; | ||
|
|
||
| // Compute splits to get buffer sizes | ||
| auto partition_buf_size_and_dst_buf_info = | ||
| compute_splits(input, {} /* no splits */, num_partitions, num_src_bufs, num_bufs, stream, mr); |
There was a problem hiding this comment.
This part replicates the set up of the contiguous_split_state struct. Can we make it a static method of that struct so that the code is not repeated and can be used in both places.
There was a problem hiding this comment.
I added a method here 5d304ae
Not the most elegant TBH 😇
There was a problem hiding this comment.
Argh! This change conflicts with make_empty_packed_table() method. Fixing now
| * | ||
| * @param input View of the table to compute the packed size for | ||
| * @param stream CUDA stream used for device memory operations and kernel launches | ||
| * @param mr An optional memory resource to use for temporary allocations |
There was a problem hiding this comment.
The mr parameter is generally not intended for this kind of use in public APIs in libcudf.
I recommend removing the parameter.
There was a problem hiding this comment.
@davidwendt we need a temp mr for the compute_splits method. What should we use there?
There was a problem hiding this comment.
Just use cudf::get_current_device_resource_ref() at the compute_splits() call directly:
auto partition_buf_size_and_dst_buf_info =
compute_splits(input, {} /* no splits */, num_partitions, num_src_bufs, num_bufs,
stream, cudf::get_current_device_resource_ref());
There was a problem hiding this comment.
I understand that this would be good enough for this PR and the new allocations are minimal. But I thought cudf was heading towards accepting an mr rather than using current device mr, in all methods?
There was a problem hiding this comment.
Not quite. Currently we are talking about passing in both mr's in a container class in order to help make it easier to override on individual calls without having to modify the global one. That will be a large effort.
Having said that, I think you are right that eventually this mr would end up back here.
The doxygen in the .hpp clearly stated this was a temporary mr so I'm ok with leaving it but perhaps call the parameter temp_mr.
Signed-off-by: niranda perera <niranda.perera@gmail.com>
… cudf_packed_size
cpp/src/copying/contiguous_split.cu
Outdated
| const_cast<size_type&>(num_src_bufs), | ||
| const_cast<std::size_t&>(num_bufs), |
There was a problem hiding this comment.
This const_cast is needed because you now explicitly initialise the const num_bufs/num_src_bufs?
I am leery, because const_casting away constness always feels like a bad idea.
There was a problem hiding this comment.
Yes, that was the reason. But, I removed the const from those 2 class members now. I think it should be okay. Do you think they should be initialized to INT::MAX like number so that it fails quickly in case of a future mistake?
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
cpp/src/copying/contiguous_split.cu
Outdated
| // First pass over the source tables to generate a `dst_buf_info` per split and column buffer | ||
| // (`num_bufs`). After this, contiguous_split uses `dst_buf_info` to further subdivide the work | ||
| // into 1MB batches in `compute_batches` | ||
| partition_buf_size_and_dst_buf_info = std::move( | ||
| compute_splits(input, splits, num_partitions, num_src_bufs, num_bufs, stream, temp_mr)); | ||
| auto result = compute_num_bufs_and_splits(input, splits, stream, temp_mr); | ||
| num_src_bufs = std::get<0>(result); | ||
| num_bufs = std::get<1>(result); | ||
| partition_buf_size_and_dst_buf_info = std::move(std::get<2>(result)); |
There was a problem hiding this comment.
std::tie(num_src_bufs, num_buffs, partition_buf_size_and_dst_buf_info) = compute_num_bufs_and_splits(...);
Then we can reconstify the num_src_bufs and num_bufs I think.
There was a problem hiding this comment.
partition_buf_size_and_dst_buf_info is unique_ptr. So, I think we cant use tie. That's one reason why I initially went with the out-ref approach.
There was a problem hiding this comment.
Looks like it should work: https://godbolt.org/z/TPWdYce7z
There was a problem hiding this comment.
Sorry, I was mistaken. I think I mistook the error with the const assignment errors.
Changed this to std::tie now
|
@davidwendt I think these python test failures are unrelated. Is the CI broken currently? |
Yes |
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Description
Closes #21321
Checklist