Skip to content

Util to get packed table buffer size #21322

Open
nirandaperera wants to merge 13 commits intorapidsai:mainfrom
nirandaperera:cudf_packed_size
Open

Util to get packed table buffer size #21322
nirandaperera wants to merge 13 commits intorapidsai:mainfrom
nirandaperera:cudf_packed_size

Conversation

@nirandaperera
Copy link
Contributor

Description

Closes #21321

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Feb 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Feb 3, 2026
Comment on lines 2158 to 2167
// Handle empty table cases
if (input.num_columns() == 0 || input.num_rows() == 0) { return 0; }

auto const num_partitions = std::size_t{1}; // no splits, just computing size for pack
auto const num_src_bufs = count_src_bufs(input.begin(), input.end());
auto const num_bufs = static_cast<std::size_t>(num_src_bufs) * num_partitions;

// Compute splits to get buffer sizes
auto partition_buf_size_and_dst_buf_info =
compute_splits(input, {} /* no splits */, num_partitions, num_src_bufs, num_bufs, stream, mr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part replicates the set up of the contiguous_split_state struct. Can we make it a static method of that struct so that the code is not repeated and can be used in both places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a method here 5d304ae
Not the most elegant TBH 😇

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! This change conflicts with make_empty_packed_table() method. Fixing now

*
* @param input View of the table to compute the packed size for
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr An optional memory resource to use for temporary allocations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mr parameter is generally not intended for this kind of use in public APIs in libcudf.
I recommend removing the parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidwendt we need a temp mr for the compute_splits method. What should we use there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use cudf::get_current_device_resource_ref() at the compute_splits() call directly:

  auto partition_buf_size_and_dst_buf_info =
    compute_splits(input, {} /* no splits */, num_partitions, num_src_bufs, num_bufs,
            stream, cudf::get_current_device_resource_ref());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this would be good enough for this PR and the new allocations are minimal. But I thought cudf was heading towards accepting an mr rather than using current device mr, in all methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. Currently we are talking about passing in both mr's in a container class in order to help make it easier to override on individual calls without having to modify the global one. That will be a large effort.
Having said that, I think you are right that eventually this mr would end up back here.
The doxygen in the .hpp clearly stated this was a temporary mr so I'm ok with leaving it but perhaps call the parameter temp_mr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidwendt I renamed it to temp_mr

@nirandaperera nirandaperera marked this pull request as ready for review February 4, 2026 20:48
@nirandaperera nirandaperera requested a review from a team as a code owner February 4, 2026 20:48
Comment on lines 1978 to 1979
const_cast<size_type&>(num_src_bufs),
const_cast<std::size_t&>(num_bufs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This const_cast is needed because you now explicitly initialise the const num_bufs/num_src_bufs?

I am leery, because const_casting away constness always feels like a bad idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the reason. But, I removed the const from those 2 class members now. I think it should be okay. Do you think they should be initialized to INT::MAX like number so that it fails quickly in case of a future mistake?

@davidwendt davidwendt added feature request New feature or request 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Feb 5, 2026
Comment on lines 1944 to 1973
// First pass over the source tables to generate a `dst_buf_info` per split and column buffer
// (`num_bufs`). After this, contiguous_split uses `dst_buf_info` to further subdivide the work
// into 1MB batches in `compute_batches`
partition_buf_size_and_dst_buf_info = std::move(
compute_splits(input, splits, num_partitions, num_src_bufs, num_bufs, stream, temp_mr));
auto result = compute_num_bufs_and_splits(input, splits, stream, temp_mr);
num_src_bufs = std::get<0>(result);
num_bufs = std::get<1>(result);
partition_buf_size_and_dst_buf_info = std::move(std::get<2>(result));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::tie(num_src_bufs, num_buffs, partition_buf_size_and_dst_buf_info) = compute_num_bufs_and_splits(...);

Then we can reconstify the num_src_bufs and num_bufs I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition_buf_size_and_dst_buf_info is unique_ptr. So, I think we cant use tie. That's one reason why I initially went with the out-ref approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it should work: https://godbolt.org/z/TPWdYce7z

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was mistaken. I think I mistook the error with the const assignment errors.
Changed this to std::tie now

@nirandaperera
Copy link
Contributor Author

@davidwendt I think these python test failures are unrelated. Is the CI broken currently?

@davidwendt
Copy link
Contributor

@davidwendt I think these python test failures are unrelated. Is the CI broken currently?

Yes

Signed-off-by: niranda perera <niranda.perera@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Ready for Review Ready for review by team feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Util to get packed table buffer size

3 participants