-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Enable threaded data initialization for CPU. #11974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors gradient index construction on CPU to consistently use Context-based threading and introduces a new block-based parallel helper for initializing reference-counted buffers.
Changes:
- Update
GHistIndexMatrixconstructors and internal methods to takeContextand use newMakeFixedVecWithMalloc(ctx, ...)for multi-threaded, malloc-backed buffers on CPU (and hook up corresponding call sites and tests). - Refactor gradient-index page sources (
SparsePageDMatrix::GetGradientIndexandGradientIndexPageSource) to accept aContextand constructGHistIndexMatrixwith it, including for external-memory workflows. - Add
ParallelForBlockand aContext-awareMakeFixedVecWithMallocinref_resource_view, using block-parallel initialization, and wire GPU-side code to the newResizeIndex(ctx, ...)API.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
tests/cpp/tree/test_quantile_hist.cc |
Adjusts quantile histogram partitioner tests to use the new GHistIndexMatrix constructor taking Context const* instead of a raw thread count. |
tests/cpp/tree/hist/test_histogram.cc |
Updates external-memory histogram test to construct GHistIndexMatrix with Context const*, matching the new API. |
tests/cpp/data/test_sparse_page_dmatrix.cc |
Updates gradient index external-memory tests to use the Context-aware GHistIndexMatrix constructor. |
src/data/sparse_page_dmatrix.cc |
Refactors GetGradientIndex to construct GradientIndexPageSource with a Context const*, so downstream gradient-index construction can use ctx->Threads(). |
src/data/gradient_index_page_source.h |
Changes GradientIndexPageSource to accept Context const* and pass ctx->Threads() into PageSourceIncMixIn, preparing for Context-driven CPU threading. |
src/data/gradient_index_page_source.cc |
In GradientIndexPageSource::Fetch, constructs a local Context with nthreads_ and uses it to build GHistIndexMatrix with the new (Context const*, SparsePage const&, ...) constructor. |
src/data/gradient_index.h |
Refactors GHistIndexMatrix API: PushBatch and PushBatchImpl now take Context const*, the external-memory constructor is updated to take Context const*, and ResizeIndex now also receives Context const*. |
src/data/gradient_index.cu |
Updates the GPU-side constructor to call ResizeIndex(ctx, ...), aligning with the new CPU/GPU-agnostic signature. |
src/data/gradient_index.cc |
Uses MakeFixedVecWithMalloc(ctx, ...) for hit_count and row_ptr, switches PushBatch and PushAdapterBatch to the Context-aware versions, and updates ResizeIndex to allocate using the new MakeFixedVecWithMalloc(ctx, ...). |
src/data/gradient_index_page_source.h |
(Same as above file) Ensures CPU gradient-index page source derives its thread count from Context instead of a bare nthreads parameter. |
src/common/threading_utils.h |
Adds ParallelForBlock intended to use n_threads as block count, but the current implementation can construct invalid Range1d and out-of-bounds ranges when size <= n_threads and ignores the clamped end variable. |
src/common/ref_resource_view.h |
Adds a Context-aware MakeFixedVecWithMalloc that uses ParallelForBlock to parallelize initialization of RefResourceView storage, and wires in the necessary includes for Context and threading utilities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise. |
yes, I will take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
razdoburdin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good for me
| max_numeric_bins_per_feat{max_bins_per_feat}, | ||
| base_rowid{batch.base_rowid}, | ||
| isDense_{is_dense} { | ||
| CHECK_GE(n_threads, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the n_threads >= 1 checked elseware ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the number of threads comes from the Context class, it should go through this check:
xgboost/src/common/threading_utils.cc
Line 120 in f5f1e65
| n_threads = std::max(n_threads, 1); |
src/common/threading_utils.h
Outdated
| std::size_t blk_size = size / n_threads + (size % n_threads > 0); | ||
| ParallelFor(n_threads, n_threads, [&](auto tid) { | ||
| auto blk_beg = tid * blk_size; | ||
| if (blk_beg >= size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check looks redundant. If (size == 0), blk_size == blk_beg == end == 0, so the (end == blk_beg) check shall work in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not for size==0. Say size==1, but n_threads==8, then the block_size=1, the second block (thread) would have block_begin = 1 * 1 = 1, end would be end = 2 * 1 = 2, and block_begin != end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
May be just replace if (end == blk_beg) by if (end <= blk_beg)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion, done.
4f6f74b to
9b02b0f
Compare
|
Haven't done this very often. I made sure the only commit here has @razdoburdin as the author, I hope squashing and merging the commit with github will retain the information. |
|
No.. the author is switched. |
Extracted from #11390 , modified to use the context instead. In addition, a new
ParallelForBlockis created.