Skip to content

Conversation

@trivialfis
Copy link
Member

@trivialfis trivialfis commented Jan 29, 2026

Extracted from #11390 , modified to use the context instead. In addition, a new ParallelForBlock is created.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors gradient index construction on CPU to consistently use Context-based threading and introduces a new block-based parallel helper for initializing reference-counted buffers.

Changes:

  • Update GHistIndexMatrix constructors and internal methods to take Context and use new MakeFixedVecWithMalloc(ctx, ...) for multi-threaded, malloc-backed buffers on CPU (and hook up corresponding call sites and tests).
  • Refactor gradient-index page sources (SparsePageDMatrix::GetGradientIndex and GradientIndexPageSource) to accept a Context and construct GHistIndexMatrix with it, including for external-memory workflows.
  • Add ParallelForBlock and a Context-aware MakeFixedVecWithMalloc in ref_resource_view, using block-parallel initialization, and wire GPU-side code to the new ResizeIndex(ctx, ...) API.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/cpp/tree/test_quantile_hist.cc Adjusts quantile histogram partitioner tests to use the new GHistIndexMatrix constructor taking Context const* instead of a raw thread count.
tests/cpp/tree/hist/test_histogram.cc Updates external-memory histogram test to construct GHistIndexMatrix with Context const*, matching the new API.
tests/cpp/data/test_sparse_page_dmatrix.cc Updates gradient index external-memory tests to use the Context-aware GHistIndexMatrix constructor.
src/data/sparse_page_dmatrix.cc Refactors GetGradientIndex to construct GradientIndexPageSource with a Context const*, so downstream gradient-index construction can use ctx->Threads().
src/data/gradient_index_page_source.h Changes GradientIndexPageSource to accept Context const* and pass ctx->Threads() into PageSourceIncMixIn, preparing for Context-driven CPU threading.
src/data/gradient_index_page_source.cc In GradientIndexPageSource::Fetch, constructs a local Context with nthreads_ and uses it to build GHistIndexMatrix with the new (Context const*, SparsePage const&, ...) constructor.
src/data/gradient_index.h Refactors GHistIndexMatrix API: PushBatch and PushBatchImpl now take Context const*, the external-memory constructor is updated to take Context const*, and ResizeIndex now also receives Context const*.
src/data/gradient_index.cu Updates the GPU-side constructor to call ResizeIndex(ctx, ...), aligning with the new CPU/GPU-agnostic signature.
src/data/gradient_index.cc Uses MakeFixedVecWithMalloc(ctx, ...) for hit_count and row_ptr, switches PushBatch and PushAdapterBatch to the Context-aware versions, and updates ResizeIndex to allocate using the new MakeFixedVecWithMalloc(ctx, ...).
src/data/gradient_index_page_source.h (Same as above file) Ensures CPU gradient-index page source derives its thread count from Context instead of a bare nthreads parameter.
src/common/threading_utils.h Adds ParallelForBlock intended to use n_threads as block count, but the current implementation can construct invalid Range1d and out-of-bounds ranges when size <= n_threads and ignores the clamped end variable.
src/common/ref_resource_view.h Adds a Context-aware MakeFixedVecWithMalloc that uses ParallelForBlock to parallelize initialization of RefResourceView storage, and wires in the necessary includes for Context and threading utilities.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@trivialfis
Copy link
Member Author

@razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise.

@razdoburdin
Copy link
Contributor

@razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise.

yes, I will take a look

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

@razdoburdin razdoburdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good for me

max_numeric_bins_per_feat{max_bins_per_feat},
base_rowid{batch.base_rowid},
isDense_{is_dense} {
CHECK_GE(n_threads, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the n_threads >= 1 checked elseware ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the number of threads comes from the Context class, it should go through this check:

n_threads = std::max(n_threads, 1);

std::size_t blk_size = size / n_threads + (size % n_threads > 0);
ParallelFor(n_threads, n_threads, [&](auto tid) {
auto blk_beg = tid * blk_size;
if (blk_beg >= size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check looks redundant. If (size == 0), blk_size == blk_beg == end == 0, so the (end == blk_beg) check shall work in this case.

Copy link
Member Author

@trivialfis trivialfis Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not for size==0. Say size==1, but n_threads==8, then the block_size=1, the second block (thread) would have block_begin = 1 * 1 = 1, end would be end = 2 * 1 = 2, and block_begin != end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.
May be just replace if (end == blk_beg) by if (end <= blk_beg)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, done.

@trivialfis trivialfis force-pushed the cpu-opt-multi-thread-init branch from 4f6f74b to 9b02b0f Compare February 3, 2026 13:57
@trivialfis
Copy link
Member Author

Haven't done this very often. I made sure the only commit here has @razdoburdin as the author, I hope squashing and merging the commit with github will retain the information.

@trivialfis trivialfis merged commit ad79b37 into dmlc:master Feb 3, 2026
76 checks passed
@trivialfis trivialfis deleted the cpu-opt-multi-thread-init branch February 3, 2026 17:14
@trivialfis
Copy link
Member Author

No.. the author is switched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants