Enable threaded data initialization for CPU. #11974

trivialfis · 2026-01-29T11:07:18Z

Extracted from #11390, modified to use the context instead. In addition, a new `ParallelForBlock` is created.

Copilot

Pull request overview

This PR refactors gradient index construction on CPU to consistently use Context-based threading and introduces a new block-based parallel helper for initializing reference-counted buffers.

Changes:

- Update GHistIndexMatrix constructors and internal methods to take Context and use new MakeFixedVecWithMalloc(ctx, ...) for multi-threaded, malloc-backed buffers on CPU (and hook up corresponding call sites and tests).
- Refactor gradient-index page sources (SparsePageDMatrix::GetGradientIndex and GradientIndexPageSource) to accept a Context and construct GHistIndexMatrix with it, including for external-memory workflows.
- Add ParallelForBlock and a Context-aware MakeFixedVecWithMalloc in ref_resource_view, using block-parallel initialization, and wire GPU-side code to the new ResizeIndex(ctx, ...) API.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `tests/cpp/tree/test_quantile_hist.cc` | Adjusts quantile histogram partitioner tests to use the new `GHistIndexMatrix` constructor taking `Context const*` instead of a raw thread count. |
| `tests/cpp/tree/hist/test_histogram.cc` | Updates external-memory histogram test to construct `GHistIndexMatrix` with `Context const*`, matching the new API. |
| `tests/cpp/data/test_sparse_page_dmatrix.cc` | Updates gradient index external-memory tests to use the `Context`-aware `GHistIndexMatrix` constructor. |
| `src/data/sparse_page_dmatrix.cc` | Refactors `GetGradientIndex` to construct `GradientIndexPageSource` with a `Context const*`, so downstream gradient-index construction can use `ctx->Threads()`. |
| `src/data/gradient_index_page_source.h` | Changes `GradientIndexPageSource` to accept `Context const*` and pass `ctx->Threads()` into `PageSourceIncMixIn`, preparing for `Context`-driven CPU threading. |
| `src/data/gradient_index_page_source.cc` | In `GradientIndexPageSource::Fetch`, constructs a local `Context` with `nthreads_` and uses it to build `GHistIndexMatrix` with the new `(Context const*, SparsePage const&, ...)` constructor. |
| `src/data/gradient_index.h` | Refactors `GHistIndexMatrix` API: `PushBatch` and `PushBatchImpl` now take `Context const`, the external-memory constructor is updated to take `Context const`, and `ResizeIndex` now also receives `Context const*`. |
| `src/data/gradient_index.cu` | Updates the GPU-side constructor to call `ResizeIndex(ctx, ...)`, aligning with the new CPU/GPU-agnostic signature. |
| `src/data/gradient_index.cc` | Uses `MakeFixedVecWithMalloc(ctx, ...)` for `hit_count` and `row_ptr`, switches `PushBatch` and `PushAdapterBatch` to the `Context`-aware versions, and updates `ResizeIndex` to allocate using the new `MakeFixedVecWithMalloc(ctx, ...)`. |
| `src/data/gradient_index_page_source.h` | (Same as above file) Ensures CPU gradient-index page source derives its thread count from `Context` instead of a bare `nthreads` parameter. |
| `src/common/threading_utils.h` | Adds `ParallelForBlock` intended to use `n_threads` as block count, but the current implementation can construct invalid `Range1d` and out-of-bounds ranges when `size <= n_threads` and ignores the clamped `end` variable. |
| `src/common/ref_resource_view.h` | Adds a `Context`-aware `MakeFixedVecWithMalloc` that uses `ParallelForBlock` to parallelize initialization of `RefResourceView` storage, and wires in the necessary includes for `Context` and threading utilities. |
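
The idea behind a threaded, malloc-backed initialization can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `ref_resource_view.h`: `MakeFilledWithMalloc` and `FreeDeleter` are hypothetical names, the thread count is passed as a plain `int` rather than through `Context`, and a `std::unique_ptr` stands in for `RefResourceView`. The loop uses an OpenMP pragma; when compiled without `-fopenmp` the pragma is ignored and the blocks are filled serially with the same result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical deleter so that malloc-ed memory is released with free.
struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};

// Hypothetical helper: allocate n elements with malloc and fill them with
// `init`, one contiguous block per thread.
template <typename T>
std::unique_ptr<T[], FreeDeleter> MakeFilledWithMalloc(int n_threads, std::size_t n, T init) {
  static_assert(std::is_trivially_copyable<T>::value, "sketch handles trivial types only");
  assert(n_threads >= 1);
  // Allocate at least one element so that n == 0 still yields a valid pointer.
  T* data = static_cast<T*>(std::malloc(std::max<std::size_t>(n, 1) * sizeof(T)));
  if (data == nullptr) {
    throw std::bad_alloc{};
  }
  std::unique_ptr<T[], FreeDeleter> owner{data};

  auto nt = static_cast<std::size_t>(n_threads);
  std::size_t blk_size = n / nt + (n % nt > 0);  // ceiling division
#pragma omp parallel for num_threads(n_threads) schedule(static)
  for (int tid = 0; tid < n_threads; ++tid) {
    std::size_t beg = static_cast<std::size_t>(tid) * blk_size;
    std::size_t end = std::min(beg + blk_size, n);
    if (end <= beg) {
      continue;  // more threads than elements: this block is empty
    }
    std::fill(data + beg, data + end, init);
  }
  return owner;
}
```

A call such as `MakeFilledWithMalloc<std::size_t>(8, n_rows + 1, 0)` would then produce a zero-initialized buffer whose pages are first touched by several threads, which is the effect the PR aims for on large inputs.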

src/common/threading_utils.h

trivialfis · 2026-01-29T11:49:11Z

@razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise.

razdoburdin · 2026-01-29T11:53:33Z

> @razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise.

Yes, I will take a look.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

razdoburdin

Looks good to me.

razdoburdin · 2026-02-02T09:04:34Z

src/data/gradient_index.cc

      max_numeric_bins_per_feat{max_bins_per_feat},
      base_rowid{batch.base_rowid},
      isDense_{is_dense} {
-  CHECK_GE(n_threads, 1);


Is `n_threads >= 1` checked elsewhere?

As long as the number of threads comes from the Context class, it should go through this check:

xgboost/src/common/threading_utils.cc

Line 120 in f5f1e65

n_threads = std::max(n_threads, 1);

razdoburdin · 2026-02-02T09:15:28Z

src/common/threading_utils.h

+  std::size_t blk_size = size / n_threads + (size % n_threads > 0);
+  ParallelFor(n_threads, n_threads, [&](auto tid) {
+    auto blk_beg = tid * blk_size;
+    if (blk_beg >= size) {


This check looks redundant. If `size == 0`, then `blk_size == blk_beg == end == 0`, so the `end == blk_beg` check should already cover this case.

It's not for `size == 0`. Say `size == 1` but `n_threads == 8`: then `blk_size` is 1, and the second block (thread) would have `blk_beg = 1 * 1 = 1` and `end = 2 * 1 = 2`, so `blk_beg != end`.

Ah, I see.
Maybe just replace `if (end == blk_beg)` with `if (end <= blk_beg)`?

Thank you for the suggestion, done.
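
The edge case discussed in this thread can be checked with a small stand-alone sketch. It is not the XGBoost implementation: `BlockRange` and `BlockOf` are hypothetical names, and clamping `end` to `size` is an assumption about how the final code computes the block end. The block size follows the ceiling-division formula from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical half-open range [beg, end) assigned to one thread.
struct BlockRange {
  std::size_t beg;
  std::size_t end;
  // `<=` rather than `==`: a block that starts past the clamped end is empty too.
  bool Empty() const { return end <= beg; }
};

// Hypothetical helper: compute the block for thread `tid` when `size`
// elements are split across `n_threads` blocks.
inline BlockRange BlockOf(std::size_t tid, std::size_t n_threads, std::size_t size) {
  assert(n_threads >= 1);
  // Ceiling division, as in the diff: size / n_threads + (size % n_threads > 0).
  std::size_t blk_size = size / n_threads + (size % n_threads > 0);
  std::size_t blk_beg = tid * blk_size;
  // Assumption: the end is clamped so the last block never runs past `size`.
  std::size_t blk_end = std::min((tid + 1) * blk_size, size);
  return BlockRange{blk_beg, blk_end};
}
```

With `size == 1` and `n_threads == 8`, thread 0 gets `[0, 1)`, thread 1 gets `beg == end == 1`, and thread 2 gets `beg == 2` with a clamped `end == 1`. An equality test would treat thread 2's block as non-empty, while `end <= blk_beg` skips it.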

trivialfis · 2026-02-03T14:15:33Z

I haven't done this very often. I made sure the only commit here has @razdoburdin as the author, and I hope squashing and merging the commit with GitHub will retain that information.

trivialfis · 2026-02-03T17:15:04Z

No, the author was switched.

trivialfis mentioned this pull request Jan 29, 2026

Optimization of data initialization for large sparce datasets #11390

Open

trivialfis requested a review from Copilot January 29, 2026 11:09

Copilot started reviewing on behalf of trivialfis January 29, 2026 11:09

Copilot AI reviewed Jan 29, 2026

src/common/threading_utils.h Outdated

trivialfis requested a review from Copilot January 29, 2026 11:27

Copilot started reviewing on behalf of trivialfis January 29, 2026 11:28

Copilot AI reviewed Jan 29, 2026

razdoburdin reviewed Feb 2, 2026

Enable multi-thread for large data initialization.

9b02b0f

trivialfis force-pushed the cpu-opt-multi-thread-init branch from 4f6f74b to 9b02b0f Compare February 3, 2026 13:57

trivialfis merged commit ad79b37 into dmlc:master Feb 3, 2026
76 checks passed

trivialfis deleted the cpu-opt-multi-thread-init branch February 3, 2026 17:14
