Optimize Scikit-learn model loading by adding Bulk Tree Construction API by dantegd · Pull Request #651 · dmlc/treelite

dantegd · 2025-12-19T20:10:47Z

This PR introduces a bulk tree construction API that significantly improves performance when importing scikit-learn RandomForest models into Treelite. In my benchmarks, the new API achieves a ~7-10x speedup over the node-by-node construction approach used by the current sklearn loader.

The current implementation spends significant time in per-node overhead due to:

- Repeated ModelBuilder method calls for each node
- Python-C++ boundary crossing overhead accumulating across millions of nodes
- Memory allocation patterns that don't benefit from bulk operations

This becomes a bottleneck in workflows like cuML's RandomForestClassifier.from_sklearn(), where Treelite import time dominates the conversion process.

This PR implements a BulkConstructTree friend function that directly populates the Tree class's internal ContiguousArray members in a single pass, bypassing the ModelBuilder abstraction for sklearn imports.
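As a rough illustration of the difference (not the code in this PR), the sketch below contrasts the two construction styles on a toy tree stored as parallel arrays. `ToyTree`, `ToyNodeBuilder`, `build_node_by_node` and `build_bulk` are hypothetical names invented for this example, and the leaf convention (child index -1, feature -1, threshold 0.0) is an assumption of the toy, not Treelite's or scikit-learn's layout.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class ToyTree:
    """Hypothetical stand-in for a tree stored as parallel contiguous arrays."""
    left: array = field(default_factory=lambda: array("i"))
    right: array = field(default_factory=lambda: array("i"))
    feature: array = field(default_factory=lambda: array("i"))
    threshold: array = field(default_factory=lambda: array("d"))


class ToyNodeBuilder:
    """Node-by-node construction: several method calls per node."""

    def __init__(self):
        self.tree = ToyTree()
        self.calls = 0  # counts builder method calls, a proxy for per-node overhead

    def start_node(self):
        self.calls += 1
        # Append a leaf-shaped placeholder row to every column.
        for column in (self.tree.left, self.tree.right, self.tree.feature):
            column.append(-1)
        self.tree.threshold.append(0.0)

    def numerical_test(self, feature, threshold, left, right):
        self.calls += 1
        tree = self.tree
        tree.feature[-1], tree.threshold[-1] = feature, threshold
        tree.left[-1], tree.right[-1] = left, right

    def end_node(self):
        self.calls += 1


def build_node_by_node(left, right, feature, threshold):
    builder = ToyNodeBuilder()
    for i in range(len(left)):
        builder.start_node()
        if left[i] != -1:  # internal node in the toy convention
            builder.numerical_test(feature[i], threshold[i], left[i], right[i])
        builder.end_node()
    return builder.tree, builder.calls


def build_bulk(left, right, feature, threshold):
    """Bulk construction: one call copies each whole column in a single pass."""
    tree = ToyTree(
        array("i", left), array("i", right), array("i", feature), array("d", threshold)
    )
    return tree, 1


# A three-node tree: the root splits on feature 0 at 0.5; nodes 1 and 2 are leaves.
LEFT = [1, -1, -1]
RIGHT = [2, -1, -1]
FEATURE = [0, -1, -1]
THRESHOLD = [0.5, 0.0, 0.0]

slow_tree, slow_calls = build_node_by_node(LEFT, RIGHT, FEATURE, THRESHOLD)
fast_tree, fast_calls = build_bulk(LEFT, RIGHT, FEATURE, THRESHOLD)
print(slow_tree == fast_tree, slow_calls, fast_calls)
```

Both paths produce the same arrays, but the node-by-node path makes a number of calls proportional to the node count, while the bulk path makes one call per tree. In the real loader each of those calls also crosses the Python-C++ boundary, which this pure-Python toy does not model.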

Initial benchmarks:

Configuration	Total Nodes	Old API (ms)	Bulk API (ms)	Speedup
classifier, 50 trees, depth=10	39,844	13.5	1.8	7.45x
classifier, 100 trees, depth=15	351,826	77.3	10.2	7.54x
classifier, 300 trees, depth=20	2,520,062	544.9	60.7	8.98x
regressor, 100 trees, depth=15	978,436	195.6	18.8	10.42x
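To put the table in per-node terms, the following sketch divides each timing by the node count and recomputes the ratio. The numbers are copied from the table above; `ns_per_node` is a helper name chosen for this example. Ratios recomputed from the rounded millisecond values may differ slightly from the Speedup column, which was presumably derived from unrounded timings.

```python
# (configuration, total nodes, old API ms, bulk API ms), copied from the table above
ROWS = [
    ("classifier, 50 trees, depth=10", 39_844, 13.5, 1.8),
    ("classifier, 100 trees, depth=15", 351_826, 77.3, 10.2),
    ("classifier, 300 trees, depth=20", 2_520_062, 544.9, 60.7),
    ("regressor, 100 trees, depth=15", 978_436, 195.6, 18.8),
]


def ns_per_node(total_ms, nodes):
    """Convert a total time in milliseconds to nanoseconds per node."""
    return total_ms * 1e6 / nodes


for name, nodes, old_ms, bulk_ms in ROWS:
    print(
        f"{name}: old {ns_per_node(old_ms, nodes):.0f} ns/node, "
        f"bulk {ns_per_node(bulk_ms, nodes):.0f} ns/node, "
        f"ratio {old_ms / bulk_ms:.2f}x"
    )
```

Read this way, the old path costs a few hundred nanoseconds per node and the bulk path a few tens, which is consistent with the per-node call overhead described above being the dominant cost.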

codecov · 2026-01-07T00:20:55Z

Codecov Report

❌ Patch coverage is 99.29329% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.73%. Comparing base (a9ce6c3) to head (0c46c84).

Files with missing lines	Patch %	Lines
python/treelite/sklearn/importer.py	71.42%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           mainline     #651      +/-   ##
============================================
- Coverage     84.35%   83.73%   -0.63%     
============================================
  Files            75       76       +1     
  Lines          6653     6927     +274     
  Branches        543      561      +18     
============================================
+ Hits           5612     5800     +188     
- Misses         1041     1127      +86


hcho3

Can we go ahead and simply remove the old sklearn model builder functions? So LoadSKLearnRandomForestRegressorBulk should be simply called LoadSKLearnRandomForestRegressor, etc.

I don't see a good reason to keep the old functions around if the new functions are functionally equivalent but faster.

FEA Add new optimized scikit-learn loading functionality

6fd15d8

betatim mentioned this pull request Jan 6, 2026

Unpickling RandomForestClassifier with cuml.accel rapidsai/cuml#7627

Open

ENH Optimize serialization with pre-allocated buffer

0c46c84

dantegd marked this pull request as ready for review January 6, 2026 18:40

hcho3 requested changes Jan 7, 2026

dantegd wants to merge 2 commits into dmlc:mainline from dantegd:optimize-sklearn-loader