Conversation
`CandidateRankingModel`
We make changes to the `get_train_with_targets_for_reranker` method to separate the retrieval of sampled candidates and unsampled candidates from first-stage candidate generators for the reranker.
…eSystems/RecTools into experimental/two_stage
There was a problem hiding this comment.
Pull request overview
This PR introduces a two-stage recommendation pipeline through the CandidateRankingModel class, which combines first-stage candidate generation with second-stage reranking using gradient boosting models.
Key Changes:
- Implements a flexible two-stage ranking architecture with support for multiple candidate generators and various reranking models
- Adds specialized support for CatBoost models through the
CatBoostRerankerclass - Introduces helper classes for feature collection, negative sampling, and candidate generation
- Includes comprehensive tests and a detailed tutorial notebook
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
rectools/models/ranking/candidate_ranking.py |
Core implementation of the two-stage ranking model with candidate generation, feature collection, and reranking logic |
rectools/models/ranking/catboost_reranker.py |
Specialized reranker for CatBoost classifiers and rankers with pool preparation |
rectools/models/ranking/__init__.py |
Module exports with fallback imports for optional dependencies |
rectools/exceptions.py |
New NotFittedForStageError exception for stage-specific fitting requirements |
rectools/columns.py |
Added Target column constant for train/test target values |
rectools/compat.py |
Compatibility class for CatBoost when dependency is unavailable |
tests/models/ranking/test_candidate_ranking.py |
Comprehensive tests for all ranking components |
tests/models/ranking/test_catboost_reranker.py |
Tests for CatBoost-specific functionality |
tests/models/test_serialization.py |
Model serialization tests including CandidateRankingModel |
tests/test_compat.py |
Compatibility layer tests for CatBoostReranker |
pyproject.toml |
Added catboost dependency and updated black version |
README.md |
Documentation of new catboost extension |
examples/tutorials/candidate_ranking_model_tutorial.ipynb |
Detailed tutorial with multiple reranker examples |
.github/workflows/test.yml |
Removed trailing whitespace |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #296 +/- ##
===========================================
Coverage 100.00% 100.00%
===========================================
Files 45 85 +40
Lines 2242 5870 +3628
===========================================
+ Hits 2242 5870 +3628 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| A series containing the predicted scores for each candidate. If the model is a classifier, the scores | ||
| represent probabilities for the positive class. | ||
| """ | ||
| x_full = candidates.drop(columns=Columns.UserItem) |
There was a problem hiding this comment.
Are we sure we always want to remove user and item ids?
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 18 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| splitter: Splitter, | ||
| reranker: Reranker, | ||
| sampler: NegativeSamplerBase = PerUserNegativeSampler(), | ||
| feature_collector: CandidateFeatureCollector = CandidateFeatureCollector(), |
There was a problem hiding this comment.
These defaults instantiate objects at function definition time (PerUserNegativeSampler() / CandidateFeatureCollector()), which can unintentionally share state across CandidateRankingModel instances and is flagged by many linters. Prefer sampler: Optional[...] = None / feature_collector: Optional[...] = None and create instances inside __init__.
| train_targets[Columns.Target] = 1 | ||
|
|
||
| # Remember that this way we exclude positives that weren't present in candidates | ||
| train = pd.merge( | ||
| candidates, | ||
| train_targets[[Columns.User, Columns.Item, Columns.Target]], |
There was a problem hiding this comment.
train_targets[Columns.Target] = 1 mutates the train_targets DataFrame passed into this method and can also trigger SettingWithCopyWarning depending on how the slice was created. Prefer working on a copy (e.g., train_targets = train_targets.copy()), or create a separate [user,item,target] frame to merge without mutating the input.
| train_targets[Columns.Target] = 1 | |
| # Remember that this way we exclude positives that weren't present in candidates | |
| train = pd.merge( | |
| candidates, | |
| train_targets[[Columns.User, Columns.Item, Columns.Target]], | |
| # Work on a separate DataFrame to avoid mutating the input and to prevent SettingWithCopyWarning | |
| targets = train_targets[[Columns.User, Columns.Item]].copy() | |
| targets[Columns.Target] = 1 | |
| # Remember that this way we exclude positives that weren't present in candidates | |
| train = pd.merge( | |
| candidates, | |
| targets[[Columns.User, Columns.Item, Columns.Target]], |
There was a problem hiding this comment.
fixed using indicator
Description
Type of change
How Has This Been Tested?
Before submitting a PR, please check yourself against the following list. It would save us quite a lot of time.