Experimental/two stage by feldlime · Pull Request #296 · MobileTeleSystems/RecTools

feldlime · 2025-08-30T22:09:59Z

Description

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Optimization

How Has This Been Tested?

Before submitting a PR, please check yourself against the following list. It would save us quite a lot of time.

Have you read the contribution guide?
Have you updated the relevant docstrings? We're using Numpy format, please double-check yourself
Does your change require any new tests?
Have you updated the changelog file?

`CandidateRankingModel`

This changes the `get_train_with_targets_for_reranker` method so that sampled and unsampled candidates from the first-stage candidate generators are retrieved separately for the reranker.

Copilot

Pull request overview

This PR introduces a two-stage recommendation pipeline through the CandidateRankingModel class, which combines first-stage candidate generation with second-stage reranking using gradient boosting models.

Key Changes:

Implements a flexible two-stage ranking architecture with support for multiple candidate generators and various reranking models
Adds specialized support for CatBoost models through the CatBoostReranker class
Introduces helper classes for feature collection, negative sampling, and candidate generation
Includes comprehensive tests and a detailed tutorial notebook
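The architecture summarized above can be illustrated with a minimal, library-free sketch. This is not the RecTools implementation: `generate_candidates`, `rerank`, and the toy `item_quality` score are hypothetical stand-ins for a first-stage model and a gradient boosting reranker.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Interactions = List[Tuple[int, int]]  # (user_id, item_id) pairs


def generate_candidates(interactions: Interactions, user: int, n: int) -> List[int]:
    """First stage (hypothetical): most popular items the user has not seen."""
    seen = {item for u, item in interactions if u == user}
    popularity = Counter(item for _, item in interactions)
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:n]


def rerank(candidates: List[int], score: Callable[[int], float], k: int) -> List[int]:
    """Second stage (hypothetical): order candidates by a reranker score."""
    return sorted(candidates, key=score, reverse=True)[:k]


interactions = [(1, 10), (2, 10), (2, 11), (3, 11), (3, 12), (4, 11)]
# Toy item feature standing in for the features a real reranker would consume.
item_quality: Dict[int, float] = {10: 0.2, 11: 0.9, 12: 0.5}

candidates = generate_candidates(interactions, user=1, n=2)
recommendations = rerank(candidates, score=lambda item: item_quality[item], k=1)
```

The first stage narrows the catalogue to a short candidate list, and only that list is scored by the more expensive second-stage model.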

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `rectools/models/ranking/candidate_ranking.py` | Core implementation of the two-stage ranking model with candidate generation, feature collection, and reranking logic |
| `rectools/models/ranking/catboost_reranker.py` | Specialized reranker for CatBoost classifiers and rankers with pool preparation |
| `rectools/models/ranking/__init__.py` | Module exports with fallback imports for optional dependencies |
| `rectools/exceptions.py` | New `NotFittedForStageError` exception for stage-specific fitting requirements |
| `rectools/columns.py` | Added `Target` column constant for train/test target values |
| `rectools/compat.py` | Compatibility class for CatBoost when dependency is unavailable |
| `tests/models/ranking/test_candidate_ranking.py` | Comprehensive tests for all ranking components |
| `tests/models/ranking/test_catboost_reranker.py` | Tests for CatBoost-specific functionality |
| `tests/models/test_serialization.py` | Model serialization tests including CandidateRankingModel |
| `tests/test_compat.py` | Compatibility layer tests for CatBoostReranker |
| `pyproject.toml` | Added catboost dependency and updated black version |
| `README.md` | Documentation of new catboost extension |
| `examples/tutorials/candidate_ranking_model_tutorial.ipynb` | Detailed tutorial with multiple reranker examples |
| `.github/workflows/test.yml` | Removed trailing whitespace |

codecov · 2026-01-31T15:15:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (9b3992e) to head (8d8aeed).
⚠️ Report is 122 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##              main      #296     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           45        85     +40     
  Lines         2242      5870   +3628     
===========================================
+ Hits          2242      5870   +3628

feldlime · 2025-09-15T20:21:29Z

rectools/models/ranking/candidate_ranking.py

+            A series containing the predicted scores for each candidate. If the model is a classifier, the scores
+            represent probabilities for the positive class.
+        """
+        x_full = candidates.drop(columns=Columns.UserItem)


Are we sure we always want to remove user and item ids?

@blondered My old question to you
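For context on the question: the quoted line drops the user and item id columns so that the reranker only sees feature columns. A minimal sketch of that pattern with an optional switch follows. `split_features`, `keep_ids`, and the plain-string column names are hypothetical and not part of the PR.

```python
import pandas as pd

ID_COLUMNS = ["user_id", "item_id"]  # stand-in for Columns.UserItem


def split_features(candidates: pd.DataFrame, keep_ids: bool = False) -> pd.DataFrame:
    """Return the feature matrix for a reranker, optionally keeping id columns."""
    if keep_ids:
        return candidates.copy()
    # drop() returns a new frame, so the caller's candidates stay intact.
    return candidates.drop(columns=ID_COLUMNS)


candidates = pd.DataFrame(
    {"user_id": [1, 1], "item_id": [10, 11], "popularity": [0.3, 0.7]}
)
x_full = split_features(candidates)
x_with_ids = split_features(candidates, keep_ids=True)
```

If ids were kept, the reranker would treat them as ordinary feature columns unless they were explicitly declared categorical, which is one reason the default matters.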

Copilot

Pull request overview

Copilot reviewed 15 out of 18 changed files in this pull request and generated 8 comments.

Copilot · 2026-02-01T17:18:54Z

rectools/models/ranking/candidate_ranking.py

+        splitter: Splitter,
+        reranker: Reranker,
+        sampler: NegativeSamplerBase = PerUserNegativeSampler(),
+        feature_collector: CandidateFeatureCollector = CandidateFeatureCollector(),


These defaults instantiate objects at function definition time (PerUserNegativeSampler() / CandidateFeatureCollector()), which can unintentionally share state across CandidateRankingModel instances and is flagged by many linters. Prefer sampler: Optional[...] = None / feature_collector: Optional[...] = None and create instances inside __init__.

@blondered I actually agree. WDYT?
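The concern raised in this thread can be reproduced with a small standalone sketch. The classes below are hypothetical stand-ins, not the RecTools classes. They only show that a default argument built in the signature is created once and shared, while a `None` sentinel gives each instance its own object.

```python
from typing import List, Optional


class Sampler:
    """Hypothetical stand-in for a stateful helper such as a negative sampler."""

    def __init__(self) -> None:
        self.seen: List[int] = []


class SharedDefaultModel:
    # The default Sampler() is created once, when the function is defined.
    def __init__(self, sampler: Sampler = Sampler()) -> None:
        self.sampler = sampler


class PerInstanceDefaultModel:
    # None is a sentinel; a fresh Sampler is created on every call.
    def __init__(self, sampler: Optional[Sampler] = None) -> None:
        self.sampler = sampler if sampler is not None else Sampler()


a, b = SharedDefaultModel(), SharedDefaultModel()
a.sampler.seen.append(1)  # also visible through b.sampler

c, d = PerInstanceDefaultModel(), PerInstanceDefaultModel()
c.sampler.seen.append(1)  # d.sampler is unaffected
```

Whether sharing matters in practice depends on whether the default objects hold mutable state. The sentinel form avoids the question and also satisfies linters that flag call expressions in defaults.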

Copilot · 2026-02-01T17:18:55Z

rectools/models/ranking/candidate_ranking.py

+        train_targets[Columns.Target] = 1
+
+        # Remember that this way we exclude positives that weren't present in candidates
+        train = pd.merge(
+            candidates,
+            train_targets[[Columns.User, Columns.Item, Columns.Target]],


train_targets[Columns.Target] = 1 mutates the train_targets DataFrame passed into this method and can also trigger SettingWithCopyWarning depending on how the slice was created. Prefer working on a copy (e.g., train_targets = train_targets.copy()), or create a separate [user,item,target] frame to merge without mutating the input.

Suggested change

-        train_targets[Columns.Target] = 1
-
-        # Remember that this way we exclude positives that weren't present in candidates
-        train = pd.merge(
-            candidates,
-            train_targets[[Columns.User, Columns.Item, Columns.Target]],
+        # Work on a separate DataFrame to avoid mutating the input and to prevent SettingWithCopyWarning
+        targets = train_targets[[Columns.User, Columns.Item]].copy()
+        targets[Columns.Target] = 1
+
+        # Remember that this way we exclude positives that weren't present in candidates
+        train = pd.merge(
+            candidates,
+            targets[[Columns.User, Columns.Item, Columns.Target]],

fixed using indicator
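The reply does not show the fix itself. One plausible reading is the `indicator` option of pandas `merge`, which marks matched rows without adding a column to the input frame. A minimal sketch under that assumption follows. `add_targets` and the plain-string column names are hypothetical stand-ins for the PR's method and its `Columns` constants.

```python
import pandas as pd

KEYS = ["user_id", "item_id"]  # stand-ins for Columns.User and Columns.Item


def add_targets(candidates: pd.DataFrame, train_targets: pd.DataFrame) -> pd.DataFrame:
    """Label candidates with 1 if the pair occurs in train_targets, else 0."""
    merged = candidates.merge(
        train_targets[KEYS].drop_duplicates(),  # new frame; input is not mutated
        on=KEYS,
        how="left",       # positives absent from candidates are excluded
        indicator=True,   # adds a "_merge" column: "both" or "left_only"
    )
    merged["target"] = (merged["_merge"] == "both").astype(int)
    return merged.drop(columns="_merge")


candidates = pd.DataFrame(
    {"user_id": [1, 1, 2], "item_id": [10, 11, 10], "score": [0.9, 0.5, 0.7]}
)
train_targets = pd.DataFrame({"user_id": [1, 2], "item_id": [10, 12]})

train = add_targets(candidates, train_targets)
```

Because the label is derived from the merge indicator, no `Target` column is ever written to `train_targets`, which sidesteps both the mutation and the `SettingWithCopyWarning` mentioned in the review comment.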

blondered and others added 15 commits December 24, 2024 17:32

Feature/twostage pandas (#234)

55a3b91

`CandidateRankingModel`

Fix get_train_with_targets_for_reranker method (#244)

0bbc87e

We make changes to the `get_train_with_targets_for_reranker` method to separate the retrieval of sampled candidates and unsampled candidates from first-stage candidate generators for the reranker.

Merge branch 'main' into experimental/two_stage

c710fc0

fixed pyproject.toml

07dd084

fixed import

1ee1b16

removed unused function

d8a5716

bumped black version

19b6f02

removed duplicated method

3e33aec

Merge branch 'main' into experimental/two_stage

bc9f6f9

fixed comments

2c80fc4

improved error handling

47f25c1

Merge branch 'experimental/two_stage' of https://github.com/MobileTeleSystems/RecTools into experimental/two_stage

852c320

fixed errors and warnings

80256c5

added ipykernel dependancy

05604f0

adjusted tutorial

c8faada

feldlime requested a review from Copilot December 7, 2025 23:06

Copilot started reviewing on behalf of feldlime December 7, 2025 23:06

small improvements in the tutorial

aef3135

Copilot AI reviewed Dec 7, 2025

feldlime added 2 commits December 8, 2025 08:49

small fixes

a95299f

restricted lightning version

a8b1480

feldlime commented Feb 1, 2026

feldlime added 4 commits February 1, 2026 15:49

added catboost_info to gitignore

e72b32a

small fixes

7f9848b

improved coverage

b16cb63

formatted

8d8aeed

feldlime marked this pull request as ready for review February 1, 2026 17:09

feldlime requested review from blondered and Copilot February 1, 2026 17:09

Copilot started reviewing on behalf of feldlime February 1, 2026 17:09

Copilot AI reviewed Feb 1, 2026

feldlime added 2 commits February 2, 2026 09:19

updated changelog

263ef4e

small copilot review fixes

16a2908
