
Commit e97f926

Remove redundant learning curve implementation paths (#963)
* Remove redundant learning curve implementation paths (#962)
* Initial plan
* Remove redundant learning curve implementation
  - Remove data_size and n_perms parameters from WithinSessionEvaluation
  - Remove get_data_size_subsets(), score_explicit(), and _evaluate_learning_curve() methods
  - Simplify evaluate() to always use the _evaluate() path
  - Update docstring to recommend cv_class=LearningCurveSplitter
  - Update all examples to use the new LearningCurveSplitter API
  - Update tests to use the new API
* Fix pre-commit linting issues
  - Apply black formatting (line length 90)
  - Fix import sorting with isort
  - Remove unused imports (Optional, StratifiedShuffleSplit)
  - Apply ruff fixes
* Apply final black formatting fixes
  - Remove extra blank lines per black style guide
* Fix isort import ordering
  - Apply isort to properly order imports in all changed files
* Resolve black/isort formatting conflict
  - Apply black formatting after isort
  - Remove extra blank line after imports
* Add single-class safeguard for LearningCurveSplitter
  - Skip splits where the training set collapses to a single class
  - Log a warning when splits are skipped due to single-class training sets
  - Fix ArrowStringArray shuffle warnings by converting to numpy arrays
  - Update tests to call process() since validation happens at evaluation time
  - Fix isort import ordering in learning curve examples
* Add whats_new entries for learning curve unification, documenting the learning curve and splitter improvements:
  - cv_class and cv_kwargs parameters for all evaluation classes
  - LearningCurveSplitter for sklearn-compatible learning curves
  - Removal of data_size and n_perms from WithinSessionEvaluation
  - Automatic metadata columns for learning curve results
  - Centralized CV resolution with the _resolve_cv() method
  - Removal of redundant learning curve methods
* Add parametrized test for LearningCurveSplitter as cv_class: verify that LearningCurveSplitter can be used as the cv_class parameter for all main splitters (WithinSessionSplitter, WithinSubjectSplitter, CrossSessionSplitter, and CrossSubjectSplitter)
* Update the Python files
* Solve a problem with new pandas
* Update the splits to ensure the logic is correct
* Iteration
* Simplify the logic
* Solve the group problem
* Iteration 2
* Simplify further
* Update whats_new.rst with _load_data, _get_nchan, and splitter hoisting: document the extraction of the _load_data() and _get_nchan() helpers into BaseEvaluation, the move of _pipeline_requires_epochs() to utils.py, and the WithinSessionSplitter creation hoisted outside the session loop

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bruAristimunha <42702466+bruAristimunha@users.noreply.github.com>
Co-authored-by: Bru <b.aristimunha@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
1 parent dd3811a commit e97f926

15 files changed: +656 −516 lines

docs/source/whats_new.rst

Lines changed: 16 additions & 0 deletions
@@ -32,6 +32,8 @@ Enhancements
 - Ability to parameterize the scoring rule of paradigms (:gh:`948` by `Ethan Davis`_)
 - Extend scoring configuration to accept lists of metric callables, scorer objects, and tuple kwargs (e.g., `needs_proba`/`needs_threshold`) for multi-metric evaluations (:gh:`948` by `Ethan Davis`_ and `Bruno Aristimunha`_)
 - Implement :class:`moabb.evaluations.WithinSubjectSplitter` for k-fold cross-validation within each subject across all sessions (by `Bruno Aristimunha`_)
+- Add ``cv_class`` and ``cv_kwargs`` parameters to all evaluation classes (WithinSessionEvaluation, CrossSessionEvaluation, CrossSubjectEvaluation) for custom cross-validation strategies (:gh:`963` by `Bruno Aristimunha`_)
+- Implement :class:`moabb.evaluations.splitters.LearningCurveSplitter` as a dedicated sklearn-compatible cross-validator for learning curves, enabling learning curve analysis with any evaluation type (:gh:`963` by `Bruno Aristimunha`_)

 API changes
 ~~~~~~~~~~~
@@ -42,6 +44,8 @@ API changes
 - Enable choice of online or offline CodeCarbon through the parameterization of `codecarbon_config` when instantiating a :class:`moabb.evaluations.base.BaseEvaluation` child class (:gh:`956` by `Ethan Davis`_)
 - Renamed stimulus channel from ``stim`` to ``STI`` in BNCI motor imagery and error-related potential datasets for clarity and BIDS compliance (by `Bruno Aristimunha`_).
 - Added four new BNCI P300/ERP dataset classes: :class:`moabb.datasets.BNCI2015_009` (AMUSE), :class:`moabb.datasets.BNCI2015_010` (RSVP), :class:`moabb.datasets.BNCI2015_012` (PASS2D), and :class:`moabb.datasets.BNCI2015_013` (ErrP) (by `Bruno Aristimunha`_).
+- Removed ``data_size`` and ``n_perms`` parameters from :class:`moabb.evaluations.WithinSessionEvaluation`. Use ``cv_class=LearningCurveSplitter`` with ``cv_kwargs=dict(data_size=..., n_perms=...)`` instead (:gh:`963` by `Bruno Aristimunha`_)
+- Learning curve results now automatically include "data_size" and "permutation" columns when using ``LearningCurveSplitter`` (:gh:`963` by `Bruno Aristimunha`_)

 Requirements
 ~~~~~~~~~~~~
@@ -61,6 +65,8 @@ Bugs
 - Prevent Python mutable default argument when defining CodeCarbon configurations (:gh:`956` by `Ethan Davis`_)
 - Fix copytree FileExistsError in BrainInvaders2013a download by adding dirs_exist_ok=True (by `Bruno Aristimunha`_)
 - Ensure optional additional scoring columns in evaluation results (:gh:`957` by `Ethan Davis`_)
+- Fix pandas ``ArrowStringArray`` shuffle warning by converting ``.unique()`` results to numpy arrays in splitters, avoiding issues with newer pandas versions (:gh:`963` by `Bruno Aristimunha`_)
+- ``LearningCurveSplitter`` now skips training splits that collapse to a single class (e.g., with very small ``data_size``) and emits a ``RuntimeWarning`` instead of producing NaN results (:gh:`963` by `Bruno Aristimunha`_)

 Code health
 ~~~~~~~~~~~
@@ -69,6 +75,16 @@ Code health

 - Persist docs/test CI MNE dataset cache across runs to reduce cold-cache downloads (:gh:`946` by `Bruno Aristimunha`_)
 - Refactor evaluation scoring into shared utility functions for future improvements (:gh:`948` by `Bruno Aristimunha`_)
+- Centralize CV resolution in BaseEvaluation with new ``_resolve_cv()`` method for consistent cross-validation handling across all evaluation types. Add ``_build_result()`` and ``_build_scored_result()`` helpers to centralize result dict construction across WithinSession, CrossSession, and CrossSubject evaluations, replacing manual dict assembly in each (:gh:`963` by `Bruno Aristimunha`_)
+- Remove redundant learning curve methods (``get_data_size_subsets()``, ``score_explicit()``, ``_evaluate_learning_curve()``) from WithinSessionEvaluation in favor of unified splitter-based approach (:gh:`963` by `Bruno Aristimunha`_)
+- Generic metadata column registration: ``LearningCurveSplitter`` declares a ``metadata_columns`` class attribute, and ``BaseEvaluation`` auto-detects it via ``hasattr(cv_class, "metadata_columns")`` instead of hardcoding class checks, making it extensible to future custom splitters (:gh:`963` by `Bruno Aristimunha`_)
+- Fix ``get_n_splits()`` delegation in ``WithinSessionSplitter`` and ``WithinSubjectSplitter`` to properly forward to the inner ``cv_class.get_n_splits()`` instead of hardcoding ``n_folds``, giving correct split counts when using custom CV classes like ``LearningCurveSplitter`` (:gh:`963` by `Bruno Aristimunha`_)
+- Remove duplicate ``get_inner_splitter_metadata()`` from ``WithinSessionSplitter``, ``WithinSubjectSplitter``, and ``CrossSubjectSplitter``. All splitters now store a ``_current_splitter`` reference, and ``BaseEvaluation._build_scored_result()`` reads metadata generically from it (:gh:`963` by `Bruno Aristimunha`_)
+- Extract ``_fit_cv()``, ``_maybe_save_model_cv()``, and ``_attach_emissions()`` into ``BaseEvaluation``, removing duplicated model-fitting, model-saving, and carbon-tracking boilerplate from ``WithinSessionEvaluation``, ``CrossSessionEvaluation``, and ``CrossSubjectEvaluation`` (:gh:`963` by `Bruno Aristimunha`_)
+- Extract ``_load_data()`` helper into ``BaseEvaluation`` to centralize data loading logic (epoch requirement checking and ``paradigm.get_data()`` call) that was duplicated across all three evaluation classes (:gh:`963` by `Bruno Aristimunha`_)
+- Extract ``_get_nchan()`` helper into ``BaseEvaluation`` to replace repeated channel count extraction (``X.info["nchan"] if isinstance(X, BaseEpochs) else X.shape[1]``) in all evaluation classes (:gh:`963` by `Bruno Aristimunha`_)
+- Move ``_pipeline_requires_epochs()`` from ``evaluations.py`` to ``utils.py`` for shared access by ``BaseEvaluation._load_data()`` (:gh:`963` by `Bruno Aristimunha`_)
+- Move ``WithinSessionSplitter`` creation outside the per-session loop in ``WithinSessionEvaluation``, since splitter parameters do not change per session (:gh:`963` by `Bruno Aristimunha`_)

 Version 1.4.3 (Stable - PyPi)
 -------------------------------
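The changelog entries above describe ``LearningCurveSplitter`` only at a high level. As a minimal, self-contained sketch, the shape of such an sklearn-style cross-validator can be illustrated as follows. This is not MOABB's actual implementation; the class name and split semantics (``n_perms`` shuffles of a training pool, nested ``data_size`` training subsets against a fixed test block) are inferred from the changelog for illustration only.

```python
import numpy as np


class ToyLearningCurveSplitter:
    """Hypothetical sketch: for each of ``n_perms`` shuffles, yield
    progressively larger training subsets (``data_size``) against one
    fixed held-out test block, sklearn cross-validator style."""

    def __init__(self, data_size, n_perms, test_fraction=0.2, seed=0):
        self.data_size = data_size
        self.n_perms = n_perms
        self.test_fraction = test_fraction
        self.seed = seed

    def get_n_splits(self, X=None, y=None, groups=None):
        # One split per (permutation, training size) pair.
        return len(self.data_size) * self.n_perms

    def split(self, X, y=None, groups=None):
        rng = np.random.default_rng(self.seed)
        n = len(X)
        n_test = max(1, int(n * self.test_fraction))
        indices = rng.permutation(n)
        # Fixed test block; the rest is the training pool.
        test_idx, train_pool = indices[:n_test], indices[n_test:]
        for _perm in range(self.n_perms):
            shuffled = rng.permutation(train_pool)
            for size in self.data_size:
                yield shuffled[:size], test_idx


splitter = ToyLearningCurveSplitter(data_size=[4, 8], n_perms=3)
splits = list(splitter.split(np.arange(20)))
print(splitter.get_n_splits())  # 2 sizes x 3 permutations = 6
```

Because the cross-validator only yields index pairs, any evaluation loop that accepts an sklearn CV object can consume it, which is the point of the ``cv_class``/``cv_kwargs`` unification.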

examples/advanced_examples/plot_hinss2021_classification.py

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ def transform(self, X):
 # in approximately 8 times to the Cov+ElSel+TS+LDA pipeline.

 print("Averaging the session performance:")
-print(results.groupby("pipeline").mean("score")[["score", "time"]])
+print(results.groupby("pipeline")[["score", "time"]].mean())

 ###############################################################################
 # Plot Results
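The groupby change above matters because modern pandas treats the first positional argument of ``GroupBy.mean`` as ``numeric_only``, not a column name, so selecting the columns before aggregating is the unambiguous form. A small self-contained check (the toy values are invented; only the column names mirror the example):

```python
import pandas as pd

# Toy results table mimicking the example's columns (values invented).
results = pd.DataFrame(
    {
        "pipeline": ["A", "A", "B", "B"],
        "score": [0.8, 0.9, 0.6, 0.7],
        "time": [1.0, 3.0, 2.0, 4.0],
    }
)

# Select the numeric columns first, then aggregate: works the same in
# any pandas version and never passes a string where ``numeric_only``
# is expected.
avg = results.groupby("pipeline")[["score", "time"]].mean()
print(avg)  # A: score=0.85, time=2.0; B: score=0.65, time=3.0
```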

examples/external/learning_curve_p300_external.py

Lines changed: 3 additions & 2 deletions
@@ -36,6 +36,7 @@
 import moabb
 from moabb.datasets import BNCI2014_009
 from moabb.evaluations import WithinSessionEvaluation
+from moabb.evaluations.splitters import LearningCurveSplitter
 from moabb.paradigms import P300


@@ -114,8 +115,8 @@
 evaluation = WithinSessionEvaluation(
     paradigm=paradigm,
     datasets=datasets,
-    data_size=data_size,
-    n_perms=n_perms,
+    cv_class=LearningCurveSplitter,
+    cv_kwargs=dict(data_size=data_size, n_perms=n_perms),
     suffix="examples_lr",
     overwrite=overwrite,
     return_epochs=True,

examples/external/noplot_learning_curve_p300_external.py

Lines changed: 3 additions & 2 deletions
@@ -36,6 +36,7 @@
 import moabb
 from moabb.datasets import BNCI2014_009
 from moabb.evaluations import WithinSessionEvaluation
+from moabb.evaluations.splitters import LearningCurveSplitter
 from moabb.paradigms import P300


@@ -115,8 +116,8 @@
 evaluation = WithinSessionEvaluation(
     paradigm=paradigm,
     datasets=datasets,
-    data_size=data_size,
-    n_perms=n_perms,
+    cv_class=LearningCurveSplitter,
+    cv_kwargs=dict(data_size=data_size, n_perms=n_perms),
     suffix="examples_lr",
     overwrite=overwrite,
 )

examples/learning_curve/noplot_learning_curve_p300_external.py

Lines changed: 3 additions & 2 deletions
@@ -36,6 +36,7 @@
 import moabb
 from moabb.datasets import BNCI2014_009
 from moabb.evaluations import WithinSessionEvaluation
+from moabb.evaluations.splitters import LearningCurveSplitter
 from moabb.paradigms import P300


@@ -115,8 +116,8 @@
 evaluation = WithinSessionEvaluation(
     paradigm=paradigm,
     datasets=datasets,
-    data_size=data_size,
-    n_perms=n_perms,
+    cv_class=LearningCurveSplitter,
+    cv_kwargs=dict(data_size=data_size, n_perms=n_perms),
     suffix="examples_lr",
     overwrite=overwrite,
 )

examples/learning_curve/plot_learning_curve_motor_imagery.py

Lines changed: 3 additions & 2 deletions
@@ -33,6 +33,7 @@
 import moabb
 from moabb.datasets import BNCI2014_001
 from moabb.evaluations import WithinSessionEvaluation
+from moabb.evaluations.splitters import LearningCurveSplitter
 from moabb.paradigms import LeftRightImagery


@@ -86,8 +87,8 @@
     datasets=datasets,
     suffix="examples",
     overwrite=overwrite,
-    data_size=data_size,
-    n_perms=n_perms,
+    cv_class=LearningCurveSplitter,
+    cv_kwargs=dict(data_size=data_size, n_perms=n_perms),
 )

 results = evaluation.process(pipelines)

examples/learning_curve/plot_learning_curve_p300.py

Lines changed: 3 additions & 2 deletions
@@ -33,6 +33,7 @@
 import moabb
 from moabb.datasets import BNCI2014_009
 from moabb.evaluations import WithinSessionEvaluation
+from moabb.evaluations.splitters import LearningCurveSplitter
 from moabb.paradigms import P300


@@ -89,8 +90,8 @@
 evaluation = WithinSessionEvaluation(
     paradigm=paradigm,
     datasets=datasets,
-    data_size=data_size,
-    n_perms=n_perms,
+    cv_class=LearningCurveSplitter,
+    cv_kwargs=dict(data_size=data_size, n_perms=n_perms),
     suffix="examples_lr",
     overwrite=overwrite,
 )
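Per the changelog, results produced through ``LearningCurveSplitter`` carry ``data_size`` and ``permutation`` columns, so plotting a learning curve reduces to averaging over permutations at each training-set size. A sketch with fabricated scores (only the column names come from the changelog; the values are invented):

```python
import pandas as pd

# Fabricated results frame with the metadata columns the changelog says
# LearningCurveSplitter adds ("data_size", "permutation").
results = pd.DataFrame(
    {
        "data_size": [50, 50, 100, 100, 200, 200],
        "permutation": [0, 1, 0, 1, 0, 1],
        "score": [0.60, 0.62, 0.70, 0.72, 0.80, 0.78],
    }
)

# Mean over permutations at each training-set size gives the curve.
curve = results.groupby("data_size")["score"].mean()
print(curve)  # 50 -> 0.61, 100 -> 0.71, 200 -> 0.79
```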

examples/tutorials/tutorial_4_adding_a_dataset.py

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ def _get_single_subject_data(self, subject):
         fs = data["fs"]
         ch_names = ["ch" + str(i) for i in range(8)] + ["stim"]
         ch_types = ["eeg" for i in range(8)] + ["stim"]
-        info = mne.create_info(ch_names, fs, ch_types)
+        info = mne.create_info(ch_names, float(np.squeeze(fs)), ch_types)
         raw = mne.io.RawArray(x, info)

         sessions = {}
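The ``float(np.squeeze(fs))`` change above is the usual fix when a sampling rate comes out of a MATLAB ``.mat`` file: ``scipy.io.loadmat`` wraps MATLAB scalars in 2-D arrays, while ``mne.create_info`` expects a plain number. The conversion itself can be demonstrated without MNE (the ``256`` value is illustrative):

```python
import numpy as np

# scipy.io.loadmat returns MATLAB scalars as 2-D arrays, e.g. shape (1, 1).
fs = np.array([[256]])

# Squeeze away the singleton dimensions and cast explicitly to get the
# plain Python float that scalar-expecting APIs want.
sfreq = float(np.squeeze(fs))
print(sfreq)  # 256.0
```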

moabb/analysis/plotting.py

Lines changed: 3 additions & 3 deletions
@@ -476,7 +476,7 @@ def summary_plot(sig_df, effect_df, p_threshold=0.05, simplify=True):
     if simplify:
         effect_df.columns = effect_df.columns.map(_simplify_names)
         sig_df.columns = sig_df.columns.map(_simplify_names)
-    annot_df = effect_df.copy()
+    annot_df = effect_df.copy().astype(object)
     for row in annot_df.index:
         for col in annot_df.columns:
             if effect_df.loc[row, col] > 0:
@@ -575,10 +575,10 @@ def _marker(pval):
     _min = 0
     _max = 0
     for ind, d in enumerate(dsets):
-        nsub = float(df_fw.loc[df_fw.dataset == d, "nsub"])
+        nsub = df_fw.loc[df_fw.dataset == d, "nsub"].item()
         t_dof = nsub - 1
         ci.append(t.ppf(0.95, t_dof) / np.sqrt(nsub))
-        v = float(df_fw.loc[df_fw.dataset == d, "smd"])
+        v = df_fw.loc[df_fw.dataset == d, "smd"].item()
         if v > 0:
             p = df_fw.loc[df_fw.dataset == d, "p"].item()
             if p < 0.05:
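The switch from ``float(selection)`` to ``Series.item()`` above reflects pandas deprecating implicit conversion of a one-element Series to a scalar in recent versions; ``.item()`` is the explicit extraction and raises if the selection does not contain exactly one element. A quick self-contained check (the frame below is a made-up stand-in for ``df_fw``):

```python
import pandas as pd

# Hypothetical frame shaped like the plotting data: one row per dataset.
df = pd.DataFrame({"dataset": ["d1", "d2"], "nsub": [12, 9]})

# Explicit scalar extraction from a one-row selection.
nsub = df.loc[df.dataset == "d1", "nsub"].item()
print(nsub)  # 12

# Unlike float(...), .item() fails loudly on a bad mask (0 or 2+ rows),
# which surfaces selection bugs immediately.
try:
    df["nsub"].item()
except ValueError as err:
    print("raises ValueError:", err)
```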

moabb/evaluations/base.py

Lines changed: 184 additions & 1 deletion
@@ -1,5 +1,8 @@
 import logging
+import math
 from abc import ABC, abstractmethod
+from time import perf_counter
+from uuid import uuid4
 from warnings import warn

 import pandas as pd
@@ -11,8 +14,14 @@
 from moabb.evaluations.utils import (
     Emissions,
     _convert_sklearn_params_to_optuna,
+    _create_save_path,
     _create_scorer,
     _DictScorer,
+    _ensure_fitted,
+    _get_nchan,
+    _pipeline_requires_epochs,
+    _save_model_cv,
+    _score_and_update,
     check_search_available,
 )
 from moabb.paradigms.base import BaseParadigm
@@ -144,6 +153,11 @@ def __init__(
         if additional_columns is None:
             self.additional_columns = []

+        if self.cv_class is not None and hasattr(self.cv_class, "metadata_columns"):
+            for col in self.cv_class.metadata_columns:
+                if col not in self.additional_columns:
+                    self.additional_columns.append(col)
+
         if self.optuna and not optuna_available:
             raise ImportError("Optuna is not available. Please install it first.")
         if (self.time_out != 60 * 15) and not self.optuna:
@@ -222,9 +236,178 @@ def _resolve_cv(self, default_class, default_kwargs=None):
             cv_kwargs = {} if default_kwargs is None else dict(default_kwargs)
         else:
             cv_class = self.cv_class
-            cv_kwargs = {} if self.cv_kwargs is None else dict(self.cv_kwargs)
+            cv_kwargs = dict(self.cv_kwargs)
         return cv_class, cv_kwargs

+    def _load_data(
+        self,
+        dataset,
+        run_pipes,
+        process_pipeline,
+        postprocess_pipeline,
+        subjects=None,
+    ):
+        """Load data for an evaluation, handling epoch requirements.
+
+        Parameters
+        ----------
+        dataset : BaseDataset
+            The dataset to load.
+        run_pipes : dict
+            Pipelines to run (used to check epoch requirements).
+        process_pipeline : Pipeline
+            The processing pipeline.
+        postprocess_pipeline : Pipeline | None
+            Optional post-processing pipeline.
+        subjects : list | None
+            List of subjects to load. If None, loads all subjects.
+
+        Returns
+        -------
+        X : array-like or Epochs
+            The loaded data.
+        y : array-like
+            The labels.
+        metadata : DataFrame
+            The metadata.
+        """
+        requires_epochs = any(
+            _pipeline_requires_epochs(clf) for clf in run_pipes.values()
+        )
+        return_epochs = True if requires_epochs else self.return_epochs
+        kwargs = dict(
+            dataset=dataset,
+            return_epochs=return_epochs,
+            return_raws=self.return_raws,
+            cache_config=self.cache_config,
+            postprocess_pipeline=postprocess_pipeline,
+            process_pipelines=None if requires_epochs else [process_pipeline],
+        )
+        if subjects is not None:
+            kwargs["subjects"] = subjects
+        return self.paradigm.get_data(**kwargs)
+
+    @staticmethod
+    def _get_nchan(X):
+        """Extract number of channels from data (Epochs or ndarray)."""
+        return _get_nchan(X)
+
+    def _build_scored_result(
+        self,
+        dataset,
+        subject,
+        session,
+        pipeline,
+        n_samples,
+        n_channels,
+        duration,
+        scorer,
+        model,
+        X_test,
+        y_test,
+        split_metadata=None,
+        **extra,
+    ):
+        """Build a result dict and score it in one place."""
+        metadata = {}
+        if split_metadata is None:
+            splitter = getattr(getattr(self, "cv", None), "_current_splitter", None)
+            if splitter is not None and hasattr(splitter, "get_metadata"):
+                split_metadata = splitter.get_metadata()
+        if split_metadata:
+            metadata.update(split_metadata)
+        metadata.update(extra)
+        res = self._build_result(
+            dataset,
+            subject,
+            session,
+            pipeline,
+            n_samples,
+            n_channels,
+            duration,
+            **metadata,
+        )
+        try:
+            return _score_and_update(res, scorer, model, X_test, y_test)
+        except ValueError as err:
+            if self.error_score == "raise":
+                raise err
+            res["score"] = self.error_score
+            return res
+
+    def _fit_cv(self, model, X_train, y_train, tracker=None):
+        """Fit a model for a CV fold with optional CodeCarbon tracking."""
+        task_name = None
+        emissions = math.nan
+        if tracker is not None:
+            task_name = str(uuid4())
+            tracker.start_task(task_name)
+        t_start = perf_counter()
+        model.fit(X_train, y_train)
+        duration = perf_counter() - t_start
+        if tracker is not None:
+            emissions_data = tracker.stop_task()
+            emissions = emissions_data.emissions if emissions_data else math.nan
+        _ensure_fitted(model)
+        return duration, emissions, task_name
+
+    def _maybe_save_model_cv(
+        self, model, dataset, subject, session, name, cv_ind, eval_type
+    ):
+        """Save model for a CV fold when saving is enabled."""
+        if self.hdf5_path is None or not self.save_model:
+            return
+        model_save_path = _create_save_path(
+            hdf5_path=self.hdf5_path,
+            code=dataset.code,
+            subject=subject,
+            session=session,
+            name=name,
+            grid=self.search,
+            eval_type=eval_type,
+        )
+        _save_model_cv(model=model, save_path=model_save_path, cv_index=str(cv_ind))
+
+    @staticmethod
+    def _attach_emissions(res, emissions, task_name):
+        res["carbon_emission"] = (1000 * emissions,)
+        res["codecarbon_task_name"] = task_name
+
+    def _build_result(
+        self,
+        dataset,
+        subject,
+        session,
+        pipeline,
+        n_samples,
+        n_channels,
+        duration,
+        **extra,
+    ):
+        """Build a result dictionary with all required columns.
+
+        This is the single place where the evaluation result schema is defined.
+        All evaluation subclasses should use this instead of constructing the
+        dict manually, so the schema stays consistent when columns are added
+        or evaluations are merged.
+
+        Any ``additional_columns`` not provided via *extra* are defaulted to
+        NaN so that ``Results.add()`` never fails on a missing key.
+        """
+        res = {
+            "time": duration,
+            "dataset": dataset,
+            "subject": subject,
+            "session": session,
+            "n_samples": n_samples,
+            "n_channels": n_channels,
+            "pipeline": pipeline,
+        }
+        for col in self.additional_columns:
+            if col not in res:
+                res[col] = extra.get(col, math.nan)
+        return res
+
     def process(self, pipelines, param_grid=None, postprocess_pipeline=None):
         """Runs all pipelines on all datasets.

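The ``metadata_columns`` detection added to ``__init__`` in this diff follows a simple duck-typing pattern: a splitter declares the extra result columns it produces as a class attribute, and the evaluation merges them into ``additional_columns`` without hardcoding class checks. A stripped-down sketch of that mechanism (class names here are illustrative, not MOABB's real classes):

```python
import math


class LearningCurveLikeSplitter:
    # A splitter advertises the extra result columns it will fill in.
    metadata_columns = ["data_size", "permutation"]


class MinimalEvaluation:
    def __init__(self, cv_class=None, additional_columns=None):
        self.cv_class = cv_class
        self.additional_columns = list(additional_columns or [])
        # Duck-typed detection: any cv_class exposing ``metadata_columns``
        # gets its columns registered; no isinstance checks needed.
        if self.cv_class is not None and hasattr(self.cv_class, "metadata_columns"):
            for col in self.cv_class.metadata_columns:
                if col not in self.additional_columns:
                    self.additional_columns.append(col)

    def build_result(self, **extra):
        # Missing metadata defaults to NaN so downstream result handling
        # never fails on an absent key.
        res = {"score": extra.pop("score", math.nan)}
        for col in self.additional_columns:
            res.setdefault(col, extra.get(col, math.nan))
        return res


ev = MinimalEvaluation(cv_class=LearningCurveLikeSplitter)
print(ev.additional_columns)  # ['data_size', 'permutation']
print(ev.build_result(score=0.9, data_size=100))
```

This is why the changelog can promise extensibility "to future custom splitters": any new CV class only has to declare the attribute.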