Skip to content

Remove scikit-learn 1.5.0 limitation#24

Open
thomasATbayer wants to merge 34 commits into
mainfrom
update_sklearn_to_current_version
Open

Remove scikit-learn 1.5.0 limitation#24
thomasATbayer wants to merge 34 commits into
mainfrom
update_sklearn_to_current_version

Conversation

@thomasATbayer

@thomasATbayer thomasATbayer commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Branch: update_sklearn_to_current_version

PR #24 — "Remove scikit-learn 1.5.0 limitation"

Why this branch exists

The previous release pinned scikit-learn to <=1.5.0, blocking the entire dependency tree from upgrading. This branch lifts that pin and adapts the codebase to the API changes introduced in scikit-learn 1.6–1.8. Several companion libraries (CatBoost, TabPFN, UMAP, HDBSCAN) were also updated at the same time.


Summary of changes

Area File(s) What changed
Dependency versions pyproject.toml, uv.lock sklearn unpinned; catboost, tabpfn, umap bumped
Python support pyproject.toml, workflow.yml Drop 3.10, add 3.14
HDBSCAN clustering cv/cv_methods.py Switch from external hdbscan to sklearn.cluster.HDBSCAN; fix fingerprint conversion for 2D input
CatBoost sklearn compat ml/models/m_catboost.py Custom __sklearn_clone__ base class
TabPFN compat ml/models/m_tabpfn.py force_all_finiteensure_all_finite; pin V2 model weights
rnanorm removal ml/rna.py, pyproject.toml Native CPM/UQ/CUF replace unmaintained rnanorm
RNA input validation ml/rna.py Fail-fast guards for zero library sizes and degenerate inputs; CPM.transform enforces fit-before-transform via check_is_fitted
Pipeline utils pipeline_utils.py Use is_classifier() helper; get_ranking_pipeline saves/restores global metadata routing state
Learning to rank optimization/core.py Redesigned handle_metadata_routing with routing guard
CI — TabPFN weights workflow.yml Download step added to fast test job (was only in test-slow)
CI — prepareTabpfn scripts/prepareTabpfn.py Fixed missing from pathlib import Path import
Tests test/unit/test_catboost_clone.py New test suite for __sklearn_clone__
RNA tests test/unit/test_rna.py Removed xfail markers for UQ/CUF

Detailed changes

1. Dependency updates (pyproject.toml)

scikit-learn :  <=1.5.0           →  >=1.8.0,<1.9
catboost     :  >=1.2.6,<=1.2.8   →  >=1.2.9,<=1.2.10
tabpfn       :  ==2.1.0           →  ==8.0.8
umap-learn   :  >=0.5.7,<0.6      →  >=0.5.12,<0.6
hdbscan      :  removed from clustering extras (now provided by sklearn)
rnanorm      :  removed from rna extras (replaced by native implementation)
Python       :  >=3.10, <3.14     →  >=3.11, <3.15

scikit-learn upper bound raised from <=1.5.0 to <1.9:

  • sklearn 1.8.0 requires Python ≥ 3.11, which drove the Python version change below.
  • sklearn 1.9.0 is excluded: quantile-forest uses the removed private symbol sklearn.tree._tree.DTYPE. The fix is already merged on the quantile-forest GitHub but not yet released; the upper bound will be raised once they publish a compatible version.

catboost bumped to >=1.2.9,<=1.2.10:

  • 1.2.9 adds Python 3.14 support, fixes the setuptools.Distribution.dry_run removal (which broke installs on modern setuptools), and adds __sklearn_tags__ for sklearn ≥ 1.8 compatibility.

tabpfn bumped to ==8.0.8:

  • TabPFN 3 (the new default in 8.x) ships under a non-commercial license. The code now explicitly enforces the commercially-licensed V2 weights (see section 4).

umap-learn bumped to >=0.5.12,<0.6:

  • The 0.5.12 release (April 2026) is the first to support sklearn 1.8's revised __sklearn_tags__ API.

hdbscan external package removed — sklearn.cluster.HDBSCAN (available since sklearn 1.0) is sufficient and reduces the install footprint.

rnanorm removed — replaced by native numpy implementations inside rna.py (see section 6).


2. HDBSCAN clustering (cv/cv_methods.py)

Before:

try:
    import hdbscan
except ImportError as import_error:
    raise ExtrasDependencyImportError("clustering", import_error)

clusterer = hdbscan.HDBSCAN(...)

After:

from sklearn.cluster import HDBSCAN

clusterer = HDBSCAN(...)

Why: The standalone hdbscan package is no longer maintained at the same pace as sklearn, and sklearn ships its own HDBSCAN implementation. Removing the external dependency simplifies the [clustering] extra.

sklearn.cluster.HDBSCAN is imported inside hdbscan_clustering() (not at module level), and scikit-learn is a core dependency — so no ImportError can occur at runtime regardless of which extras are installed. The [clustering] extra now only lists kmedoids. test/test-dist.bash was updated to check for kmedoids instead of hdbscan when verifying the extra is installed.

Bug fix — fingerprint input conversion (hdbscan_clustering):

sklearn.cluster.HDBSCAN requires a 2D numeric feature matrix of shape (n_samples, n_bits). The old code converted a numpy array input to a list of ExplicitBitVect objects and then called np.array(fingerprints), producing a 1D object array that HDBSCAN cannot process.

# Before — wrong: produces a 1D object array of ExplicitBitVect
if isinstance(fingerprints, np.ndarray):
    fingerprints = [np_to_bv(fv) for fv in fingerprints]
clusterer.fit_predict(np.array(fingerprints))

# After — correct: 2D uint8 matrix (n_samples × n_bits)
if len(fingerprints) == 0:
    module_logger.warning("No valid fingerprints provided for clustering")
    return {}
if isinstance(fingerprints, np.ndarray):
    X = fingerprints  # already a 2D numeric matrix
else:
    nbits = fingerprints[0].GetNumBits()
    X = np.zeros((len(fingerprints), nbits), dtype=np.uint8)
    for i, fp in enumerate(fingerprints):
        DataStructs.ConvertToNumpyArray(fp, X[i])
clusterer.fit_predict(X)

Empty-input guard: The fixed code also adds a length check before accessing fingerprints[0], mirroring the guard already present in tanimoto_sphere_exclusion_clustering. Without it, an empty list or empty tuple input would cause IndexError at fingerprints[0].GetNumBits(). The guard returns {} early, consistent with the other clustering functions.


3. CatBoost sklearn clone compatibility (ml/models/m_catboost.py)

Problem: Newer CatBoost versions (>=1.2.9) internally deep-copy mutable constructor parameters such as cat_features, embedding_features, and text_features. This breaks sklearn's default clone() implementation, which uses identity (is) to verify that parameters were faithfully round-tripped after reconstruction.

Solution: A new mixin _CatboostModelMotherBase is introduced with a custom __sklearn_clone__ method. It is a pure mixin (inherits only from AbstractMotherPipeline, not from any CatBoost model class) and is inserted into the MRO of the three classes that expose mutable constructor params (cat_features, embedding_features, text_features) through get_params():

Note: CatboostGaussianProcessRegressorMother was initially excluded. Its explicit
constructor params (samples, prior_iterations, learning_rate, etc.) are all scalars,
so the default identity-based clone works for those. However, it also accepts **kwargs
that are forwarded directly to CatBoostRegressor.__init__, meaning mutable params such
as cat_features, embedding_features, and text_features can be supplied. CatBoost
1.2.9+ copies those lists internally, causing the same identity-check failure.
_CatboostModelMotherBase is therefore added to its MRO as well.

Class Before After
CatboostRegressorMother CatBoostRegressor, _CatboostHyperParams CatBoostRegressor, _CatboostModelMotherBase, _CatboostHyperParams
CatboostGaussianProcessRegressorMother CatBoostRegressor, _CatboostHyperParams CatBoostRegressor, _CatboostModelMotherBase, _CatboostHyperParams — its explicit constructor params are all scalars, but it also accepts **kwargs forwarded to CatBoostRegressor.__init__; mutable params such as cat_features, embedding_features, and text_features can be supplied that way and are copied internally by CatBoost 1.2.9+, so the override is required here too
CatboostClassifierMother CatBoostClassifier, _CatboostHyperParams, AbstractMotherPipeline CatBoostClassifier, _CatboostModelMotherBase, _CatboostHyperParams
CatboostRankerMother CatBoostRanker, _CatboostHyperParams, BaseEstimator CatBoostRanker, _CatboostModelMotherBase, _CatboostHyperParams, BaseEstimator

Solution: A new mixin _CatboostModelMotherBase is introduced with a custom __sklearn_clone__ method:

class _CatboostModelMotherBase(AbstractMotherPipeline):
    def __sklearn_clone__(self):
        klass = self.__class__
        params = self.get_params(deep=False)
        new_params = {k: copy.deepcopy(v) for k, v in params.items()}
        new_obj = klass(**new_params)
        # ... content-equality verification (type-dispatched for numpy/pandas) ...
        # ... preserve _metadata_request across clone ...
        return new_obj

Key points:

  • Uses type-dispatched content equality (numpy array_equal, pandas equals, or plain ==) instead of identity (is) to verify that mutable params were cloned correctly — avoiding the ValueError: The truth value of an array is ambiguous error.
  • Preserves _metadata_request on the new instance so that set_fit_request(group_id=True) calls survive a clone (which is what happens inside cross_val_score).
  • The old ensure_metadata_routing decorator on CatboostRankerMother.__init__ is removed; metadata routing is now handled purely by the caller (see the Learning to Rank section below).
  • Added the missing import copy (the method used copy.deepcopy but the module was previously never imported).

New tests — test/unit/test_catboost_clone.py (136 lines):

Test What it checks
test_clone_regressor_with_mutable_params cat_features / embedding_features / text_features survive clone on CatboostRegressorMother
test_clone_classifier_with_mutable_params Same for CatboostClassifierMother
test_clone_without_mutable_params Plain clone works for regressor, classifier, and ranker
test_clone_preserves_metadata_routing_request set_fit_request state is preserved on CatboostRankerMother (skipped on sklearn < 1.3)
test_cross_val_score_regressor_with_cat_features cross_val_score completes end-to-end with cat_features
test_cross_val_score_classifier_with_cat_features Same for classifier
test_clone_gp_regressor_without_mutable_params clone() succeeds for CatboostGaussianProcessRegressorMother without mutable kwargs
test_clone_gp_regressor_with_mutable_params clone() succeeds with cat_features / embedding_features / text_features passed via **kwargs to the GP regressor

4. TabPFN compatibility (ml/models/m_tabpfn.py)

force_all_finite removal (sklearn 1.8):

sklearn 1.8 removed the force_all_finite argument from check_array, replacing it with ensure_all_finite (introduced in 1.6). A small compatibility shim is built once at module load:

_sklearn_version = tuple(int(x) for x in sklearn.__version__.split(".")[:2])
_ALLOW_NAN_KWARG = (
    {"ensure_all_finite": "allow-nan"} if _sklearn_version >= (1, 6)
    else {"force_all_finite": "allow-nan"}
)
# usage:
check_array(X, ensure_2d=False, **_ALLOW_NAN_KWARG)

TabPFN v8 API change and commercial licensing:

TabPFN 8.x introduced TabPFN 3 as the new default model, but TabPFN 3 is released under a non-commercial license. To ensure Mother always uses the commercially-licensed V2 weights, both TabPFNRegressorMother and TabPFNClassifierMother now explicitly select V2 at construction time, following the official recommendation:

from tabpfn.constants import ModelVersion

if "model_path" not in kwargs:
    kwargs["model_path"] = TabPFNRegressor.create_default_for_version(ModelVersion.V2).model_path

Users can still override model_path to select a different version.


5. is_classifier() in pipeline utils (pipeline_utils.py)

# Before
if hasattr(val_estimator, "_estimator_type") and val_estimator._estimator_type == "classifier":

# After
if is_classifier(val_estimator):

Why this matters beyond style: In sklearn 1.8, Pipeline no longer exposes _estimator_type as a public attribute. The old hasattr check therefore silently returned False for pipeline-wrapped classifiers, causing _proba_ columns to go missing from multi-target classification CV output. Using sklearn.base.is_classifier() — the canonical API — works correctly across all supported versions.


Learning to rank — detailed changes (optimization/core.py, pipeline_utils.py)

Background

The learning-to-rank path in MotherTuner.optimize() uses cross_val_score with a CatboostRankerMother estimator and an NDCG scorer. This requires that group_id (the query-group array used by the ranker) be forwarded to both fit() and score() during cross-validation. sklearn only forwards scorer kwargs when enable_metadata_routing=True is set globally.

Why sklearn 1.5–1.8 forced these changes:

sklearn version Relevant behaviour
≤ 1.4 Metadata routing does not exist. groups= can always be passed directly to cross_val_score.
1.5 Metadata routing introduced as an opt-in experimental feature. When enabled, sklearn starts enforcing that CV-split groups travel through params= rather than as a direct groups= kwarg — passing both raises ValueError.
1.6 force_all_finite deprecated in check_array in favour of ensure_all_finite.
1.8 force_all_finite removed entirely. Pipeline._estimator_type removed. Metadata routing enforcement is stricter and more widely tested.

Under sklearn ≤ 1.5 the old ranking code worked by coincidence: routing was off by default, groups= was always accepted, and the enforcement gap was never triggered. Upgrading to ≥ 1.5 (and especially 1.8) exposed all three latent bugs described below.

The old design (problems)

The handle_metadata_routing decorator previously:

  1. Called skl_get_config() on every invocation to check whether metadata routing was globally enabled.
  2. If ranking groups were passed without metadata routing enabled, it raised a hard AssertionError telling the caller to call sklearn.set_config(enable_metadata_routing=True) manually.
  3. When metadata routing was enabled, it moved groups from the direct cross_val_score kwarg into fit_kwargs; but this introduced a subtle bug: the groups key was not always present in fit_kwargs when it was later removed (fit_kwargs.pop("groups")), causing a KeyError.
  4. get_ranking_pipeline permanently set enable_metadata_routing=True with no teardown. Under sklearn ≤ 1.5 this was harmless because the enforcement of routing rules was not yet applied to subsequent non-ranking calls. Under sklearn 1.5+ any subsequent non-ranking cross_val_score call with groups= raised ValueError.

The new design

The decorator reads the global config once per call (sklearn.get_config()["enable_metadata_routing"]) and uses three distinct paths:

ranking_groups is not None  →  "ranking path"
  Guard: raises RuntimeError immediately if routing is not enabled.
  Both CV groups and group_id are injected into fit_kwargs and forwarded
  via params=.  The direct groups= kwarg is NOT set (sklearn 1.5+ rejects
  it when routing is enabled).

ranking_groups is None + routing OFF (default)  →  "non-ranking classic path"
  CV split groups are passed as groups= directly to cross_val_score.

ranking_groups is None + routing ON  →  "non-ranking routing-aware path"
  sklearn 1.5+ raises ValueError when groups= is passed directly to
  cross_val_score while enable_metadata_routing=True (e.g. after a prior
  ranking call in the same process).  Groups are redirected into
  fit_kwargs / params= instead.
routing_on = sklearn.get_config()["enable_metadata_routing"]

if ranking_groups is not None:
    if not routing_on:
        raise RuntimeError(
            "Ranking optimization requires sklearn metadata routing to be enabled. "
            "Call sklearn.set_config(enable_metadata_routing=True) before optimize()."
        )
    fit_kwargs["group_id"] = ranking_groups
    if groups is not None:
        fit_kwargs["groups"] = groups
    kwargs["_groups_as_cross_val_args"] = False
else:
    if routing_on:
        if groups is not None:
            fit_kwargs["groups"] = groups
        kwargs["_groups_as_cross_val_args"] = False
    else:
        kwargs["_groups_as_cross_val_args"] = True

CatboostRankerMother.__init__'s @ensure_metadata_routing decorator is removed — the responsibility for calling sklearn.set_config(enable_metadata_routing=True) belongs to the caller, not the estimator constructor.

Bug fix: fit_kwargs.pop("groups")fit_kwargs.pop("groups", None) in MotherTuner.optimize(). When groups is None it was never injected into fit_kwargs, so a direct .pop() raised KeyError. The safe form avoids this.

get_ranking_pipeline no longer mutates global routing state (pipeline_utils.py)

Old behaviour: get_ranking_pipeline called sklearn.set_config(enable_metadata_routing=True) with no teardown, permanently changing global sklearn state for the rest of the process. Any subsequent non-ranking cross_val_score call in the same process could silently break because sklearn 1.5+ rejects groups= as a direct kwarg when routing is on.

New behaviour: get_ranking_pipeline snapshots the current routing setting before enabling it, then restores the original value once the pipeline and tuner objects have been constructed. A try/finally block guarantees restoration even if an exception is raised during pipeline construction:

_prev_routing = sklearn.get_config().get("enable_metadata_routing", False)
sklearn.set_config(enable_metadata_routing=True)
try:
    # ... build ranking_pipeline and tuner ...
finally:
    sklearn.set_config(enable_metadata_routing=_prev_routing)
return ranking_pipeline, tuner

Callers that need metadata routing active for fitting or tuning (i.e. every ranking workflow) must now enable it explicitly around those calls:

sklearn.set_config(enable_metadata_routing=True)
pipeline.fit(X, y, group_id=ranking_groups)
tuner.optimize(..., ranking_groups=ranking_groups)

Or equivalently, use sklearn.config_context(enable_metadata_routing=True) as a context manager.

Test update (test/unit/test_pipeline_utils.py): test_get_ranking_pipeline now calls sklearn.set_config(enable_metadata_routing=True) explicitly after get_ranking_pipeline() returns, since the helper no longer leaves routing permanently on. A preserve_metadata_routing fixture restores the original routing state on teardown.

Why sklearn 1.5+ rejects groups= as a direct kwarg when routing is on

sklearn 1.5+ enforces that when metadata routing is active, CV-split groups must travel through params=. Passing groups= as a direct kwarg alongside params= raises ValueError. The new decorator handles this by routing groups through fit_kwargs whenever routing is enabled.


CI changes (.github/workflows/workflow.yml)

Python version Before After Reason
3.10 ❌ dropped sklearn 1.8 requires Python ≥ 3.11
3.11
3.12
3.13
3.14 ✅ added Enabled by catboost 1.2.9 (Python 3.14 support + setuptools dry_run fix)

Release and docs jobs: Python pinned from 3.103.11.

TabPFN weight download added to fast test job:
The scripts/prepareTabpfn.py step was originally only present in test-slow. The fast test matrix job also imports TabPFNRegressorMother/TabPFNClassifierMother (e.g. in test_tabpfn.py import-time checks), so the download step is now also run before uv run poe coverage.

scripts/prepareTabpfn.py — missing import fixed:
The script used Path(model_path) without from pathlib import Path, causing a NameError on every CI run. The import was added.


RNA normalizer tests (test/unit/test_rna.py)

On the main branch, test_different_normalisation_methods was a single parametrize over ["Scanpy", "UQ", "CUF", "CPM"] with all four cases expected to pass. After sklearn was updated to 1.8 (in an intermediate commit on this branch), UQ and CUF started failing because rnanorm 2.2.0 calls the internal sklearn._validate_data method which was removed in sklearn 1.8. The tests were therefore temporarily marked xfail(strict=True) to document the known breakage:

# intermediate state on this branch — xfail because rnanorm broke under sklearn 1.8
pytest.param("UQ",  marks=pytest.mark.xfail(reason="rnanorm<=2.2.0 calls deprecated sklearn._validate_data removed in sklearn>=1.8", strict=True))
pytest.param("CUF", marks=pytest.mark.xfail(reason="rnanorm<=2.2.0 calls deprecated sklearn._validate_data removed in sklearn>=1.8", strict=True))

Once rnanorm was replaced with the native implementations in rna.py (see section 6 below), the xfail marks were removed and the test reverts to the same simple form as main:

@pytest.mark.parametrize("norm_method", ["Scanpy", "UQ", "CUF", "CPM"])
def test_different_normalisation_methods(...)

All four normalization methods now pass cleanly against sklearn 1.8.


6. RNA input validation guards (ml/rna.py)

Zero or degenerate inputs previously propagated silently as inf/nan through downstream pipelines. Three fail-fast checks are now raised as ValueError with specific messages:

Location Condition checked Error message
CPM.transform Any sample row-sum is 0 "CPM normalization requires each sample to have a positive library size (row sum > 0). Found one or more all-zero samples."
UQ._raw_factors All genes are zero after filtering (empty matrix) "UQ/CUF normalization requires at least one gene with non-zero counts, but all genes are zero across all samples."
UQ._raw_factors Any sample row-sum is 0 over expressed genes "UQ/CUF normalization requires each sample to have a positive library size (row sum > 0 over expressed genes)."
UQ._raw_factors Any 75th-percentile factor is 0 (would cause log(0) in fit) "UQ/CUF normalization produced a zero scaling factor ... Check for samples with very sparse counts."

sklearn fit-before-transform contract (CPM.transform):

UQ.transform and CUF.transform already called check_is_fitted(self) as their first line, raising NotFittedError if called before fit(). CPM.transform was missing this guard, so calling it unfitted would silently proceed (skipping n_features_in_ / feature_names_in_ population) and produce incorrect output when used with set_output(transform="pandas"). The guard is now added for consistency:

def transform(self, X):
    check_is_fitted(self)  # added — consistent with UQ and CUF
    X_arr = np.asarray(X, dtype=float)
    ...

7. rnanorm replacement (ml/rna.py)

Problem: rnanorm 2.2.0 calls sklearn._validate_data (an internal, underscore-prefixed method) which was removed entirely in sklearn 1.8, causing UQ and CUF normalization to raise AttributeError at runtime. The package is unmaintained and unlikely to be fixed.

Solution: CPM, UQ, and CUF are now implemented as native sklearn transformers directly in rna.py, using only numpy. rnanorm is removed from the [rna] extras in pyproject.toml.

The math faithfully replicates the rnanorm / edgeR algorithm:

Class Formula
CPM X[i,j] / sum(X[i,:]) × 10⁶
UQ CPM with effective library size: lib_size × (UQ_factor / geomean(UQ_factors)), where UQ_factor[i] = percentile75(non-zero genes of sample i) / lib_size[i]
CUF X[i,j] / (UQ_factor[i] / geomean(UQ_factors)) — same factors as UQ but raw counts divided directly, no CPM scaling

Output values were verified against rnanorm's own documentation examples (toy dataset from Bullard et al. 2010).

Copilot AI review requested due to automatic review settings June 17, 2026 16:49
@thomasATbayer thomasATbayer linked an issue Jun 17, 2026 that may be closed by this pull request

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to lift the previous scikit-learn version cap (<=1.5.0) while keeping MotherML’s model wrappers compatible with scikit-learn cloning behavior—especially for CatBoost estimators where newer CatBoost versions copy mutable constructor parameters.

Changes:

  • Adds a custom __sklearn_clone__ implementation for CatBoost Mother estimators and introduces unit tests to validate clone behavior and CV integration.
  • Expands supported Python/scikit-learn versions in packaging and CI (Python >=3.11; scikit-learn up to 1.8.0).
  • Updates TabPFN dependency/version and changes TabPFN Mother defaults to select V2 model weights when model_path is not provided.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/mother/ml/models/m_catboost.py Adds a custom sklearn clone implementation and wires it into CatBoost Mother estimator classes.
test/unit/test_catboost_clone.py Introduces tests covering clone round-trips, metadata routing preservation, and cross_val_score integration.
src/mother/ml/models/m_tabpfn.py Changes TabPFN Mother defaults to force V2 model weights when model_path is absent.
pyproject.toml Updates Python/scikit-learn constraints and bumps TabPFN dependency; adjusts CatBoost constraint.
.github/workflows/workflow.yml Updates CI Python matrix and default Python version from 3.10 to 3.11.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/mother/ml/models/m_catboost.py Outdated
Comment thread src/mother/ml/models/m_catboost.py
Comment thread test/unit/test_catboost_clone.py Outdated
Comment thread pyproject.toml Outdated
Comment thread src/mother/ml/models/m_tabpfn.py
Comment thread src/mother/ml/models/m_tabpfn.py
Comment thread test/unit/test_catboost_clone.py Outdated
@thomasATbayer thomasATbayer added bug Something isn't working enhancement New feature or request dependencies Pull requests that update a dependency file labels Jun 17, 2026
@thomasATbayer thomasATbayer self-assigned this Jun 17, 2026
Copilot AI review requested due to automatic review settings June 17, 2026 17:11

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

Comment thread src/mother/ml/models/m_tabpfn.py Outdated
Comment thread src/mother/ml/models/m_catboost.py Outdated
Comment thread src/mother/ml/models/m_catboost.py
Comment thread test/unit/test_catboost_clone.py
Comment thread test/unit/test_catboost_clone.py
Copilot AI review requested due to automatic review settings June 17, 2026 17:41

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

Comment thread test/unit/test_rna.py Outdated
Comment thread test/unit/test_rna.py Outdated
Comment thread src/mother/ml/models/m_catboost.py Outdated
Comment thread src/mother/ml/models/m_catboost.py Outdated
Comment thread src/mother/ml/models/m_tabpfn.py Outdated
Comment thread src/mother/optimization/core.py Outdated
Copilot AI review requested due to automatic review settings June 17, 2026 18:32

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/mother/ml/models/m_catboost.py:1414

  • The CatboostRankerMother docstring still claims the estimator “automatically enables sklearn's metadata routing”, but the @ensure_metadata_routing decorator was removed. This is now misleading for users relying on group_id routing in CV/ranking flows.
    and automatic metadata routing enablement.

    This class extends the CatBoostRanker and integrates with the Mother framework to provide
    dynamic hyperparameter tuning using Optuna. It automatically enables sklearn's metadata routing
    if not already enabled, which is required for passing group_id parameters during training.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

Comment thread src/mother/ml/rna.py
Comment thread src/mother/pipeline_utils.py
Copilot AI review requested due to automatic review settings June 19, 2026 12:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/mother/ml/models/m_catboost.py:1414

  • The CatboostRankerMother docstring still claims it 'automatically enables sklearn's metadata routing' after the @ensure_metadata_routing decorator was removed. This is now misleading for users and conflicts with the updated guidance below about passing group_id via routing.
    and automatic metadata routing enablement.

    This class extends the CatBoostRanker and integrates with the Mother framework to provide
    dynamic hyperparameter tuning using Optuna. It automatically enables sklearn's metadata routing
    if not already enabled, which is required for passing group_id parameters during training.

Comment thread src/mother/pipeline_utils.py
Copilot AI review requested due to automatic review settings June 19, 2026 12:41

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Comment thread src/mother/ml/rna.py
Copilot AI review requested due to automatic review settings June 19, 2026 13:24

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Comment thread src/mother/cv/cv_methods.py
Copilot AI review requested due to automatic review settings June 19, 2026 13:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/mother/ml/rna.py:10

  • The module raises ExtrasDependencyImportError("rna", ...) if either anndata or scanpy is missing. That prevents importing/using the new pure-numpy normalizers (CPM/UQ/CUF) even though they don't depend on scanpy/anndata. Consider importing scanpy/anndata lazily (e.g., inside ScanpyPreprocessor/RNA) so CPM/UQ/CUF remain usable without the full [rna] extra.
try:
    import anndata
    import scanpy as sc
except ImportError as import_error:
    from mother.errors import ExtrasDependencyImportError

    raise ExtrasDependencyImportError("rna", import_error)

Comment on lines 62 to 66
@@ -67,26 +65,33 @@ def wrapper(*args, **kwargs):
fit_kwargs = {}
kwargs["fit_kwargs"] = fit_kwargs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working dependencies Pull requests that update a dependency file enhancement New feature or request

Projects

None yet

2 participants