Tabicl model implementation #21

Open
4xel-C wants to merge 8 commits into Bayer-Group:main from 4xel-C:tabicl_model_implementation

4xel-C commented Jun 11, 2026
Add TabICL Support to MotherML

https://github.com/soda-inria/tabicl

Summary

This pull request introduces full MotherML integration for TabICL by adding new model wrappers, uncertainty support for regression, embedding extraction utilities, dedicated unit tests, fixture updates in the shared ML test suite, and optional dependency declarations for TabICL and PyTorch.

The main additions are:

  • TabICLClassifierMother and TabICLRegressorMother in m_tabicl.py
  • TabICLEmbeddingTransformer for extracting TabICL representations
  • A dedicated test suite in test/unit/test_tabicl.py
  • Updates to test/unit/test_ml.py so shared algorithm-level tests include tabicl
  • Updates to test/unit/test_mother_cv.py so shared algorithm-level tests include tabicl
  • Optional dependencies in pyproject.toml for tabicl and torch

Motivation

MotherML already supports several tabular ML backends such as CatBoost, Random Forest and TabPFN. This change extends that ecosystem with TabICL so it can be used through the same MotherML abstractions for:

  • classification
  • regression
  • Optuna hyperparameter search
  • uncertainty estimation for regression
  • embedding extraction

This keeps the API surface aligned across supported algorithms and allows TabICL to participate in both model-level and framework-level tests.

Main Changes

1. New TabICL wrappers in src/mother/ml/models/m_tabicl.py

This PR adds a new module implementing Mother-compatible wrappers around the upstream tabicl package.

_TabICLHyperParams

Introduces a shared hyperparameter mixin responsible for:

  • storing constructor parameters in _init_params
  • exposing get_params() and set_params() in a sklearn-compatible way
  • defining an Optuna search space via get_hyperparameter_space()
  • validating training inputs with _check_input_type()

The Optuna search space is model-aware:

  • n_estimators is always tunable
  • softmax_temperature is only suggested for classifiers
  • average_logits is only suggested for classifiers
  • average_logits is forced to False when n_estimators == 1
  • outlier_threshold is only suggested for regressors
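
The conditional rules above can be sketched as follows. This is a minimal illustration, not the code in m_tabicl.py: the function name `suggest_tabicl_params`, the `TrialLike` protocol, the `ReplayTrial` helper, and every numeric range are assumptions made for the example. Only the `suggest_int`, `suggest_float`, and `suggest_categorical` method names mirror the Optuna trial interface.

```python
from typing import Any, Dict, Protocol, Sequence


class TrialLike(Protocol):
    """Subset of the Optuna trial interface that the sketch relies on."""

    def suggest_int(self, name: str, low: int, high: int) -> int: ...
    def suggest_float(self, name: str, low: float, high: float) -> float: ...
    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any: ...


def suggest_tabicl_params(trial: TrialLike, is_classifier: bool) -> Dict[str, Any]:
    """Sample one configuration. All ranges below are illustrative assumptions."""
    params: Dict[str, Any] = {
        "n_estimators": trial.suggest_int("n_estimators", 1, 32),
    }
    if is_classifier:
        params["softmax_temperature"] = trial.suggest_float(
            "softmax_temperature", 0.5, 1.5
        )
        if params["n_estimators"] == 1:
            # A single estimator leaves nothing to average, so the flag is fixed.
            params["average_logits"] = False
        else:
            params["average_logits"] = trial.suggest_categorical(
                "average_logits", [True, False]
            )
    else:
        params["outlier_threshold"] = trial.suggest_float(
            "outlier_threshold", 2.0, 8.0
        )
    return params


class ReplayTrial:
    """Hypothetical stand-in that replays preset values (for demonstration only)."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self.values = values

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return self.values[name]

    def suggest_float(self, name: str, low: float, high: float) -> float:
        return self.values[name]

    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any:
        return self.values[name]


classifier_params = suggest_tabicl_params(
    ReplayTrial({"n_estimators": 1, "softmax_temperature": 0.9}), is_classifier=True
)
regressor_params = suggest_tabicl_params(
    ReplayTrial({"n_estimators": 4, "outlier_threshold": 4.0}), is_classifier=False
)
```

In a real study, the `trial` argument would be the object Optuna passes to the objective function, so the stand-in class would not be needed.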

TabICLClassifierMother

Adds a Mother wrapper for TabICLClassifier with:

  • default MotherML-compatible parameters
  • support for list, numpy.ndarray, and pandas.DataFrame inputs
  • sklearn-style get_params() and set_params() integration
  • parameter freezing after fitting via _is_fitted
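
The parameter-management behavior can be pictured with a small stand-alone class. This is a sketch under assumptions: the class name `FrozenAfterFitParams`, the exception types, and the rejection of unknown keys are hypothetical, and only the `_init_params` and `_is_fitted` attribute names come from the description above.

```python
import copy
from typing import Any, Dict


class FrozenAfterFitParams:
    """Hypothetical mixin: sklearn-style parameters that lock once fitted."""

    def __init__(self, **params: Any) -> None:
        self._init_params: Dict[str, Any] = dict(params)
        self._is_fitted = False

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored parameters.
        if deep:
            return copy.deepcopy(self._init_params)
        return dict(self._init_params)

    def set_params(self, **params: Any) -> "FrozenAfterFitParams":
        if self._is_fitted:
            # Assumption: the real wrapper may raise a different exception type.
            raise RuntimeError("Parameters cannot be changed after fit().")
        unknown = set(params) - set(self._init_params)
        if unknown:
            raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
        self._init_params.update(params)
        return self

    def fit(self, X: Any, y: Any) -> "FrozenAfterFitParams":
        # A real wrapper would build and fit the upstream estimator here.
        self._is_fitted = True
        return self


model = FrozenAfterFitParams(n_estimators=4)
model.set_params(n_estimators=8)   # allowed before fitting
model.fit([[0.0], [1.0]], [0, 1])  # any later set_params() call now raises
```

Returning copies from `get_params()` keeps external edits from leaking back into the stored parameters.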

TabICLRegressorMother

Adds a Mother wrapper for TabICLRegressor with the same parameter-management behavior, plus:

  • predict_uncertainty() returning a standardized MotherML uncertainty output
  • uncertainty computed from prediction quantiles using the interquartile range
  • support for return_quantiles=True
  • support for uncertainty_for_opt=True

The regression uncertainty output follows the standard MotherML structure:

  • mean_predictions
  • knowledge_uncertainty
  • data_uncertainty
  • total_uncertainty
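
As a rough illustration of an interquartile-range uncertainty, the sketch below builds a dictionary with the four keys listed above from per-row quartile predictions. How the actual implementation splits uncertainty between the knowledge and data terms is not stated here, so the assignment below (the spread reported as data uncertainty, a zero knowledge term) is purely an assumption, as is the function name `iqr_uncertainty`.

```python
from typing import Dict, List, Sequence


def iqr_uncertainty(
    mean_predictions: Sequence[float],
    q25: Sequence[float],
    q75: Sequence[float],
) -> Dict[str, List[float]]:
    """Build a four-key uncertainty dict from per-row quartile predictions."""
    if not (len(mean_predictions) == len(q25) == len(q75)):
        raise ValueError("All inputs must contain one value per row.")
    # Interquartile range per row: width between the 75% and 25% quantiles.
    iqr = [upper - lower for lower, upper in zip(q25, q75)]
    if any(width < 0 for width in iqr):
        raise ValueError("q75 must not be smaller than q25.")
    return {
        "mean_predictions": list(mean_predictions),
        # Assumption: no separate knowledge term is estimated in this sketch.
        "knowledge_uncertainty": [0.0] * len(iqr),
        # Assumption: the quantile spread is reported as data uncertainty.
        "data_uncertainty": list(iqr),
        "total_uncertainty": list(iqr),
    }


output = iqr_uncertainty(
    mean_predictions=[1.0, 2.0],
    q25=[0.5, 1.0],
    q75=[1.5, 4.0],
)
```

A wider gap between the quartiles yields a larger uncertainty for that row, so the second row above is reported as less certain than the first.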

TabICLEmbeddingTransformer

Adds a transformer that extracts TabICL row representations via a forward hook on row_interactor.

Key supported behaviors:

  • regression and classification modes
  • out-of-fold embedding generation via k-fold cross-validation
  • group-aware folds with StratifiedGroupKFold and GroupKFold
  • use of a pre-fitted TabICL model
  • optional return format with one column per embedding dimension or a single vector column
  • preservation of DataFrame indices and feature ordering during transform

This makes TabICL usable as a learned feature extractor inside Mother pipelines.
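
The out-of-fold idea can be sketched without any ML dependency. The code below is a simplified stand-in for scikit-learn's GroupKFold (it assigns groups to folds round-robin rather than balancing fold sizes) and is not the transformer's implementation. The names `group_folds`, `out_of_fold`, and `fit_embed` are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence


def group_folds(groups: Sequence[Hashable], n_splits: int) -> List[List[int]]:
    """Assign row indices to folds so that no group is split across folds."""
    unique = list(dict.fromkeys(groups))  # keeps first-appearance order
    if n_splits < 2 or len(unique) < n_splits:
        raise ValueError(
            f"Need at least {max(n_splits, 2)} distinct groups, got {len(unique)}."
        )
    fold_of = {group: i % n_splits for i, group in enumerate(unique)}
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for row, group in enumerate(groups):
        folds[fold_of[group]].append(row)
    return folds


def out_of_fold(
    rows: Sequence,
    groups: Sequence[Hashable],
    n_splits: int,
    fit_embed: Callable[[list, list], list],
) -> list:
    """Embed every row with a model that never saw that row's group."""
    result: list = [None] * len(rows)
    for test_idx in group_folds(groups, n_splits):
        held_out = set(test_idx)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        test = [rows[i] for i in test_idx]
        # fit_embed trains on `train` and returns one embedding per test row.
        for i, embedding in zip(test_idx, fit_embed(train, test)):
            result[i] = embedding
    return result


# Toy embedder: each "embedding" records how many rows the model trained on.
embeddings = out_of_fold(
    rows=[10, 20, 30, 40],
    groups=["a", "a", "b", "c"],
    n_splits=2,
    fit_embed=lambda train, test: [[len(train)] for _ in test],
)
```

Because each row is embedded by a model fitted without its group, the resulting features avoid leaking target information into downstream estimators.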

2. Dedicated TabICL unit tests in test/unit/test_tabicl.py

This PR adds a dedicated test file for the new TabICL module.

The test coverage includes:

  • classifier default parameters
  • regressor default parameters
  • get_params() / set_params() behavior
  • invalid input handling
  • empty input handling
  • prediction shape validation
  • regression uncertainty outputs and return modes
  • Optuna hyperparameter-space generation
  • post-fit parameter immutability
  • embedding transformer initialization and fit/transform flows
  • grouped and non-grouped k-fold behavior
  • error handling for invalid usage patterns

The test suite currently reaches 91% coverage on src/mother/ml/models/m_tabicl.py.

3. Shared ML test integration in test/unit/test_ml.py and test/unit/test_mother_cv.py

The shared algorithm fixtures in test/unit/test_ml.py were updated so tabicl is now recognized as a valid algorithm in framework-level uncertainty tests.

Specifically:

  • all_classification_algorithms now instantiates TabICLClassifierMother when algorithm == "tabicl"
  • all_regression_algorithms now instantiates TabICLRegressorMother when algorithm == "tabicl"

This was necessary because ml.get_available_algorithms() already includes tabicl, but the fixtures previously only handled:

  • catboost
  • randomforest
  • tabpfn
  • lasso

4. Optional dependencies in pyproject.toml

This PR also updates optional dependencies so TabICL can be installed through extras.

The new optional dependency block is:

[project.optional-dependencies]
tabicl = [
    "tabicl>=2.1.1",
    "torch>=2.11.0",
]

This is aligned with the existing pattern already used for optional model backends such as TabPFN.
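
One common way to keep such a backend optional at import time is a small guard that fails with an installation hint. The helper below is a generic sketch, not code from this PR: the name `require_optional` and the error wording are assumptions, and the module name must be a top-level package.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module: str, extra: str) -> ModuleType:
    """Import an optional backend or fail with an installation hint."""
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"'{module}' is not installed; install the '{extra}' extra "
            "to use this backend."
        )
    return importlib.import_module(module)


# Demonstrated with a standard-library module so the snippet runs anywhere.
json_module = require_optional("json", extra="not-needed")
```

A wrapper module could call `require_optional("tabicl", extra="tabicl")` before building the upstream estimator, so users without the extra installed get an actionable message.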

Validation

The complete test suite has been run, including the new tests.

  • uv run pytest

Results:

  • the dedicated TabICL unit tests pass
  • all tests pass with no errors

Notes

  • Classification uncertainty for TabICLClassifierMother currently relies on the generic MotherML fallback implemented in AbstractMotherPipeline.predict_uncertainty(). This is consistent with the behavior already used for other classifiers without explicit uncertainty implementations.
  • Regression uncertainty is explicitly implemented for TabICLRegressorMother using quantile outputs.
  • The embedding transformer is included as part of the initial integration so TabICL can also be used as a representation learner inside downstream workflows.
  • This pull request has been developed to be consistent with MotherML and the TabPFN implementation.

Change Overview

In short, this PR:

  1. adds TabICL support to MotherML
  2. provides classifier, regressor and embedding transformer implementations
  3. adds dedicated unit tests with 91% coverage on m_tabicl.py
  4. updates shared test fixtures so tabicl participates in framework-level tests
  5. adds optional tabicl and torch dependencies in pyproject.toml

4xel-C added 5 commits June 10, 2026 09:41
classifier and embedder.
pyproject.toml: Adding optional dependencies for tabicl
management on non fitted models.

test.unit.test_tabicl: Redacted unit testing for tabicl model
addition of tabicl regressor and classifier
Copilot AI review requested due to automatic review settings June 11, 2026 11:50

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Mother-compatible wrappers for TabICL (classification, regression, and embedding extraction) and wires the new models into the existing ML/CV test matrices, along with dependency metadata updates.

Changes:

  • Introduces TabICLClassifierMother, TabICLRegressorMother, and TabICLEmbeddingTransformer with Optuna hyperparameter support and sklearn-style APIs.
  • Adds unit tests covering parameter handling, prediction/uncertainty, and embedding transformer behavior.
  • Registers tabicl as an optional dependency group and includes TabICL in algorithm-selection fixtures.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test/unit/test_tabicl.py | New unit tests for TabICL wrappers and embedding transformer. |
| test/unit/test_mother_cv.py | Adds tabicl to classification/regression algorithm fixtures for CV tests. |
| test/unit/test_ml.py | Adds tabicl to ML algorithm fixtures. |
| src/mother/ml/models/m_tabicl.py | New Mother wrappers for TabICL + embedding transformer implementation. |
| pyproject.toml | Adds tabicl dependency group (TabICL + torch). |

Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py Outdated
4xel-C added 2 commits June 11, 2026 12:47
default quantiles variable. Documentation and typo correction.
Correction of Optuna hyperparameter logging that could raise depending
on the type of Trial / FixedTrial used through Optuna, which may lack the
`.number` attribute.
Copilot AI review requested due to automatic review settings June 11, 2026 13:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py Outdated
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
Comment thread src/mother/ml/models/m_tabicl.py
(--extra tabicl) for running tests

src.mother.ml.models.m_tabicl.py: solved mutability problems caused by directly
returning the parameter dictionaries. Typo correction. Added error
handling in GroupKFoldCV if a user passes a column containing only one group. Ensure
2D for X in check_X_y methods.
2 participants