Add a paper-fidelity model-training engine (nested-CV Monte-Carlo + ensembles) behind predict_samples #276

## Problem

`aap.predict_samples` is a deliberately **thin** multi-model comparison harness: for each
`(feature set × model)` it runs one `cross_validate` + refit over the estimators the user brings.
It does **not** reproduce the full machine-learning training protocol from the AAanalysis
γ-secretase paper (Breimann et al.). That protocol is a substantial training engine with
install-fragile dependencies, and the golden-pipeline convention ("thin wrapper, no own algorithm")
keeps such an engine out of a one-call pipeline. Users who want paper-faithful predictions (robust
aggregated prediction scores, tuned hyperparameters, ensembles) currently have to wire it up by hand.
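
For contrast, the current thin path boils down to the loop below. This is a minimal sketch, not the actual `aap.predict_samples` implementation: the function name `thin_compare`, the dict-of-matrices input for feature sets, and the single `balanced_accuracy` scorer are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate


def thin_compare(feature_sets, y, models, cv=5):
    """Hypothetical sketch of the thin harness: one cross_validate + one refit
    per (feature set x model), with no tuning and no repeated splits."""
    results = {}
    for fs_name, X in feature_sets.items():
        for model_name, model in models.items():
            # A single cross-validation pass with the user's estimator as-is
            cv_res = cross_validate(clone(model), X, y, cv=cv, scoring="balanced_accuracy")
            fitted = clone(model).fit(X, y)  # refit once on all samples
            results[(fs_name, model_name)] = {
                "balanced_accuracy": float(np.mean(cv_res["test_score"])),
                "model": fitted,
            }
    return results


X, y = make_classification(n_samples=120, n_features=20, random_state=0)
feature_sets = {"all": X, "first_10": X[:, :10]}
models = {"rf": RandomForestClassifier(n_estimators=50, random_state=0),
          "lr": LogisticRegression(max_iter=1000)}
results = thin_compare(feature_sets, y, models)
```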

## Goal

Provide a **core** training engine that faithfully reproduces the paper's protocol and to which
`predict_samples` can delegate via an opt-in flag, so the thin default stays core-sklearn and light
while paper fidelity is one argument away.

## Requirements

- **10 model types**: random forest, extra trees, **xgboost**, **catboost**, LDA, logistic
  regression, SVM, MLP, plus **voting** and **stacking** (SVM meta-model) ensembles.
- **Monte-Carlo training**: N independent rounds (default 25), each with a balanced 80/20 train/test split.
- **Nested cross-validation**: inner 5-fold for feature selection + `GridSearchCV` hyperparameter
  tuning; outer 20% hold-out for independent per-round scoring.
- **Two-stage feature pre-selection**: Pearson-correlation filter (grid over top-k × threshold) →
  stepwise elimination by random-forest importance, scored by 5-fold F1, down to a floor (~25–50 features).
- **Aggregated prediction score**: mean predicted probability across models × rounds, with its std.
- **Metrics**: balanced accuracy (headline), accuracy, F1, precision, recall, TNR.
- **Class imbalance** is handled via balanced splits (+ optional resampling); balanced accuracy is the
  imbalance-aware headline metric.
- `xgboost` / `catboost` gated behind a **new optional extra** (needs maintainer approval — touches
  `pyproject.toml`) so the core install stays light.
- `random_state` threaded through end to end (reproducibility contract).
- `predict_samples` gains an **opt-in path** (e.g. `engine="paper"`) that delegates to this engine;
  the thin default path is unchanged.
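
A minimal sketch of how the 10 model types could be assembled, assuming scikit-learn estimators for the core models and a soft-voting ensemble. `build_models` is a hypothetical name; the boosters are only added when `xgboost` / `catboost` can be imported, which mimics gating behind the optional extra without reproducing `_EXTRA_MODULES`. Default hyperparameters are placeholders until the Supplementary Data 10 grids are transcribed.

```python
import importlib.util

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(random_state=None):
    """Hypothetical model zoo: core sklearn models, optional boosters, two ensembles."""
    models = {
        "rf": RandomForestClassifier(random_state=random_state),
        "et": ExtraTreesClassifier(random_state=random_state),
        "lda": LinearDiscriminantAnalysis(),
        "lr": LogisticRegression(max_iter=1000, random_state=random_state),
        # probability=True so soft voting and probability-based scores work
        "svm": SVC(probability=True, random_state=random_state),
        "mlp": MLPClassifier(max_iter=1000, random_state=random_state),
    }
    # Boosters are only added when the optional extra is installed
    if importlib.util.find_spec("xgboost") is not None:
        from xgboost import XGBClassifier
        models["xgboost"] = XGBClassifier(random_state=random_state)
    if importlib.util.find_spec("catboost") is not None:
        from catboost import CatBoostClassifier
        models["catboost"] = CatBoostClassifier(random_seed=random_state, verbose=0)
    base = list(models.items())
    # Soft voting averages predicted probabilities (an assumption, not from the paper)
    models["voting"] = VotingClassifier(estimators=base, voting="soft")
    # Stacking with an SVM meta-model, as listed in the requirements
    models["stacking"] = StackingClassifier(
        estimators=base, final_estimator=SVC(probability=True, random_state=random_state))
    return models


models = build_models(random_state=0)
```

The ensembles clone their base estimators when fitted, so sharing the same instances with the individual entries is safe.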
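
The two-stage pre-selection could look roughly like the sketch below, under stated assumptions. Stage 1 ranks features by absolute Pearson correlation with the binary label and greedily skips features that correlate above the threshold with an already kept one. The grid winner is picked by the stage-2 F1 score, and one feature is removed per elimination step. The function names and default grid values are hypothetical, and the paper's exact rules take precedence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def correlation_filter(X, y, top_k, threshold):
    """Stage 1 (assumed rule): keep up to top_k features ranked by |Pearson r| with y,
    skipping any feature whose |r| with an already kept feature exceeds threshold."""
    r_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    r_y = np.nan_to_num(np.abs(r_y))  # constant columns yield NaN -> treat as 0
    kept = []
    for j in np.argsort(-r_y):
        if len(kept) == top_k:
            break
        if all(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) <= threshold for i in kept):
            kept.append(int(j))
    return kept


def stepwise_elimination(X, y, features, floor, random_state=None, n_estimators=100):
    """Stage 2: repeatedly drop the least RF-important feature, score every subset by
    5-fold F1, and return the best subset seen (subsets never shrink below floor)."""
    current = list(features)
    best_score, best_subset = -np.inf, list(current)
    while True:
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        score = cross_val_score(rf, X[:, current], y, cv=5, scoring="f1").mean()
        if score > best_score:
            best_score, best_subset = score, list(current)
        if len(current) <= floor:
            break
        # Refit on the current subset to get impurity-based importances
        rf.fit(X[:, current], y)
        current.pop(int(np.argmin(rf.feature_importances_)))
    return best_subset, best_score


def preselect(X, y, top_k_grid=(100, 200), thresholds=(0.7, 0.9), floor=25, random_state=None):
    """Run stage 1 for every (top_k, threshold) pair and keep the pair whose
    stage-2 subset has the best 5-fold F1 (hypothetical grid values)."""
    best_score, best_subset = -np.inf, []
    for top_k in top_k_grid:
        for threshold in thresholds:
            candidates = correlation_filter(X, y, top_k, threshold)
            subset, score = stepwise_elimination(X, y, candidates, floor, random_state)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset


X, y = make_classification(n_samples=100, n_features=30, n_informative=6, random_state=0)
subset = preselect(X, y, top_k_grid=(12,), thresholds=(0.9,), floor=8, random_state=0)
```

Removing one feature per step costs one cross-validation per step; dropping a fraction of features per step is a common way to trade fidelity for speed on large feature sets.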
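
The Monte-Carlo nested-CV loop and the aggregated prediction score can be sketched as follows. This assumes that "balanced" split means a stratified 80/20 split and that per-round seeds are derived from a single `random_state`, so the whole run is reproducible. `monte_carlo_train`, its arguments, and the optional `select_features` hook (where pre-selection would run on the training split only) are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def monte_carlo_train(X, y, X_pred, models, param_grids, n_rounds=25, test_size=0.2,
                      random_state=None, select_features=None):
    """Hypothetical engine core: n_rounds of nested CV.

    Returns hold-out balanced accuracy per model and round, plus the aggregated
    prediction score for X_pred: mean and std of P(y=1) over models x rounds.
    """
    rng = np.random.RandomState(random_state)
    scores = {name: [] for name in models}
    probas = []
    for _ in range(n_rounds):
        # Derive one seed per round from the master random_state (reproducibility)
        seed = rng.randint(np.iinfo(np.int32).max)
        # Outer loop: stratified 80/20 split; the 20% hold-out is never used for tuning
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        # Feature selection runs on the training split only to avoid leakage
        cols = select_features(X_tr, y_tr) if select_features else np.arange(X.shape[1])
        # Inner loop: 5-fold grid search on the training split
        inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        for name, model in models.items():
            search = GridSearchCV(clone(model), param_grids.get(name, {}),
                                  cv=inner_cv, scoring="balanced_accuracy")
            search.fit(X_tr[:, cols], y_tr)  # refits the best params on all of X_tr
            y_hat = search.predict(X_te[:, cols])
            scores[name].append(balanced_accuracy_score(y_te, y_hat))
            probas.append(search.predict_proba(X_pred[:, cols])[:, 1])
    probas = np.vstack(probas)  # shape: (n_models * n_rounds, n_pred)
    return scores, probas.mean(axis=0), probas.std(axis=0)


X, y = make_classification(n_samples=150, n_features=10, random_state=1)
models = {"lr": LogisticRegression(max_iter=1000),
          "rf": RandomForestClassifier(n_estimators=30, random_state=0)}
grids = {"lr": {"C": [0.1, 1.0]}, "rf": {"max_depth": [None, 3]}}
scores, score_mean, score_std = monte_carlo_train(
    X, y, X[:10], models, grids, n_rounds=3, random_state=0)
```

Calling it twice with the same `random_state` should return identical aggregated scores, provided every model also carries a fixed seed.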

## KPIs / Acceptance criteria

- On `DOM_GSEC`, the engine reproduces the paper's headline performance within a documented
  tolerance on the matched dataset/annotation.
- The aggregated prediction score is reproducible for a fixed `random_state` (same seed → same
  score ± std).
- Per-round and aggregate `df_eval` carry all six metrics as mean ± std.
- ≥30 unit tests for the new primitive (per the testing standard), a reproducibility test, and an
  executed example notebook that passes `nbmake`.
- pyright clean on the new public surface; numpydoc docstrings with an example include.
- The new extra is documented in `pyproject.toml`, `_EXTRA_MODULES`, and the install docs.
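
One way to derive a per-model `df_eval` with all six metrics as mean ± std is sketched below. It assumes per-round hold-out predictions are collected as `(model, y_true, y_pred)` tuples, labels are 0/1 with 1 as the test class, and TNR is computed as recall of the negative class. `build_df_eval` and the `<metric>_mean` / `<metric>_std` column names are hypothetical until the schema is documented.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             precision_score, recall_score)

METRICS = {
    "balanced_accuracy": balanced_accuracy_score,
    "accuracy": accuracy_score,
    "f1": f1_score,
    "precision": precision_score,
    "recall": recall_score,
    # True negative rate (specificity) is the recall of the negative class
    "tnr": lambda y_true, y_pred: recall_score(y_true, y_pred, pos_label=0),
}


def build_df_eval(per_round):
    """per_round: list of (model_name, y_true, y_pred) tuples, one per model and round."""
    rows = [{"model": m, **{k: fn(yt, yp) for k, fn in METRICS.items()}}
            for m, yt, yp in per_round]
    # Aggregate each metric over rounds, per model
    agg = pd.DataFrame(rows).groupby("model").agg(["mean", "std"])
    agg.columns = [f"{metric}_{stat}" for metric, stat in agg.columns]
    return agg.reset_index()


y_true = np.array([0, 0, 1, 1, 1, 0])
per_round = [("lr", y_true, np.array([0, 1, 1, 1, 0, 0])),
             ("lr", y_true, np.array([0, 0, 1, 1, 1, 1]))]
df_eval = build_df_eval(per_round)
```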

## Scope / non-goals

- **Not** in the thin `predict_samples` default path (stays core-sklearn, no heavy deps) — the
  engine is strictly opt-in.
- No multiclass / regression targets — binary test-vs-reference, matching `predict_samples`.
- No agent/MCP/tool contracts — those live downstream in ProtXplain.

## Dependencies

- Builds on `predict_samples` (the thin comparison harness) and `find_features` feature sets.
- Requires approval for the new `xgboost` / `catboost` extra (touches `pyproject.toml`).
- The exact per-model hyperparameter grids live in Supplementary Data 10 of the paper (an external
  spreadsheet, not in the manuscript text) — they must be transcribed when implementing the grids.

## Standards checklist

- [ ] New extra approved + added to `pyproject.toml` and `_EXTRA_MODULES`
- [ ] Core primitive (Wrapper / Tool template) + thin opt-in delegation from `predict_samples`
- [ ] ≥30 tests, reproducibility test, executed `nbmake` notebook
- [ ] numpydoc docstrings with examples include; no internal decision-doc references in code/GitHub
- [ ] `df_eval` schema documented

