AlphaPy is a domain-agnostic machine learning pipeline framework for data scientists. It provides a YAML-driven workflow built on scikit-learn, pandas, and polars, with first-class support for XGBoost, CatBoost, LightGBM, and Optuna.
v4.0 note: Trading, markets, and the Alfi platform have moved to the private
alphapy-finance repository. See CHANGELOG.md and the v3.1.1-monolith tag for the pre-split state.
- YAML-driven configuration — declarative project setup, no boilerplate
- Pluggable algorithms — sklearn estimators plus XGBoost, CatBoost, LightGBM
- Automated feature engineering — encoding (category_encoders), scaling, selection (LOFO, RFECV, univariate)
- Hyperparameter optimization — Optuna with sklearn integration
- Probability calibration — Venn-Abers, sigmoid, isotonic
- Imbalanced learning — built-in SMOTE and class-weight strategies
- Polars + pandas — choose your DataFrame engine
- Reproducible runs — every training run lands in a timestamped
runs/ directory
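To make the class-weight idea in the list above concrete, here is a minimal, standalone sketch of the "balanced" weighting heuristic that scikit-learn documents for `class_weight="balanced"`: each class gets `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` is hypothetical and is not part of AlphaPy's API; how AlphaPy applies class weights internally is not shown here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Hypothetical helper for illustration only; not AlphaPy's implementation.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# An 80/20 imbalanced binary target: the minority class gets the larger weight.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

Weights computed this way can typically be passed to estimators that accept a per-class weight mapping, so that errors on the rare class count for more during training.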
git clone https://github.com/ScottfreeLLC/alphapy-pro.git
cd alphapy-pro
pip install -e ".[dev]"
After install, the alphapy command is on your PATH directly.
Each ML project lives under projects/<name>/ with its own config/model.yml. Three example projects ship with the repo:
cd projects/kaggle # Titanic
alphapy
cd projects/pizza # Toppings ranker
alphapy
cd projects/time-series # Generic time-series forecasting
alphapy
Outputs land in projects/<name>/runs/run_YYYYMMDD_HHMMSS/.
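The run directory naming above can be sketched in a few lines of standard-library Python. This is an illustration of the `run_YYYYMMDD_HHMMSS` pattern only: the helper names `run_dir_name` and `latest_run` are hypothetical and are not AlphaPy functions.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches directory names of the form run_YYYYMMDD_HHMMSS.
RUN_PATTERN = re.compile(r"^run_\d{8}_\d{6}$")


def run_dir_name(when):
    """Format a datetime as a run directory name (hypothetical helper)."""
    return when.strftime("run_%Y%m%d_%H%M%S")


def latest_run(project_dir):
    """Return the most recent run directory under <project_dir>/runs, or None."""
    runs_dir = Path(project_dir) / "runs"
    if not runs_dir.is_dir():
        return None
    candidates = [
        p for p in runs_dir.iterdir() if p.is_dir() and RUN_PATTERN.match(p.name)
    ]
    # Zero-padded timestamps sort chronologically when compared as strings.
    return max(candidates, key=lambda p: p.name, default=None)


if __name__ == "__main__":
    print(run_dir_name(datetime.now()))
    print(latest_run("projects/kaggle"))
```

Because the timestamp fields are zero-padded and ordered from year down to second, a plain string comparison is enough to find the newest run.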
alphapy-pro/
├── alphapy/ # ML framework
│ ├── alphapy_main.py # Pipeline entry point
│ ├── model.py # Model management
│ ├── data.py # Generic CSV/Parquet loading
│ ├── frame.py # Frame wrapper (polars/pandas)
│ ├── features.py # Feature engineering
│ ├── transforms.py # Generic transforms
│ ├── variables.py # Declarative variable system
│ ├── estimators.py # Estimator registry
│ ├── optimize.py # Hyperparameter optimization
│ ├── plots.py # Visualization
│ └── ...
├── config/ # Global configs
│ ├── alphapy.yml # Main config (paths)
│ ├── algos.yml # Algorithm definitions
│ ├── variables.yml # Variable definitions
│ ├── groups.yml # Variable groups
│ └── model.yml.template # Per-project template
├── projects/ # Example projects
│ ├── kaggle/
│ ├── pizza/
│ └── time-series/
├── docs/ # Sphinx documentation
└── tests/ # Test suite
- config/alphapy.yml: global paths
- config/algos.yml: algorithm definitions and hyperparameter grids
- config/variables.yml: feature variable expressions
- config/groups.yml: feature groupings
- projects/<name>/config/model.yml: per-project model config (target, algorithms, CV, optimization, encoding, scaling, selection)
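As a quick sanity check before a run, the configuration files listed above can be verified with a short script. This is a sketch under the assumption that it is run from the repository root; `missing_configs` is a hypothetical helper, not part of AlphaPy, and it checks only that the files exist, not their contents.

```python
from pathlib import Path

# Global config files named in this README, relative to the repository root.
GLOBAL_CONFIGS = (
    "config/alphapy.yml",
    "config/algos.yml",
    "config/variables.yml",
    "config/groups.yml",
)


def missing_configs(repo_root, project):
    """Return the expected config paths that do not exist under repo_root."""
    root = Path(repo_root)
    expected = [root / rel for rel in GLOBAL_CONFIGS]
    # Per-project config: projects/<name>/config/model.yml
    expected.append(root / "projects" / project / "config" / "model.yml")
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_configs(".", "kaggle"):
        print(f"missing: {path}")
```

An empty result means all four global files and the project's model.yml are in place.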
https://scottfreellc.github.io/alphapy-pro/
cd docs
make html
alphapy-pro is consumed as a library by domain-specific repos (private):
- alphapy-finance: trading, markets, Alfi (FastAPI/React) platform
- alphapy-sports: sports betting prediction
Both depend on alphapy-pro via { path = "../alphapy-pro", editable = true } in their pyproject.toml.
Apache-2.0. See LICENSE.