52 changes: 52 additions & 0 deletions packages/populace-fit/README.md
@@ -0,0 +1,52 @@
# populace-fit

The conditional-models operator of the [populace](../../DESIGN.md) stack —
imported as `populace.fit`. It fits conditional distributions `P(y | x)` over a
`populace.frame.Frame` and draws from them.

## Weight-aware by construction

Every fit reads the Frame's typed weights. There is no unweighted default: the
only way to fit without weights is to pass `weights="none"` explicitly, and the
docstring of `fit` documents that as the only escape hatch. The `weights`
argument selects which typed weight vector to use; by default it is the
**design** weights of the entity that owns the predictors and targets.

This closes the 2026-06 microimpute landmine, where a silently-ignored
`weight_col` reproduced a high-income regime's mass at the wrong scale. Here the
weights are materialized into the fit by **weighted bootstrap**: training rows
are importance-resampled by weight before each forest is grown, so leaf
distributions — and every value drawn from them — reflect the weighted
population, not the unweighted sample.
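
The resampling step can be illustrated with a minimal NumPy sketch. It is not the package's implementation: `weighted_bootstrap_indices` is a hypothetical helper, and the toy incomes and weights are invented for illustration. It only shows why resampling rows in proportion to weight moves a statistic from the unweighted sample value to the weighted population value.

```python
import numpy as np


def weighted_bootstrap_indices(weights, n_draws, seed=0):
    """Draw row indices with replacement, with probability proportional to weight.

    Hypothetical helper for illustration; not part of populace.fit.
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.size == 0:
        raise ValueError("weights must be a non-empty 1-D array")
    if not np.isfinite(w).all() or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be finite, non-negative, and not all zero")
    rng = np.random.default_rng(seed)
    return rng.choice(w.size, size=n_draws, replace=True, p=w / w.sum())


# Toy sample: the high-income row represents 1 unit, the other rows 33 each.
income = np.array([1_000_000.0, 30_000.0, 40_000.0, 50_000.0])
weights = np.array([1.0, 33.0, 33.0, 33.0])

idx = weighted_bootstrap_indices(weights, n_draws=20_000, seed=42)
resampled = income[idx]

unweighted_mean = income.mean()                      # treats every row equally
weighted_mean = np.average(income, weights=weights)  # the population target
resampled_mean = resampled.mean()                    # close to weighted_mean
```

A model trained on `resampled` sees the high-income row at roughly its population share, which is the failure the ignored `weight_col` produced in reverse.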

## The canonical model

`QRF` (alias `RegimeGatedQRF`) is a regime-gated, sequentially-chained
quantile-regression-forest imputer:

- **Regime gates.** Each numeric target's sign support (negative / zero /
positive) is detected structurally (unweighted) from the training data. A
zero-inflated target gets a zero-vs-nonzero gate so its zero mass is
preserved exactly; a sign-mixed target gets a gate per sign so draws never
interpolate across a zero crossing.
- **Chaining.** Targets are imputed sequentially; each conditions on the
predictors plus the targets already drawn, so the joint structure across
targets is preserved.
- **Draws.** A random quantile is sampled per row (seeded) and the forest is
queried at it, so the draws sample the weighted conditional.
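
The gate-then-draw idea can be sketched without a forest. The block below is an assumption-laden stand-in, not the `RegimeGatedQRF` code: `detect_sign_support` and `gated_draws` are hypothetical names, the `1e-9` tolerance is an illustrative value, and an empirical quantile lookup replaces the conditional forest query. It shows regimes being detected from the data, a gate choosing a regime per row, and a seeded random quantile being queried only inside that regime, so no draw lands between the largest negative and the smallest positive value.

```python
import numpy as np


def detect_sign_support(y, zero_atol=1e-9):
    """Report which of negative / zero / positive occur in y (unweighted)."""
    y = np.asarray(y, dtype=float)
    return {
        "negative": bool(np.any(y < -zero_atol)),
        "zero": bool(np.any(np.abs(y) <= zero_atol)),
        "positive": bool(np.any(y > zero_atol)),
    }


def gated_draws(y, n_draws, seed=0, zero_atol=1e-9):
    """Draw from y's empirical distribution without crossing sign regimes."""
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        raise ValueError("y must be non-empty")
    rng = np.random.default_rng(seed)
    regimes = {
        "negative": y[y < -zero_atol],
        "zero": y[np.abs(y) <= zero_atol],
        "positive": y[y > zero_atol],
    }
    names = [name for name, values in regimes.items() if values.size > 0]
    shares = np.array([regimes[name].size for name in names], dtype=float)
    # Gate: pick a regime per row with probability equal to its observed share.
    picked = rng.choice(len(names), size=n_draws, p=shares / shares.sum())
    # Draw: one random quantile per row, queried within the picked regime only.
    u = rng.uniform(size=n_draws)
    out = np.zeros(n_draws)  # rows gated to "zero" stay exactly 0.0
    for i, name in enumerate(names):
        rows = picked == i
        if name != "zero" and rows.any():
            out[rows] = np.quantile(regimes[name], u[rows])
    return out


capital_gains = np.array([-500.0, -200.0, 0.0, 0.0, 0.0, 0.0, 1_000.0, 8_000.0])
support = detect_sign_support(capital_gains)
draws = gated_draws(capital_gains, n_draws=5_000, seed=7)
```

In this sketch the gate is a marginal share; a conditional model would presumably estimate the gate from the predictors as well, which the sketch does not attempt.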

```python
from populace.fit import fit

predictors = ["age", "is_male"]
targets = ["capital_gains"]

# `frame` is an existing populace.frame.Frame holding these variables.
fitted = fit(frame, predictors=predictors, targets=targets)
draws = fitted.predict(frame)  # one column per target

# Unweighted is opt-in and explicit:
fitted_unweighted = fit(frame, predictors, targets, weights="none")
```

## Dependencies

The heavy dependencies (`scikit-learn`, `quantile-forest`) live here, never in
`populace-frame`: an analyst doing imputation installs this shard; an analyst
doing only calibration never pulls them.
34 changes: 34 additions & 0 deletions packages/populace-fit/pyproject.toml
@@ -0,0 +1,34 @@
[project]
name = "populace-fit"
version = "0.1.0"
description = "The populace conditional-models operator: weight-aware conditional models over the Frame — the canonical regime-gated, sequentially-chained, weighted-bootstrap quantile-regression-forest imputer"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    # Pin the kernel to its 0.x minor series (the import-time compat gate in
    # __init__.py asserts the same floor): the constellation must resolve, not
    # fail only at import.
    "populace-frame>=0.1,<0.2",
    # scikit-learn 1.9 removed sklearn.tree._tree.DTYPE, which quantile-forest
    # imports; cap below it until quantile-forest tracks the 1.9 tree ABI
    # (upstream fix: zillow/quantile-forest#152).
    "scikit-learn>=1.5,<1.9",
    "quantile-forest>=1.3",
    "numpy>=2",
    "pandas>=2.3",
]

[tool.uv.sources]
populace-frame = { workspace = true }

[dependency-groups]
dev = [
    "pytest>=8",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/populace"]
135 changes: 135 additions & 0 deletions packages/populace-fit/src/populace/fit/__init__.py
@@ -0,0 +1,135 @@
"""populace.fit: the conditional-models operator of the populace stack.

Fits conditional distributions ``P(targets | predictors)`` over a
:class:`~populace.frame.Frame` and draws from them. The operator is
**weight-aware by construction**: a fit reads the frame's typed weights, and the
only way to fit unweighted is to pass ``weights="none"`` explicitly
(:mod:`populace.fit.model`). The canonical model is a regime-gated, chained,
weighted-bootstrap quantile-regression-forest imputer (:mod:`populace.fit.qrf`).

Importing this shard asserts compatibility with the installed
:mod:`populace.frame` kernel. This is the constellation mechanism from
DESIGN.md: a shard pins ``populace-frame`` in its metadata *and* checks the
kernel series at import (the minor while the kernel is 0.x, the major from 1.0
on), so a resolver that ignores ``[tool.uv.sources]`` cannot silently assemble
an incompatible pair.
"""

from populace.frame import __version__ as _frame_version

#: The populace-frame series this shard is built against, as ``(major, minor)``.
#: The kernel is pre-1.0, so during the 0.x line compatibility is pinned at the
#: *minor* level (0.x and 0.y may differ incompatibly); from 1.0 on only the
#: major is compared. Kept in lockstep with the ``populace-frame>=0.1,<0.2``
#: pin in ``pyproject.toml``.
_REQUIRED_FRAME_SERIES = (0, 1)


def _assert_frame_compatible(version: str, required: tuple[int, int]) -> None:
    """Raise unless the installed populace-frame is the expected series.

    Args:
        version: The installed ``populace.frame.__version__``.
        required: The ``(major, minor)`` series this shard requires. The minor
            is enforced only while the major is ``0`` (the pre-1.0 convention
            that 0.x minors may break compatibility); from major ``1`` on, only
            the major must match.

    Raises:
        ImportError: If the installed kernel is outside the required series. The
            message names both versions and the fix.
    """
    parts = version.split(".")
    try:
        installed = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):  # pragma: no cover - defensive
        raise ImportError(
            f"populace-fit cannot parse populace-frame version {version!r}; "
            f"expected a {required[0]}.{required[1]}.x kernel."
        ) from None

    if required[0] == 0:
        compatible = installed == required
        expected = f"{required[0]}.{required[1]}.x"
    else:
        compatible = installed[0] == required[0]
        expected = f"{required[0]}.x"

    if not compatible:
        raise ImportError(
            f"populace-fit requires populace-frame {expected}, but "
            f"{version} is installed. Install the matching constellation "
            "(the workspace releases the shards in lockstep): upgrade or pin "
            f"populace-frame to {expected}."
        )


_assert_frame_compatible(_frame_version, _REQUIRED_FRAME_SERIES)

from populace.fit.model import (  # noqa: E402 - after the compatibility gate
    DESIGN_WEIGHTS,
    NO_WEIGHTS,
    ConditionalModel,
    FittedModel,
    WeightSpec,
)
from populace.fit.qrf import (  # noqa: E402 - after the compatibility gate
    DEFAULT_N_ESTIMATORS,
    DEFAULT_ZERO_ATOL,
    FittedRegimeGatedQRF,
    Regime,
    RegimeGatedQRF,
)

__version__ = "0.1.0"

#: The canonical conditional model, under its short public name.
QRF = RegimeGatedQRF


def fit(
    frame,
    predictors: list[str],
    targets: list[str],
    *,
    weights: WeightSpec = DESIGN_WEIGHTS,
    **model_kwargs,
) -> FittedModel:
    """Fit the canonical conditional model over ``frame``.

    Convenience constructor: builds a :class:`RegimeGatedQRF` with
    ``model_kwargs`` and fits it. For a different model, instantiate it directly
    and call its :meth:`~populace.fit.model.ConditionalModel.fit`.

    Args:
        frame: The :class:`~populace.frame.Frame` to fit on.
        predictors: Conditioning variable names (one entity).
        targets: Variable names to learn the conditional of (same entity).
        weights: Which typed weight vector to weight the fit by; defaults to the
            owning entity's ``"design"`` weights. ``"none"`` fits unweighted,
            which is the only way to do so.
        **model_kwargs: Forwarded to :class:`RegimeGatedQRF` (e.g.
            ``n_estimators``, ``zero_atol``, ``seed``).

    Returns:
        A :class:`~populace.fit.model.FittedModel`.
    """
    return RegimeGatedQRF(**model_kwargs).fit(
        frame, predictors, targets, weights=weights
    )


__all__ = [
    "ConditionalModel",
    "FittedModel",
    "WeightSpec",
    "DESIGN_WEIGHTS",
    "NO_WEIGHTS",
    "QRF",
    "RegimeGatedQRF",
    "FittedRegimeGatedQRF",
    "Regime",
    "DEFAULT_N_ESTIMATORS",
    "DEFAULT_ZERO_ATOL",
    "fit",
    "__version__",
]