populace-fit: weight-aware conditional models (regime-gated chained QRF) by MaxGhenis · Pull Request #2 · PolicyEngine/populace

MaxGhenis · 2026-06-10T08:02:46Z

The conditional-models operator — populace.fit, the second shard of the stack (after populace-frame). Per DESIGN.md ("populace-fit: conditional models").

What's here

ConditionalModel / FittedModel protocols (model.py) — fit(frame, predictors, targets, *, weights="design") -> FittedModel; FittedModel.predict(frame_or_df) -> DataFrame (one draw column per target). Weight-aware by construction: weights selects which typed weight vector of the owning entity to use (default that entity's design weights), reading the Frame's typed Weights rather than a raw array. weights="none" is the only way to fit unweighted, and a misspelled or mismatched kind raises (naming the culprit) instead of silently falling back to unweighted — closing the 2026-06 microimpute landmine at the type boundary. resolve_fit_weights is the single authority for this rule.
QRF / RegimeGatedQRF (qrf.py) — the canonical model: regime-gated (structural, unweighted sign-mixture gates), sequentially chained (each target conditions on the predictors plus the targets already drawn), quantile-regression-forest draws (a seeded per-row quantile), with the frame's weights materialized by weighted bootstrap (importance-resample the training rows by weight before growing each forest). Reimplemented from scratch against the Frame — it does not import microimpute. This is the microimpute#196 fix as the reference mechanism.
Import-time kernel-compat assert (__init__.py) — the charter's constellation-versioning mechanism: populace-frame's series is checked at import (pre-1.0, the 0.x minor; major from 1.0 on), so a loose resolver that ignores [tool.uv.sources] cannot silently assemble an incompatible kernel pair.
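For illustration, the series rule behind the import-time gate could look roughly like this. It is a minimal sketch, not the code in __init__.py: the helper names `series` and `assert_compatible` are hypothetical, and it assumes plain "X.Y.Z" version strings.

```python
def series(version: str) -> tuple:
    """Return the compatibility series of an 'X.Y.Z' version string.

    Pre-1.0 the series is (0, minor); from 1.0 on it is (major,).
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (0, minor) if major == 0 else (major,)


def assert_compatible(installed: str, required: str) -> None:
    """Raise ImportError when the installed kernel is outside the required series."""
    if series(installed) != series(required):
        raise ImportError(
            f"populace-frame {installed} is not in the same series as {required}"
        )


# A 0.1.x kernel satisfies a 0.1 requirement; a 0.2.x kernel does not.
assert_compatible("0.1.3", "0.1.0")
```

Failing at import keeps the check independent of how the environment was resolved, which is the point of the gate described above.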

The headline contract

test_weighted_fit_shifts_draws_toward_weighted_truth is the concrete realization of the placeholder the kernel left skipped in packages/populace-frame/tests/test_contracts.py. On a donor whose target is large exactly where weight is small (the #196 shape; the high-value regime is independent of the predictors, so honoring the weight is the only way to recover the weighted conditional):

(a) the weighted fit's draws' mean lands within 20% of the true weighted mean;
(b) weights="none" lands within 20% of the unweighted mean;
(c) the two differ by >3x.
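To see why (a) to (c) can only hold together when the fit honors the weights, here is a small arithmetic sketch on an illustrative donor. The values and weights are assumptions chosen to mimic the #196 shape (20% low-weight, huge-value rows); they are not the test fixture.

```python
# Illustrative donor (assumed numbers, not the test fixture):
# 1,000 low-weight rows with a huge value, 4,000 high-weight rows with a small one.
values = [500_000.0] * 1_000 + [10_000.0] * 4_000
weights = [1.0] * 1_000 + [100.0] * 4_000

unweighted_mean = sum(values) / len(values)
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
ratio = unweighted_mean / weighted_mean


def within(estimate: float, truth: float, tol: float = 0.20) -> bool:
    """True when estimate is within tol (relative) of truth."""
    return abs(estimate - truth) <= tol * abs(truth)


# A fit that ignores the weights reproduces the unweighted mean,
# which is far outside the 20% band around the weighted truth.
unweighted_passes_weighted_check = within(unweighted_mean, weighted_mean)
```

On these numbers the two means differ by well over 3x, so a draw mean cannot sit within 20% of both.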

This targets the microimpute#196 bug class: its absence is now a standing guarantee of the stack rather than a latent footgun. Follow-up: the kernel's skipped test_weighted_fit_shifts_draws_toward_weighted_truth placeholder can be unskipped/removed once this shard is in the workspace (it could not be edited from this branch's scope).

Plus: regime gates preserve a zero-inflated target's zero mass and both signs (no zero-crossing); chaining reproduces a cross-target correlation; predict row-count/index match the input; fixed seed is deterministic; weights="none" is the only unweighted path. 35 tests, n=5000 seeded for CI speed.

Note on the scikit-learn pin

scikit-learn is capped >=1.5,<1.9. scikit-learn 1.9 removed sklearn.tree._tree.DTYPE, which quantile-forest imports — so an unbounded >=1.5 resolves to 1.9 and import quantile_forest fails. On the workspace's Python 3.14 interpreter the cap keeps the only working combination (scikit-learn 1.8 + quantile-forest 1.4) resolvable; the cap can be lifted once quantile-forest tracks the 1.9 tree ABI.

Optional heavy deps (scikit-learn, quantile-forest) stay in this shard, never in populace-frame.

Validation

uv sync --all-packages && uv run pytest packages/populace-fit && uv run ruff check packages/populace-fit — all green (35 passed; ruff clean). Full workspace suite: 192 passed, 3 skipped (the microunit/policyengine_us-gated kernel tests).

🤖 Generated with Claude Code

Add the packages/populace-fit shard skeleton: src-layout PEP 420 namespace (no src/populace/__init__.py), its own pyproject (deps populace-frame + scikit-learn + quantile-forest + numpy + pandas), and [tool.uv.sources] populace-frame = workspace so the workspace resolves it locally. scikit-learn is capped <1.9: 1.9 removed sklearn.tree._tree.DTYPE, which quantile-forest imports, breaking import on the only Python-3.14 wheel set. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…d QRF

model.py: the ConditionalModel/FittedModel protocols and resolve_fit_weights, the single authority enforcing that a fit is weighted by construction (weights reads the owning entity's typed Weights; weights='none' is the only unweighted path; a misspelled or mismatched kind raises rather than silently fitting unweighted). predictors_targets_entity refuses predictors/targets that span entities.

qrf.py: the canonical model. Regime detection (structural/unweighted sign-support gates), sequential chaining (each target conditions on predictors plus the targets already drawn), and weighted bootstrap (importance-resample rows by weight before growing each forest; this is the microimpute#196 fix, reimplemented from scratch against the Frame, not imported). Draws sample the weighted conditional by querying the forest at a per-row seeded quantile.

__init__.py: public API (ConditionalModel, QRF/RegimeGatedQRF, fit) and the constellation compatibility gate, which asserts populace-frame's major/minor at import so a loose resolver cannot assemble an incompatible kernel pair.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
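The weight-kind rule that resolve_fit_weights enforces can be sketched as below. This is a hedged illustration only: `resolve_kind` and `KNOWN_KINDS` are hypothetical names, the real function reads typed Weights from the Frame, and its messages may differ.

```python
from typing import Optional

# Kinds named in this PR; treated here as the full set (an assumption).
KNOWN_KINDS = ("design", "importance", "calibrated")


def resolve_kind(requested: str, stored_kind: str) -> Optional[str]:
    """Return the kind to fit with, or None for an explicitly unweighted fit."""
    if requested == "none":
        return None  # the only unweighted path
    if requested not in KNOWN_KINDS:
        # A typo must fail loudly and name the culprit.
        raise ValueError(
            f"unknown weights kind {requested!r}; "
            f"expected one of {KNOWN_KINDS} or 'none'"
        )
    if requested != stored_kind:
        # A mismatch names the stored kind instead of falling back to unweighted.
        raise ValueError(
            f"requested weights={requested!r} but the frame carries "
            f"{stored_kind!r} weights"
        )
    return requested
```

The design choice is that every failure mode raises; there is no branch in which a bad argument quietly yields an unweighted fit.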

…d placeholder)

test_weighted_fit_contract.py is the concrete realization of the kernel's skipped test_weighted_fit_shifts_draws_toward_weighted_truth: on a donor whose target is large exactly where weight is small (the #196 shape, ~20% low-weight huge-value rows), the weighted fit's draws land within 20% of the true weighted mean, weights='none' lands within 20% of the unweighted mean, and the two differ by >3x. Also asserts the default is weighted (no unweighted default).

test_qrf.py: regime gates preserve a zero-inflated target's zero mass and both signs with no zero-crossing; chaining reproduces a cross-target correlation; predict row-count/index match the input; fixed seed is deterministic; successive predicts draw independently.

test_model.py: weights='none' is the only unweighted path (a typo'd kind raises and names it; a mismatched kind raises and names the stored kind); predictors/targets spanning entities are refused.

test_compat.py exercises the import gate.

n=5000 seeded for CI speed; 35 tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis · 2026-06-10T08:45:14Z

Adversarial review — converting to draft; soundness fixes needed before merge

An independent clean-room review (every finding reproduced) found bugs that undermine the operator's core purpose. Marking draft. Ranked:

HIGH — the weighted-bootstrap gate annihilates rare low-weight classes. The gate is fit on an n-of-n weighted resample, so a scarce-but-oversampled stratum (exactly what the charter's pool design carries: oversampled in rows, downweighted in mass) gets zero rows in the resample → single-class gate → that class drawn with probability 0, forever. Reproduced: 9/10 gates single-class, 0 positive draws in 2M rows vs ~80 expected. The fit re-rarefies what the pool deliberately oversampled — the opposite of "tail support is strata's job." Fix: HistGradientBoostingClassifier honors sample_weight exactly; fit the gate weighted, drop its bootstrap (the forests still need the bootstrap — QRF ignores weight magnitude, confirmed at _quantile_forest.py:266).
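The scale of the problem follows from simple arithmetic. The sketch below uses the repro shape reported in this PR (n=5000, ~10 positive rows at weight 1, ~4990 zero rows at weight 50) and assumes the resample draws rows independently with probability proportional to weight.

```python
# Repro shape from this PR: a rare stratum with few rows and low weight.
n_pos, w_pos = 10, 1.0
n_zero, w_zero = 4_990, 50.0
n = n_pos + n_zero

# Probability that one resample draw picks any positive row.
p_pos = (n_pos * w_pos) / (n_pos * w_pos + n_zero * w_zero)

# Probability that an n-of-n weighted resample contains no positive row at all,
# which leaves the gate with a single class.
p_class_lost = (1.0 - p_pos) ** n

# Positive draws one would expect in 2M predicted rows if the gate
# reproduced the weighted positive share.
expected_pos_draws = 2_000_000 * p_pos
```

Under these assumptions p_pos is about 4e-5, roughly four out of five resamples lose the positive class entirely, and about 80 positive draws would be expected in 2M rows, in line with the reviewer's figures.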

HIGH — household-weights-only frames can't be fit weighted at all. The canonical CPS shape (person-level fit, household design weights) fails weights_for("person"); only weights="none" runs. The operator built to prevent unweighted-by-accident makes the unweighted escape hatch the only thing that runs on the most representative input — microimpute#196's social mechanism, rebuilt. Fix: resolve through effective (broadcast) weights — touches kernel API (_effective_weights is private; accounting already uses it, so wmean is weighted on a frame where fit refuses to be).

HIGH — NaN targets silently become zeros in gated regimes (NaN-blind sign labels → zero class). Survey item-nonresponse silently moves mass to $0. Fix: validate finite at fit, raise naming the column + count.

MEDIUM — tail mass undershot ~2x, and the contract test passes for the wrong reason ("delete all low-weight rows" also passes — it asserts only means, never that the rare regime survives). Plus the 201-point grid winsorizes draws to [0.5%, 99.5%]. This is the capital-gains/dividends tail problem at the method level. Fix: assert high-draw-share survival, draw leaf values as atoms incl. endpoints.

MEDIUM — fit and draw RNG share one stream (predict quantiles bit-identical to the gate's bootstrap uniforms; max|diff|=0). Fix: SeedSequence(seed).spawn(2).

LOW — missing populace-frame>=0.1,<0.2 pin (charter mandate; gate works but resolution-time failure is the better failure); impossible-remediation error message; duplicate-target / target-in-predictors unguarded; (n×201) memory at design scale.

Sound, must not regress: the weighted factorization P(sign|x)·P(y|x,sign); chained-equations semantics (drawn values fed forward); resolve_fit_weights as the single enforcement point; the compat gate; determinism. The headline contract does kill the literal #196 bug — it just can't see tail mass, rare-class survival, or the household-weighted frame it never builds.

Add Frame.resolve_weights(entity) -> Weights: resolves effective weights like _effective_weights but returns a typed Weights that carries the source entity's kind. An entity without its own stored weights inherits the single weighted group entity's design/importance/calibrated kind and broadcast values; an entity with its own weights is returned as-is. The existing ambiguity guards (zero/multiple weighted group entities) are kept. This fixes the "household-weighted frame can't be fit weighted" bug: a person-level fit can now read the inherited household kind instead of a bare ndarray that dropped the kind. accounting._resolve migrates to resolve_weights(owner).values (behavior identical). Regression tests (test_bundle.py TestResolveWeights): person resolve on a household-weighted frame returns Weights(kind=design, broadcast values); calibrated household resolves to calibrated person; an entity with its own weights returns that exact object; ambiguity (two weighted group entities) still raises; unknown entity is named. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
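The inheritance described here amounts to broadcasting household weights onto members while keeping the kind attached. A minimal sketch, assuming a stand-in `Weights` dataclass and a hypothetical `broadcast_to_members` helper (neither is populace-frame's actual API):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Weights:
    """Stand-in for a typed weight vector: a kind plus the values."""
    kind: str
    values: Tuple[float, ...]


def broadcast_to_members(household: Weights, person_household: List[int]) -> Weights:
    """Give each person their household's weight, preserving the kind.

    person_household[i] is the household position of person i.
    """
    return Weights(
        kind=household.kind,
        values=tuple(household.values[h] for h in person_household),
    )


households = Weights(kind="design", values=(10.0, 20.0))
persons = broadcast_to_members(households, person_household=[0, 0, 1])
```

Returning a typed object rather than a bare array is what lets the kind check downstream still see "design" on the person-level vector.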

resolve_fit_weights now resolves via frame.resolve_weights(entity) rather than frame.weights_for(entity), so a person-level fit on a household-weighted frame inherits the household weights (and their kind) through membership instead of raising. This was the bug: the canonical CPS shape (person predictors/targets, design weights only on the household) could not be fit weighted at all. The kind discipline is unchanged — the requested kind must match the resolved (possibly inherited) kind, else raise. The impossible-remediation message is fixed: requesting "design" on a calibrated frame no longer advises "advance the frame's weights to design" (kinds only move forward, so that is impossible); it now tells the caller to pass weights="calibrated", the kind the frame actually carries. The forward direction (e.g. requesting calibrated on a design frame) keeps the advance-the-weights advice. Regression tests (test_model.py): the CPS shape fits weighted and the resolved vector broadcasts the household weights onto persons; a default design fit on a calibrated frame raises naming weights="calibrated" and not the impossible advance-to-design advice. The existing kind-mismatch test now matches "resolved weights" wording. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The sign-class gate (HistGradientBoostingClassifier) was fit on an n-of-n weighted bootstrap, which deletes vanishingly-rare low-weight classes outright: with a handful of positive rows at weight 1 among thousands of zeros at weight 50, each resample draw picks a positive row with probability ~4e-5, so the resampled labels routinely contain only the zero class and the gate could never draw the positive sign (0 positive draws across millions, reproduced).

HistGradientBoostingClassifier honors sample_weight exactly, so the gate is now fit with sample_weight=weights directly, no bootstrap. Every training row is present, so every sign class the data contains survives into classes_. The weighted bootstrap stays for the QRF forests, which genuinely need it: quantile-forest uses sample_weight only as a >0 leaf mask (confirmed _quantile_forest.py:266), so it ignores weight magnitude and the resample is the only way to weight the leaf distributions.

A guard now enforces internal consistency: if a sign class present in the training labels is absent from the fitted gate's classes_, the fit raises rather than silently drawing that class at probability zero.

Regression tests (test_qrf.py): the reviewer's repro (n=5000, ~10 positive at weight 1, ~4990 zero at weight 50) keeps both gate classes and produces positive draws across seeds (was 0/2M); the consistency guard raises when a stubbed gate drops a training class.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
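The consistency guard can be pictured as a set comparison between the training labels and the fitted gate's classes. This is a sketch under assumptions: `check_gate_classes` and the string labels are hypothetical, and the real guard works on the classifier's classes_ attribute.

```python
def check_gate_classes(train_labels, gate_classes) -> None:
    """Raise if the fitted gate is missing a sign class seen in training."""
    missing = sorted(set(train_labels) - set(gate_classes))
    if missing:
        raise ValueError(
            f"gate dropped sign classes present in the training labels: {missing}"
        )


# Both classes survive: no error.
check_gate_classes(["zero", "zero", "positive"], ["positive", "zero"])
```

A gate that silently lacks a class would assign it probability zero on every draw, so raising at fit time is the cheaper failure.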

…ndpoints, chunking)

Four changes to how the quantile forest is grown and read out, so draws reproduce the conditional's tail instead of undershooting it ~2x:

1. max_samples_leaf=None. The forests are now grown keeping ALL leaf samples; quantile-forest's default of 1 keeps one sample per leaf, thinning each row's conditional to ~n_estimators atoms and undershooting tail mass. Exposed as a RegimeGatedQRF param (default None). On the contract fixture the weighted share above 300k goes from ~0.0035 (nearest-snap, msl=1) to ~0.0050, the weighted-population truth.

2. Linear interpolation. draw() no longer snaps each row to the nearest of 201 grid points (which quantizes every draw and biases the tail toward the bracket interior); it linearly interpolates the row's value at its exact quantile between the two bracketing grid quantiles.

3. Drawable extremes. The quantile grid now includes points adjacent to 0 and 1, so the observed conditional min and max are drawable. q=1 is the observed maximum, not extrapolation; the old comment wrongly excluded it. With a lone extreme the interior-only grid (top q=0.995) reads far below the max; the endpoint reaches it.

4. Chunked predict. draw() batches the predict over rows (_PREDICT_CHUNK_ROWS=50k) so the (n_rows x n_grid) matrix never materializes whole; at 3M+ rows it would be tens of GB. Chunking is bit-identical to a single pass (quantiles are drawn up front and sliced positionally).

Regression tests (test_qrf.py): the weighted tail share above 300k is within ~2x of truth and materially closer than the nearest-snap baseline; a draw at q->1 reaches the observed conditional max via the grid endpoint, which the interior-only winsorized grid misses.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
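The effect of the grid endpoints and the interpolation can be reproduced on a single row with plain NumPy. This is a sketch, not the draw() implementation: the sample values are illustrative, np.quantile stands in for the forest's readout, and placing grid points exactly at 0 and 1 is an assumption about how the extended grid is laid out.

```python
import numpy as np

# One row's conditional sample with a lone extreme (illustrative values).
leaf_values = np.array([1.0] * 99 + [1000.0])

# The old interior-only grid versus a grid that also carries the endpoints.
interior_grid = np.linspace(0.005, 0.995, 201)
full_grid = np.concatenate(([0.0], interior_grid, [1.0]))


def draw(u: float, grid: np.ndarray) -> float:
    """Read the row's value at quantile u by interpolating between grid quantiles."""
    grid_values = np.quantile(leaf_values, grid)
    # np.interp clamps outside the grid, which is what winsorizes the interior grid.
    return float(np.interp(u, grid, grid_values))


top_interior = draw(1.0, interior_grid)  # stuck at the 0.995 grid value
top_full = draw(1.0, full_grid)          # reaches the observed maximum
```

With the interior-only grid a draw at u=1 can never exceed the value at q=0.995, however extreme the observed maximum is.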

NaN targets were silently relabeled to the zero class: the sign labels (y > atol / y < -atol) are both False for NaN, so a missing value was miscoded as a structural zero, NaN-blind. The model has no notion of missingness, so fit now validates at entry that every target column is entirely finite and raises a ValueError naming the offending column and its non-finite count (NaN or inf). Predictors are not checked — a forest splits around NaN features and a missing predictor is not silently miscoded the way a missing target is. Regression tests (test_qrf.py): a target with 3 NaNs raises naming the column and the count; an inf target is refused the same way. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
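Both halves of this change can be shown in a few lines: the NaN-blind sign labels, and an entry check that names the column and count. A minimal sketch, assuming targets arrive as a dict of NumPy arrays; `validate_finite_targets`, the `atol` value and the column name are illustrative, not the shard's actual code.

```python
import numpy as np

atol = 1e-9  # illustrative tolerance
y = np.array([5.0, -2.0, 0.0, np.nan])

# Comparisons with NaN are always False, so NaN is neither positive nor
# negative and would be labeled as a structural zero.
is_pos = y > atol
is_neg = y < -atol
looks_zero = ~is_pos & ~is_neg


def validate_finite_targets(columns: dict) -> None:
    """Raise naming the first target column that holds NaN or inf."""
    for name, values in columns.items():
        n_bad = int((~np.isfinite(np.asarray(values, dtype=float))).sum())
        if n_bad:
            raise ValueError(
                f"target {name!r} has {n_bad} non-finite value(s) (NaN or inf)"
            )
```

Calling `validate_finite_targets({"income": y})` on the array above raises and reports one non-finite value in 'income'.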

…tract Finding F: the fitted model was seeded with the raw model seed, making its draw uniforms bit-identical to the fit's bootstrap-selection uniforms (the draws were not independent of the fit's resampling). Seed fit and draw from two independent SeedSequence children of the model seed; determinism is preserved. Finding H: add a contract that the zero gate reproduces the *weighted* (population) zero-share, not the sample's, when the two differ. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
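The seeding fix in Finding F can be sketched with NumPy's SeedSequence. The variable names are illustrative and the snippet only demonstrates the mechanism (two generators built from the same raw seed are identical, two spawned children are not); it is not the shard's code.

```python
import numpy as np

seed = 42

# The bug shape: two generators built from the same raw seed emit the same stream.
shared_a = np.random.default_rng(seed).random(5)
shared_b = np.random.default_rng(seed).random(5)
max_abs_diff_shared = float(np.max(np.abs(shared_a - shared_b)))

# The fix: spawn two independent children of the model seed.
fit_seq, draw_seq = np.random.SeedSequence(seed).spawn(2)
fit_uniforms = np.random.default_rng(fit_seq).random(5)
draw_uniforms = np.random.default_rng(draw_seq).random(5)
```

Spawning is itself deterministic, so rebuilding the children from the same seed reproduces both streams and the determinism guarantee is kept.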

…olumns Finding G: pin populace-frame>=0.1,<0.2 (the constellation must resolve, not fail only at the import-time compat gate), and refuse duplicate predictors, duplicate targets, or a column that is both predictor and target (these silently fit twice / fit P(y|y) before). Messages name the culprits. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
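The column guards in Finding G reduce to a few set and count checks. A hedged sketch: `check_columns` is a hypothetical name and the real messages may be worded differently.

```python
from collections import Counter


def check_columns(predictors, targets) -> None:
    """Refuse duplicate predictors, duplicate targets, and predictor/target overlap."""
    dup_predictors = sorted(c for c, k in Counter(predictors).items() if k > 1)
    dup_targets = sorted(c for c, k in Counter(targets).items() if k > 1)
    overlap = sorted(set(predictors) & set(targets))
    if dup_predictors:
        raise ValueError(f"duplicate predictors: {dup_predictors}")
    if dup_targets:
        raise ValueError(f"duplicate targets: {dup_targets}")
    if overlap:
        # Fitting a column on itself would model P(y|y).
        raise ValueError(f"columns used as both predictor and target: {overlap}")


check_columns(["age", "state"], ["income", "dividends"])  # passes silently
```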

…ification review)

Two independent verification reviews (mutation testing) found that the resolve_weights kind fix introduced two bugs in _inherited_kind, which only handled the weighted-group path while _effective_weights gives person-stored weights precedence when deriving a group entity:

1. Regression: weighted accounting (wsum/wmean/wquantile/gini/...) of a group-entity column on a person-only-weighted frame raised instead of deriving the group weights from the person weights, on a frame shape the fit suite's own fixtures build.

2. Silent kind mislabel: a third entity's resolved values (from the person source) were tagged with a sibling group's kind, which could leak through resolve_fit_weights as a kind-discipline violation.

Make _inherited_kind recurse exactly as _effective_weights does so kind always names the source the values come from.

Three regression tests: person-only group accounting, mixed-kind coherence, and a leaf-component pin for the tail-draw fix (max_samples_leaf=None was untested).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis and others added 3 commits June 10, 2026 10:02

MaxGhenis force-pushed the fit-kernel branch from 5bdd2ee to 8a2e8a1 Compare June 10, 2026 08:03

MaxGhenis marked this pull request as draft June 10, 2026 08:45

MaxGhenis and others added 8 commits June 10, 2026 11:48

MaxGhenis marked this pull request as ready for review June 10, 2026 14:49

MaxGhenis merged commit d4dc9df into main Jun 10, 2026
2 checks passed

MaxGhenis deleted the fit-kernel branch June 10, 2026 14:49

This was referenced Jun 14, 2026

populace-build: reference-pinned, recorded, reconstructable parity (part of #19) #21

Closed

EPIC: certified populace-us default correctness (gates not fully wired / reference rotted) #28

Closed

MaxGhenis merged 11 commits into main from fit-kernel
