populace-fit: weight-aware conditional models (regime-gated chained QRF) #2
Add the packages/populace-fit shard skeleton: src-layout PEP 420 namespace (no src/populace/__init__.py), its own pyproject (deps populace-frame + scikit-learn + quantile-forest + numpy + pandas), and [tool.uv.sources] populace-frame = workspace so the workspace resolves it locally. scikit-learn is capped <1.9: 1.9 removed sklearn.tree._tree.DTYPE, which quantile-forest imports, breaking import on the only Python-3.14 wheel set. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d QRF model.py: the ConditionalModel/FittedModel protocols and resolve_fit_weights — the single authority enforcing that a fit is weighted by construction (weights reads the owning entity's typed Weights; weights='none' is the only unweighted path; a misspelled or mismatched kind raises rather than silently fitting unweighted). predictors_targets_entity refuses predictors/targets that span entities. qrf.py: the canonical model. Regime detection (structural/unweighted sign-support gates), sequential chaining (each target conditions on predictors plus the targets already drawn), and weighted bootstrap (importance-resample rows by weight before growing each forest — the microimpute#196 fix, reimplemented from scratch against the Frame, not imported). Draws sample the weighted conditional by querying the forest at a per-row seeded quantile. __init__.py: public API (ConditionalModel, QRF/RegimeGatedQRF, fit) and the constellation compatibility gate — asserts populace-frame's major/minor at import so a loose resolver cannot assemble an incompatible kernel pair. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
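The import-time compatibility gate described in this commit can be sketched roughly as below. This is a minimal illustration, not the shard's code: `series_of` and `check_compatible` are hypothetical names, and the series rule (the 0.x minor before 1.0, the major from 1.0 on) is taken from the PR description.

```python
def series_of(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '0.1.3'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def check_compatible(installed: str, required: tuple[int, int]) -> None:
    """Raise ImportError when the installed kernel is outside the required series.

    Hypothetical helper: pre-1.0 the (0, minor) pair must match exactly;
    from 1.0 on only the major number is compared.
    """
    found = series_of(installed)
    if required[0] == 0:
        ok = found == required
    else:
        ok = found[0] == required[0]
    if not ok:
        raise ImportError(
            f"populace-frame {installed} is outside the required series "
            f"{required[0]}.{required[1]}"
        )
```

Calling `check_compatible("0.1.4", (0, 1))` returns silently, while `check_compatible("0.2.0", (0, 1))` raises, which is the behaviour a loose resolver would otherwise hide until runtime.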
…d placeholder) test_weighted_fit_contract.py is the real realization of the kernel's skipped test_weighted_fit_shifts_draws_toward_weighted_truth: on a donor whose target is large exactly where weight is small (the #196 shape, ~20% low-weight huge-value rows), the weighted fit's draws land within 20% of the true weighted mean, weights='none' lands within 20% of the unweighted mean, and the two differ by >3x. Also asserts the default is weighted (no unweighted default). test_qrf.py: regime gates preserve a zero-inflated target's zero mass and both signs with no zero-crossing; chaining reproduces a cross-target correlation; predict row-count/index match the input; fixed seed is deterministic; successive predicts draw independently. test_model.py: weights='none' is the only unweighted path (a typo'd kind raises and names it; a mismatched kind raises and names the stored kind); predictors/targets spanning entities are refused. test_compat.py exercises the import gate. n=5000 seeded for CI speed; 35 tests. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
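The arithmetic behind the contract can be illustrated with made-up numbers. The values below are assumptions chosen to mimic the #196 shape (about 20% of rows carry a huge value and a small weight); they are not the test fixture.

```python
# Illustrative donor: (target value, weight) pairs. Not the real fixture.
rows = [(1_000_000.0, 1.0)] * 1000 + [(10_000.0, 100.0)] * 4000

# Unweighted mean: every row counts once, so the huge-value rows dominate.
unweighted_mean = sum(v for v, _ in rows) / len(rows)

# Weighted mean: each row counts in proportion to its weight.
weighted_mean = sum(v * w for v, w in rows) / sum(w for _, w in rows)

# The contract asserts the two fits land near different truths that are far apart.
ratio = unweighted_mean / weighted_mean


def within(draw_mean: float, truth: float, tol: float = 0.20) -> bool:
    """True when draw_mean is within tol (relative) of truth."""
    return abs(draw_mean - truth) <= tol * abs(truth)
```

With these numbers the unweighted mean is 208,000 and the weighted mean is roughly 12,500, so a fit that silently ignored the weights would miss the weighted truth by far more than the 20% tolerance.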
Adversarial review: converting to draft; soundness fixes needed before merge

An independent clean-room review (every finding reproduced) found bugs in the operator's core purpose. Marking draft. Ranked:

- HIGH: the weighted-bootstrap gate annihilates rare low-weight classes. The gate is fit on an n-of-n weighted resample, so a scarce-but-oversampled stratum (exactly what the charter's pool design carries: oversampled in rows, downweighted in mass) gets zero rows in the resample → single-class gate → that class drawn with probability 0, forever. Reproduced: 9/10 gates single-class, 0 positive draws in 2M rows vs ~80 expected. The fit re-rarefies what the pool deliberately oversampled, the opposite of "tail support is strata's job." Fix:
- HIGH: household-weights-only frames can't be fit weighted at all. The canonical CPS shape (person-level fit, household design weights) fails
- HIGH: NaN targets silently become zeros in gated regimes (NaN-blind sign labels → zero class). Survey item-nonresponse silently moves mass to $0. Fix: validate finite at fit, raise naming the column + count.
- MEDIUM: tail mass undershot ~2x, and the contract test passes for the wrong reason ("delete all low-weight rows" also passes: it asserts only means, never that the rare regime survives). Plus the 201-point grid winsorizes draws to [0.5%, 99.5%]. This is the capital-gains/dividends tail problem at the method level. Fix: assert high-draw-share survival, draw leaf values as atoms incl. endpoints.
- MEDIUM: fit and draw RNG share one stream (predict quantiles bit-identical to the gate's bootstrap uniforms; max|diff|=0). Fix:
- LOW: missing

Sound, must not regress: the weighted factorization P(sign|x)·P(y|x,sign); chained-equations semantics (drawn values fed forward);
Add Frame.resolve_weights(entity) -> Weights: resolves effective weights like _effective_weights but returns a typed Weights that carries the source entity's kind. An entity without its own stored weights inherits the single weighted group entity's design/importance/calibrated kind and broadcast values; an entity with its own weights is returned as-is. The existing ambiguity guards (zero/multiple weighted group entities) are kept. This fixes the "household-weighted frame can't be fit weighted" bug: a person-level fit can now read the inherited household kind instead of a bare ndarray that dropped the kind. accounting._resolve migrates to resolve_weights(owner).values (behavior identical). Regression tests (test_bundle.py TestResolveWeights): person resolve on a household-weighted frame returns Weights(kind=design, broadcast values); calibrated household resolves to calibrated person; an entity with its own weights returns that exact object; ambiguity (two weighted group entities) still raises; unknown entity is named. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
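The inheritance rule can be pictured with a small stand-in. This is only a sketch under stated assumptions: the `Weights` dataclass and `inherit_weights` helper below are hypothetical and do not reproduce the populace-frame classes or the ambiguity guards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical stand-in for a typed weight vector: a kind plus values."""
    kind: str
    values: tuple[float, ...]


def inherit_weights(group: Weights, membership: list[int]) -> Weights:
    """Broadcast group weights onto members, carrying the group's kind along.

    membership[i] is the index of the group (e.g. household) that member i
    (e.g. person) belongs to.
    """
    return Weights(
        kind=group.kind,
        values=tuple(group.values[g] for g in membership),
    )


household = Weights(kind="design", values=(120.0, 80.0))
person_household = [0, 0, 1]  # two persons in household 0, one in household 1
person = inherit_weights(household, person_household)
```

The point of returning a typed object rather than a bare array is visible here: `person.kind` is still `"design"`, so a downstream kind check has something to compare against.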
resolve_fit_weights now resolves via frame.resolve_weights(entity) rather than frame.weights_for(entity), so a person-level fit on a household-weighted frame inherits the household weights (and their kind) through membership instead of raising. This was the bug: the canonical CPS shape (person predictors/targets, design weights only on the household) could not be fit weighted at all. The kind discipline is unchanged — the requested kind must match the resolved (possibly inherited) kind, else raise. The impossible-remediation message is fixed: requesting "design" on a calibrated frame no longer advises "advance the frame's weights to design" (kinds only move forward, so that is impossible); it now tells the caller to pass weights="calibrated", the kind the frame actually carries. The forward direction (e.g. requesting calibrated on a design frame) keeps the advance-the-weights advice. Regression tests (test_model.py): the CPS shape fits weighted and the resolved vector broadcasts the household weights onto persons; a default design fit on a calibrated frame raises naming weights="calibrated" and not the impossible advance-to-design advice. The existing kind-mismatch test now matches "resolved weights" wording. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
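A rough sketch of the kind discipline and the direction-aware remediation message follows. The function name `resolve_kind`, the two-element `KIND_ORDER`, and the exact message wording are assumptions made for illustration; only the behaviour (raise on typo or mismatch, advise the reachable fix) comes from the commit message.

```python
from typing import Optional

# Assumed forward order of kinds for this sketch; kinds only move forward.
KIND_ORDER = ("design", "calibrated")


def resolve_kind(requested: str, resolved_kind: str) -> Optional[str]:
    """Return the kind to fit with, or None for the explicit unweighted path."""
    if requested == "none":
        return None  # the only unweighted path
    if requested not in KIND_ORDER:
        raise ValueError(
            f"unknown weights kind {requested!r}; expected one of "
            f"{KIND_ORDER} or 'none'"
        )
    if requested == resolved_kind:
        return requested
    if KIND_ORDER.index(requested) < KIND_ORDER.index(resolved_kind):
        # Moving backwards is impossible, so point at the kind the frame carries.
        advice = f'pass weights="{resolved_kind}"'
    else:
        advice = f"advance the frame's weights to {requested}"
    raise ValueError(
        f"requested {requested!r} but resolved weights are "
        f"{resolved_kind!r}; {advice}"
    )
```

Requesting `"design"` against a calibrated frame therefore names `weights="calibrated"` in the error, while requesting `"calibrated"` against a design frame keeps the advance-the-weights advice.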
The sign-class gate (HistGradientBoostingClassifier) was fit on an n-of-n weighted bootstrap, which deletes vanishingly-rare low-weight classes outright: a positive row at weight 1 among thousands of zeros at weight 50 is drawn with probability ~4e-5, so the resampled labels routinely contain only the zero class and the gate could never draw the positive sign (0 positive draws across millions, reproduced). HistGradientBoostingClassifier honors sample_weight exactly, so the gate is now fit with sample_weight=weights directly, no bootstrap. Every training row is present, so every sign class the data contains survives into classes_. The weighted bootstrap stays for the QRF forests, which genuinely need it: quantile-forest uses sample_weight only as a >0 leaf mask (confirmed _quantile_forest.py:266), so it ignores weight magnitude and the resample is the only way to weight the leaf distributions. A guard now enforces internal consistency: if a sign class present in the training labels is absent from the fitted gate's classes_, the fit raises rather than silently drawing that class at probability zero. Regression tests (test_qrf.py): the reviewer's repro (n=5000, ~10 positive at weight 1, ~4990 zero at weight 50) keeps both gate classes and produces positive draws across seeds (was 0/2M); the consistency guard raises when a stubbed gate drops a training class. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
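The numbers quoted for the reviewer's repro can be checked with a few lines of arithmetic. The sketch below assumes exactly 10 positive rows at weight 1 and 4990 zero rows at weight 50, and treats each bootstrap draw as independent with probability proportional to weight.

```python
n_pos, w_pos = 10, 1.0
n_zero, w_zero = 4990, 50.0
n = n_pos + n_zero

# Share of total weight held by the positive class: the chance that any
# single weighted-bootstrap draw picks a positive row.
p_pos = (n_pos * w_pos) / (n_pos * w_pos + n_zero * w_zero)

# Chance that an n-of-n resample contains no positive row at all,
# which leaves the gate with a single class.
p_gate_single_class = (1.0 - p_pos) ** n

# Positive draws expected in 2M rows if the gate honored the weighted share.
expected_pos_in_2m = 2_000_000 * p_pos
```

Under these assumptions `p_pos` is about 4e-5, roughly four resamples in five lose the positive class entirely, and about 80 positive draws are expected in 2M rows, in line with the review's "9/10 gates single-class" and "~80 expected".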
…ndpoints, chunking) Four changes to how the quantile forest is grown and read out, so draws reproduce the conditional's tail instead of undershooting it ~2x: 1. max_samples_leaf=None. The forests are now grown keeping ALL leaf samples; quantile-forest's default of 1 keeps one sample per leaf, thinning each row's conditional to ~n_estimators atoms and undershooting tail mass. Exposed as a RegimeGatedQRF param (default None). On the contract fixture the weighted share above 300k goes from ~0.0035 (nearest-snap, msl=1) to ~0.0050 — the weighted-population truth. 2. Linear interpolation. draw() no longer snaps each row to the nearest of 201 grid points (which quantizes every draw and biases the tail toward the bracket interior); it linearly interpolates the row's value at its exact quantile between the two bracketing grid quantiles. 3. Drawable extremes. The quantile grid now includes points adjacent to 0 and 1, so the observed conditional min and max are drawable. q=1 is the observed maximum, not extrapolation — the old comment wrongly excluded it. With a lone extreme the interior-only grid (top q=0.995) reads far below the max; the endpoint reaches it. 4. Chunked predict. draw() batches the predict over rows (_PREDICT_CHUNK_ROWS=50k) so the (n_rows x n_grid) matrix never materializes whole — at 3M+ rows it would be tens of GB. Chunking is bit-identical to a single pass (quantiles are drawn up front and sliced positionally). Regression tests (test_qrf.py): the weighted tail share above 300k is within ~2x of truth and materially closer than the nearest-snap baseline; a draw at q->1 reaches the observed conditional max via the grid endpoint, which the interior-only winsorized grid misses. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
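Changes 2 and 3 can be illustrated on a single row. This is a minimal sketch, not the shard's `draw()`: `interpolate_draw` is a hypothetical helper, the grids are tiny, and the grid values are invented to show a lone extreme. It assumes the extended grid carries the endpoints 0 and 1.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_draw(
    grid: Sequence[float], values: Sequence[float], q: float
) -> float:
    """Linearly interpolate a row's value at quantile q between grid points.

    Quantiles outside the grid are clamped to its end values, which is
    exactly how an interior-only grid winsorizes draws.
    """
    if q <= grid[0]:
        return values[0]
    if q >= grid[-1]:
        return values[-1]
    hi = bisect_right(grid, q)  # first grid point strictly above q
    lo = hi - 1
    t = (q - grid[lo]) / (grid[hi] - grid[lo])
    return values[lo] + t * (values[hi] - values[lo])


# One row's conditional quantiles; 1000.0 is a lone extreme (illustrative).
interior_grid = [0.005, 0.5, 0.995]
interior_values = [10.0, 20.0, 30.0]
full_grid = [0.0, 0.005, 0.5, 0.995, 1.0]
full_values = [10.0, 10.0, 20.0, 30.0, 1000.0]

clipped = interpolate_draw(interior_grid, interior_values, 1.0)
reached = interpolate_draw(full_grid, full_values, 1.0)
midway = interpolate_draw(full_grid, full_values, 0.2525)
```

On the interior-only grid a draw at q=1 stops at the top grid value, while the grid with endpoints reaches the observed maximum; draws between grid points land between the bracketing values instead of snapping to one of them.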
NaN targets were silently relabeled to the zero class: the sign labels (y > atol / y < -atol) are both False for NaN, so a missing value was miscoded as a structural zero, NaN-blind. The model has no notion of missingness, so fit now validates at entry that every target column is entirely finite and raises a ValueError naming the offending column and its non-finite count (NaN or inf). Predictors are not checked — a forest splits around NaN features and a missing predictor is not silently miscoded the way a missing target is. Regression tests (test_qrf.py): a target with 3 NaNs raises naming the column and the count; an inf target is refused the same way. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
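Both halves of this fix are easy to demonstrate: why NaN falls into the zero class, and what an entry check might look like. The sketch below is an assumption-laden illustration; `validate_finite_targets` is a hypothetical name, it takes a plain mapping of column name to values rather than a Frame, and the message wording is invented.

```python
import math
from typing import Mapping, Sequence


def validate_finite_targets(targets: Mapping[str, Sequence[float]]) -> None:
    """Raise ValueError naming the first target column with non-finite values."""
    for name, values in targets.items():
        n_bad = sum(1 for v in values if not math.isfinite(v))
        if n_bad:
            raise ValueError(
                f"target column {name!r} has {n_bad} non-finite value(s) "
                "(NaN or inf); targets must be entirely finite"
            )


# Why a NaN used to be miscoded as a structural zero: both sign
# comparisons are False for NaN, so neither the positive nor the
# negative label fires.
atol = 1e-9
nan = float("nan")
nan_is_positive = nan > atol
nan_is_negative = nan < -atol
```

Because both comparisons are `False`, a missing value is indistinguishable from a zero at the labelling step, which is why the check has to happen before labels are built.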
…tract Finding F: the fitted model was seeded with the raw model seed, making its draw uniforms bit-identical to the fit's bootstrap-selection uniforms (the draws were not independent of the fit's resampling). Seed fit and draw from two independent SeedSequence children of the model seed; determinism is preserved. Finding H: add a contract that the zero gate reproduces the *weighted* (population) zero-share, not the sample's, when the two differ. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
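The seeding change for Finding F can be sketched with NumPy's `SeedSequence.spawn`. The helper name `fit_and_draw_rngs` is hypothetical; the sketch shows only how two child streams derived from one model seed stay deterministic while no longer sharing uniforms.

```python
import numpy as np


def fit_and_draw_rngs(seed: int) -> tuple[np.random.Generator, np.random.Generator]:
    """Derive separate fit and draw generators from one model seed."""
    fit_seq, draw_seq = np.random.SeedSequence(seed).spawn(2)
    return np.random.default_rng(fit_seq), np.random.default_rng(draw_seq)


# The old behaviour: two generators built from the same raw seed
# produce identical uniforms (max |diff| is 0).
shared_a = np.random.default_rng(0).random(5)
shared_b = np.random.default_rng(0).random(5)
max_shared_diff = float(np.max(np.abs(shared_a - shared_b)))

# The new behaviour: child streams differ from each other...
fit_rng, draw_rng = fit_and_draw_rngs(0)
fit_u = fit_rng.random(5)
draw_u = draw_rng.random(5)

# ...but the same seed reproduces the same pair of streams.
fit_rng2, draw_rng2 = fit_and_draw_rngs(0)
```

Spawning children keeps a single user-facing seed while giving the fit's bootstrap and the predict-time quantiles their own streams.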
…olumns Finding G: pin populace-frame>=0.1,<0.2 (the constellation must resolve, not fail only at the import-time compat gate), and refuse duplicate predictors, duplicate targets, or a column that is both predictor and target (these silently fit twice / fit P(y|y) before). Messages name the culprits. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
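The column checks for Finding G might look roughly like the following. `validate_columns` and the message format are hypothetical; the sketch only shows the three refusals described above, each naming the offending columns.

```python
from collections import Counter
from typing import Sequence


def validate_columns(predictors: Sequence[str], targets: Sequence[str]) -> None:
    """Refuse duplicate predictors, duplicate targets, and predictor/target overlap."""
    dup_predictors = sorted(c for c, k in Counter(predictors).items() if k > 1)
    dup_targets = sorted(c for c, k in Counter(targets).items() if k > 1)
    overlap = sorted(set(predictors) & set(targets))

    problems = []
    if dup_predictors:
        problems.append(f"duplicate predictors: {dup_predictors}")
    if dup_targets:
        problems.append(f"duplicate targets: {dup_targets}")
    if overlap:
        # Fitting a column on itself would amount to P(y|y).
        problems.append(f"columns used as both predictor and target: {overlap}")
    if problems:
        raise ValueError("; ".join(problems))
```

Collecting every problem before raising lets a single error report all culprits at once rather than one per attempt.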
…ification review) Two independent verification reviews (mutation testing) found that the resolve_weights kind fix introduced two bugs in _inherited_kind, which only handled the weighted-group path while _effective_weights gives person-stored weights precedence when deriving a group entity: 1. Regression: weighted accounting (wsum/wmean/wquantile/gini/...) of a group-entity column on a person-only-weighted frame raised instead of deriving the group weights from the person weights — on a frame shape the fit suite's own fixtures build. 2. Silent kind mislabel: a third entity's resolved values (from the person source) were tagged with a sibling group's kind, which could leak through resolve_fit_weights as a kind-discipline violation. Make _inherited_kind recurse exactly as _effective_weights does so kind always names the source the values come from. Three regression tests: person-only group accounting, mixed-kind coherence, and a leaf-component pin for the tail-draw fix (max_samples_leaf=None was untested). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The conditional-models operator: `populace.fit`, the second shard of the stack (after `populace-frame`). Per `DESIGN.md` ("populace-fit: conditional models").

What's here

- `ConditionalModel` / `FittedModel` protocols (`model.py`): `fit(frame, predictors, targets, *, weights="design") -> FittedModel`; `FittedModel.predict(frame_or_df) -> DataFrame` (one draw column per target). Weight-aware by construction: `weights` selects which typed weight vector of the owning entity to use (default: that entity's design weights), reading the Frame's typed `Weights` rather than a raw array. `weights="none"` is the only way to fit unweighted, and a misspelled or mismatched kind raises (naming the culprit) instead of silently falling back to unweighted, closing the 2026-06 microimpute landmine at the type boundary. `resolve_fit_weights` is the single authority for this rule.
- `QRF` / `RegimeGatedQRF` (`qrf.py`): the canonical model. It is regime-gated (structural, unweighted sign-mixture gates), sequentially chained (each target conditions on the predictors plus the targets already drawn), and draws from a quantile regression forest (a seeded per-row quantile), with the frame's weights materialized by weighted bootstrap (importance-resample the training rows by weight before growing each forest). Reimplemented from scratch against the `Frame`; it does not import microimpute. This is the microimpute#196 fix as the reference mechanism.
- `__init__.py`: the charter's constellation-versioning mechanism. `populace-frame`'s series is checked at import (pre-1.0, the 0.x minor; major from 1.0 on), so a loose resolver that ignores `[tool.uv.sources]` cannot silently assemble an incompatible kernel pair.

The headline contract

`test_weighted_fit_shifts_draws_toward_weighted_truth` is the real realization of the placeholder the kernel left skipped in `packages/populace-frame/tests/test_contracts.py`. On a donor whose target is large exactly where weight is small (the #196 shape; the high-value regime is independent of the predictors, so honoring the weight is the only way to recover the weighted conditional):

- the weighted fit's draws land within 20% of the true weighted mean;
- `weights="none"` lands within 20% of the unweighted mean;
- the two differ by >3x.

This is the microimpute#196 bug class, now a standing guarantee of the stack rather than a latent footgun. Follow-up: the kernel's skipped `test_weighted_fit_shifts_draws_toward_weighted_truth` placeholder can be unskipped/removed once this shard is in the workspace (it could not be edited from this branch's scope).

Plus: regime gates preserve a zero-inflated target's zero mass and both signs (no zero-crossing); chaining reproduces a cross-target correlation; predict row-count/index match the input; fixed seed is deterministic; `weights="none"` is the only unweighted path. 35 tests, `n=5000` seeded for CI speed.

Note on the scikit-learn pin

`scikit-learn` is capped `>=1.5,<1.9`. scikit-learn 1.9 removed `sklearn.tree._tree.DTYPE`, which `quantile-forest` imports, so an unbounded `>=1.5` resolves to 1.9 and `import quantile_forest` fails. On the workspace's Python 3.14 interpreter the cap keeps the only working combination (`scikit-learn` 1.8 + `quantile-forest` 1.4) resolvable; the cap can be lifted once quantile-forest tracks the 1.9 tree ABI.

Optional heavy deps (`scikit-learn`, `quantile-forest`) stay in this shard, never in `populace-frame`.

Validation

`uv sync --all-packages && uv run pytest packages/populace-fit && uv run ruff check packages/populace-fit`: all green (35 passed; ruff clean). Full workspace suite: 192 passed, 3 skipped (the microunit/policyengine_us-gated kernel tests).

🤖 Generated with Claude Code