Fix PUF real-half QRF imputation by daphnehanse11 · Pull Request #1170 · PolicyEngine/policyengine-us-data

daphnehanse11 · 2026-06-10T22:31:42Z

Summary

keep the existing rich-preserving QRF draw for the zero-weight PUF clone half
add a separate PUF-weighted, income-conditioned QRF draw for positive-weight CPS records
use broader observed income/capital-gain/pension/social-security predictors for the real-half draw, while dropping any predictor that is itself being imputed
add unit coverage for separate real-vs-clone draws plus a raw-file charitable smoke test
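
To make the predictor-dropping and weighting ideas above concrete, here is a minimal stdlib-only sketch. It is not the code in puf_impute.py: the helper names `select_predictors` and `weighted_training_sample` are hypothetical, and the sampling behaviour (with replacement, proportional to weight) is an assumption about what a weighted training draw could look like. Only the `weight_col` and `max_train_samples` parameter names come from the PR discussion.

```python
import random


def select_predictors(candidates, imputed_vars):
    """Keep candidate predictors that are not themselves being imputed."""
    imputed = set(imputed_vars)
    return [name for name in candidates if name not in imputed]


def weighted_training_sample(records, weight_col, max_train_samples, seed=0):
    """Sample training rows with probability proportional to their weight.

    Assumption: only positive-weight rows are eligible, and sampling is
    done with replacement (random.choices), so a row may repeat.
    """
    eligible = [r for r in records if r[weight_col] > 0]
    if not eligible:
        return []
    weights = [r[weight_col] for r in eligible]
    k = min(max_train_samples, len(eligible))
    return random.Random(seed).choices(eligible, weights=weights, k=k)


# Hypothetical variable names, for illustration only.
predictors = select_predictors(
    ["age", "employment_income", "charitable_cash"],
    imputed_vars=["charitable_cash"],
)
print(predictors)  # ['age', 'employment_income']
```

Dropping a predictor that is also an imputation target avoids conditioning a draw on a value that does not yet exist on the receiving records.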

Validation

uv run ruff check policyengine_us_data/calibration/puf_impute.py tests/unit/calibration/test_calibration_puf_impute.py validation/puf_qrf_real_half_smoke.py
uv run pytest tests/unit/calibration/test_calibration_puf_impute.py
uv run pytest tests/unit/test_puf_impute.py tests/unit/test_extended_cps.py tests/unit/calibration/test_retirement_imputation.py
uv run python -m compileall -q policyengine_us_data/calibration/puf_impute.py tests/unit/calibration/test_calibration_puf_impute.py validation/puf_qrf_real_half_smoke.py
uv run python validation/puf_qrf_real_half_smoke.py

Smoke result: the raw PUF weighted combined charitable total is $236.4B; the old unweighted, demographic-only QRF gives $15,523.1B (65.7x the raw total); the fixed weighted, income-conditioned QRF gives $357.0B (1.5x).
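
The ratios quoted in the smoke result are each imputed total divided by the raw PUF benchmark. The sketch below reproduces that arithmetic from the three dollar figures reported above. The `weighted_total` helper is a hypothetical illustration of how a weighted aggregate is formed and is not taken from the smoke script.

```python
def weighted_total(values, weights):
    """Sum of value * weight across records."""
    return sum(v * w for v, w in zip(values, weights))


def ratio_to_benchmark(imputed_total, benchmark_total):
    """How many times larger the imputed total is than the benchmark."""
    return imputed_total / benchmark_total


# Totals in billions of dollars, as reported in the smoke result.
RAW_PUF = 236.4
OLD_QRF = 15_523.1
FIXED_QRF = 357.0

print(f"{ratio_to_benchmark(OLD_QRF, RAW_PUF):.1f}x")    # 65.7x
print(f"{ratio_to_benchmark(FIXED_QRF, RAW_PUF):.1f}x")  # 1.5x
```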

Notes

The smoke script is diagnostic, not release validation. It works from raw puf_2015.csv, demographics_2015.csv, and cps_2024.h5 because the local dataset-backed Microsimulation path currently fails on stale CPS schema/shape issues.
Full acceptance still needs a rebuilt enhanced CPS artifact and Daphne's 2026 dashboard/JCT comparison rerun.

Rebased onto current main and integrated with the Forbes/top-tail training exclusion that landed since the original branch point:

- X_train_real now flows through the same non_forbes_mask filtering as X_train_full/X_train_override, so synthetic Forbes and metadata-missing top-tail donors stay out of the weighted real-half training set; the positive-weight filter moves below the Forbes block to keep the person-level mask index-aligned.
- TestForbesTrainingExclusion fakes and assertions updated for the four sequential-QRF calls (clone full/override + real full/override).
- Adds the towncrier fragment.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

anth-volk · 2026-06-11T13:14:58Z

Rebased onto current main (f7458313) to resolve the merge conflicts, with one substantive integration fix: main's Forbes/top-tail training exclusion landed after this branch's base, and the new weighted real-half training frame (X_train_real) bypassed it. X_train_real now flows through the same non_forbes_mask filtering as the clone-half frames, with the positive-weight filter moved below the Forbes block so the person-level mask stays index-aligned.
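
The ordering point can be illustrated with a small sketch. It assumes the mask is a positional boolean list built against the unfiltered training frame. The row layout and the `build_real_training_frame` helper are hypothetical, not the repository's code.

```python
def apply_mask(rows, mask):
    """Apply a positional boolean mask; lengths must match."""
    if len(rows) != len(mask):
        raise ValueError("mask is not index-aligned with the frame")
    return [row for row, keep in zip(rows, mask) if keep]


def build_real_training_frame(rows, non_forbes_mask, weight_col):
    """Exclude Forbes donors first, then keep positive-weight rows."""
    kept = apply_mask(rows, non_forbes_mask)
    return [row for row in kept if row[weight_col] > 0]


# Hypothetical donor records for illustration.
rows = [
    {"id": 0, "weight": 10.0, "is_forbes": False},
    {"id": 1, "weight": 0.0, "is_forbes": False},
    {"id": 2, "weight": 5.0, "is_forbes": True},
    {"id": 3, "weight": 2.0, "is_forbes": False},
]
non_forbes_mask = [not row["is_forbes"] for row in rows]

train_real = build_real_training_frame(rows, non_forbes_mask, "weight")
print([row["id"] for row in train_real])  # [0, 3]

# Filtering on weight first shortens the frame, so the mask built
# against the original rows no longer lines up.
positive_first = [row for row in rows if row["weight"] > 0]
try:
    apply_mask(positive_first, non_forbes_mask)
except ValueError as err:
    print(err)  # mask is not index-aligned with the frame
```

With a plain positional mask the misalignment fails loudly here. In array libraries a same-length but shifted mask would silently select the wrong rows, which is the case the reordering guards against.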

Test integration:

TestForbesTrainingExclusion fakes now answer household_weight and accept the weight_col/max_train_samples kwargs; assertions cover all four sequential-QRF training frames (clone full/override + real full/override), so they now also prove Forbes donors stay out of the weighted real-half draw.
The new test_run_qrf_imputation_splits_weighted_real_half_from_clone_half fake gained a load_dataset() stub for main's Forbes metadata detection.
test_retirement_imputation.py's 2-tuple _run_qrf_imputation mocks are covered by the existing backwards-compat shim.
Added the towncrier fragment (changelog.d/1170.fixed.md).
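
The shape of the four-call assertion described above might look like the following sketch. Everything in it is hypothetical: `RecordingQRF`, `run_four_draws`, and the record layout are stand-ins, not the fakes in TestForbesTrainingExclusion. Only the `household_weight`, `weight_col`, and `max_train_samples` names come from this comment.

```python
class RecordingQRF:
    """Hypothetical fake that records every training frame it is handed."""

    def __init__(self):
        self.calls = []

    def fit_predict(self, X_train, label, weight_col=None, max_train_samples=None):
        self.calls.append(
            {
                "label": label,
                "donor_ids": [row["id"] for row in X_train],
                "weight_col": weight_col,
                "max_train_samples": max_train_samples,
            }
        )
        return [0.0] * len(X_train)  # dummy predictions


def run_four_draws(qrf, donors):
    """Assumed flow: two clone-half draws, then two weighted real-half draws."""
    non_forbes = [d for d in donors if not d["is_forbes"]]
    real = [d for d in non_forbes if d["household_weight"] > 0]
    for label in ("clone full", "clone override"):
        qrf.fit_predict(non_forbes, label)
    for label in ("real full", "real override"):
        qrf.fit_predict(
            real, label, weight_col="household_weight", max_train_samples=len(real)
        )


donors = [
    {"id": 1, "household_weight": 4.0, "is_forbes": False},
    {"id": 2, "household_weight": 9.0, "is_forbes": True},
    {"id": 3, "household_weight": 0.0, "is_forbes": False},
]
fake = RecordingQRF()
run_four_draws(fake, donors)

assert len(fake.calls) == 4
assert all(2 not in call["donor_ids"] for call in fake.calls)
```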

Validation: ruff format/ruff check clean on the three touched files; tests/unit/calibration/test_calibration_puf_impute.py 33 passed; tests/unit/test_puf_impute.py + tests/unit/test_extended_cps.py + tests/unit/calibration/test_retirement_imputation.py 124 passed; scripts/run_quality_guards.py all passed.

🤖 Generated with Claude Code

daphnehanse11 requested review from MaxGhenis and anth-volk June 11, 2026 00:40

anth-volk force-pushed the codex/rcc-puf-real-half-imputation branch from fa0a141 to 82cb3a4 Compare June 11, 2026 13:14

Refresh policyengine-us to 1.726.0

ef50bc9

Satisfies the PolicyEngine US freshness gate, which fails repo-wide while the lock pins 1.715.3.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis mentioned this pull request Jun 14, 2026

Port US PUF real-half and clone support diagnostics PolicyEngine/populace#47

Open

daphnehanse11 wants to merge 2 commits into main from codex/rcc-puf-real-half-imputation
