Summary
The certified US default (populace_us_2024.h5, sha f32c2e5e…, bundle 4.16.2) persists 18 formula-owned variables that enhanced CPS does not store — variables the engine computes via formula. Because PolicyEngine treats every stored column as a simulation input that overrides the formula, these frozen values mask the engine's own computation, including under reform. The formula_owned_export gate that exists precisely to prevent this was never wired into the build.
This is distinct from the parity-drift in PolicyEngine/populace-benchmarks#1/#20 (missing layers). This is the opposite failure: over-export that masks formulas.
Evidence
- formula_owned_export_gate is defined in populace.build.gates, but a grep of the build pipeline from git history (build_dataset.py, build_populace_us_dataset.py, enrich_artifact.py, check_parity.py at commit 33ed83d) finds zero invocations. Only exported_nonzero and parity ran.
- Cross-checking the certified artifact's stored columns against policyengine-us variable definitions: 22 stored columns are formula-owned; 18 of those are stored by populace but NOT by eCPS (eCPS lets the engine compute them). The other 4 (person_id, self_employed_pension_contribution_ald, spm_unit_capped_work_childcare_expenses, traditional_ira_contributions) are also stored in eCPS, so they're parity-consistent and out of scope here.
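The cross-check reduces to two set operations. A minimal sketch, assuming the three column sets have already been extracted; the helper name partition_formula_owned and the toy column sets are hypothetical and are not part of populace or policyengine-us:

```python
def partition_formula_owned(populace_cols, ecps_cols, formula_owned):
    """Split populace's stored formula-owned columns by whether eCPS also stores them."""
    stored_formula_owned = set(populace_cols) & set(formula_owned)
    shared_with_ecps = stored_formula_owned & set(ecps_cols)
    populace_only = stored_formula_owned - set(ecps_cols)
    return sorted(populace_only), sorted(shared_with_ecps)


# Toy inputs for illustration only. The real sets would come from the
# certified artifact, the eCPS file, and the engine's variable definitions.
populace_cols = {"age", "ssi", "is_adult", "person_id"}
ecps_cols = {"age", "person_id"}
formula_owned = {"ssi", "is_adult", "person_id", "income_tax"}

populace_only, shared = partition_formula_owned(populace_cols, ecps_cols, formula_owned)
print(populace_only)  # ['is_adult', 'ssi']  -> in scope (the "18" bucket)
print(shared)         # ['person_id']        -> parity-consistent (the "4" bucket)
```

Run against the real column sets, the first list should reproduce the 18 variables below and the second the 4 that are out of scope.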
The 18 (formula-owned, stored by populace, not by eCPS)
Clear computed-output masking — engine benefit/tax outputs frozen:
ssi (the gate docstring's own example), social_security, taxable_unemployment_compensation, taxable_pension_income, tax_exempt_pension_income
Income stored under the computed name — should persist the underlying input instead (e.g. employment_income_before_lsr), or the frozen value defeats labor-supply-response in dynamic scoring:
employment_income, self_employment_income, dividend_income, long_term_capital_gains, employment_income_last_year, weeks_worked
Derived demographic/geographic flags — engine derives from age/geo; redundant + masking, lower harm:
is_adult, is_child, is_senior, household_size, in_nyc, has_itin, has_tin
Impact
For each masked variable, the certified data uses the frozen value rather than recomputing. Concretely: a UI-taxability reform won't move taxable_unemployment_compensation; an SSI reform won't move ssi; the income family likely freezes the post-LSR value so dynamic scoring can't adjust it. Baseline values may also diverge from an eCPS-based run wherever the frozen value differs from what the formula would produce. The exported_nonzero and parity gates cannot catch this — only formula_owned_export can, and it never ran.
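The masking mechanism can be shown with a toy resolver. This is a sketch of the "stored column overrides formula" behaviour described above, not PolicyEngine's API; simulate, the taxable_share parameter, and the unemployment_compensation input name are hypothetical:

```python
def simulate(stored, formulas, params):
    """Toy resolver: a stored column always wins over its formula."""
    cache = dict(stored)

    def get(name):
        if name not in cache:
            cache[name] = formulas[name](get, params)
        return cache[name]

    return get


# Hypothetical formula: the taxable share of unemployment compensation.
formulas = {
    "taxable_unemployment_compensation":
        lambda get, p: get("unemployment_compensation") * p["taxable_share"],
}
baseline = {"taxable_share": 1.0}
reform = {"taxable_share": 0.0}  # stand-in for a UI-taxability reform

inputs_only = {"unemployment_compensation": 5000.0}
over_exported = {**inputs_only, "taxable_unemployment_compensation": 5000.0}

results = {}
for label, stored in [("inputs only", inputs_only), ("over-exported", over_exported)]:
    base = simulate(stored, formulas, baseline)("taxable_unemployment_compensation")
    ref = simulate(stored, formulas, reform)("taxable_unemployment_compensation")
    results[label] = (base, ref)
    print(label, base, ref)
# inputs only 5000.0 0.0
# over-exported 5000.0 5000.0
```

With only the input stored, the reform moves the output. Once the computed value is also stored, the formula is never consulted and the reform has no effect.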
Fix
- Wire formula_owned_export_gate into the build as a release-blocking gate (it exists; it was never invoked). Feed it the canonical formula-owned set (the principled definition that reproduces this list: variables with an engine formula that eCPS does not persist) and an explicit, documented structural_columns allow-list for IDs/keys.
- For the income family, persist the underlying input (*_before_lsr / source) rather than the computed name.
Related: PolicyEngine/populace-benchmarks#1 (pin parity reference + restore the deleted gate runner), PolicyEngine/populace-benchmarks#2 (downstream blocking assessment).
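For reference, the check a release-blocking formula-owned export gate needs to enforce can be sketched as follows. This is an assumption-laden illustration: the real formula_owned_export_gate in populace.build.gates may have a different signature, and check_formula_owned_export and GateFailure are hypothetical stand-ins.

```python
class GateFailure(Exception):
    """Raised when a release-blocking gate fails (hypothetical name)."""


def check_formula_owned_export(exported_columns, formula_owned, structural_columns):
    """Fail if any formula-owned variable is exported outside the allow-list."""
    offenders = sorted(
        (set(exported_columns) & set(formula_owned)) - set(structural_columns)
    )
    if offenders:
        raise GateFailure(
            f"{len(offenders)} formula-owned variable(s) exported: {', '.join(offenders)}"
        )
    return True


# Illustrative inputs only.
formula_owned = {"ssi", "employment_income", "is_adult", "person_id"}
structural_columns = {"person_id"}  # IDs/keys that must be stored

# Passes: only inputs and allow-listed keys are exported.
check_formula_owned_export(
    {"age", "person_id", "employment_income_before_lsr"},
    formula_owned,
    structural_columns,
)

# Fails: a computed output is exported.
try:
    check_formula_owned_export({"age", "person_id", "ssi"}, formula_owned, structural_columns)
except GateFailure as err:
    print(err)  # 1 formula-owned variable(s) exported: ssi
```

Raising instead of returning a boolean keeps the gate release-blocking by default: a build script that calls it cannot continue past a failure unless it catches the exception explicitly.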