Summary
The published US populace artifact exports person.ssi, even though ssi is formula-owned in PolicyEngine-US. When PolicyEngine loads the HDF5, USSingleYearDataset flattens every table column into inputs and Microsimulation sets any column that matches a PolicyEngine variable. That turns ssi from a computed output into a pinned input, so SSI reforms can fail to change final SSI.
Why this matters
This surfaced while rerunning an SSI asset-limit analysis against populace_us_2024.h5 from policyengine/populace-us. The reform appeared to have zero cost because the published ssi column overrode the formula path.
A direct check confirmed the issue:
- with the published artifact as-is, the SSI reform cost was 0.0
- after clearing the loaded ssi input arrays in the simulation, the same reform cost was about $8.7B
So this is not just an unused extra column; it changes reform behavior.
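The effect can be illustrated with a toy model. This is a minimal sketch, not the PolicyEngine API: ToySimulation, reform_cost, the asset limits, and the benefit amount are all hypothetical, and the only behavior borrowed from the report is that a stored input takes precedence over the formula.

```python
import numpy as np


class ToySimulation:
    """Toy stand-in for a rules engine (hypothetical, not PolicyEngine)."""

    def __init__(self, asset_limit):
        self.asset_limit = asset_limit
        self.inputs = {}

    def set_input(self, name, values):
        self.inputs[name] = np.asarray(values, dtype=float)

    def calculate(self, name):
        # A stored input always wins over the formula, which is the
        # pinning behavior described in this issue.
        if name in self.inputs:
            return self.inputs[name]
        if name == "ssi":
            eligible = self.inputs["assets"] <= self.asset_limit
            return np.where(eligible, 1000.0, 0.0)
        raise KeyError(name)


def reform_cost(pin_ssi):
    """Total ssi under a looser asset limit minus total ssi under baseline."""
    totals = []
    for limit in (2_000, 10_000):  # baseline limit, reform limit (toy values)
        sim = ToySimulation(asset_limit=limit)
        sim.set_input("assets", [1_500, 5_000, 20_000])
        if pin_ssi:
            # Mimics an exported ssi column being loaded as an input.
            sim.set_input("ssi", [1000.0, 0.0, 0.0])
        totals.append(float(sim.calculate("ssi").sum()))
    return totals[1] - totals[0]


pinned_cost = reform_cost(pin_ssi=True)
unpinned_cost = reform_cost(pin_ssi=False)
print(pinned_cost, unpinned_cost)
```

With ssi pinned, the toy reform cost is zero because the formula never runs; without it, the looser limit makes a second person eligible and the cost becomes positive.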
Expected behavior
Populace exports should not persist non-structural PolicyEngine variables that are owned by formulas. Those variables should be omitted so the rules engine calculates them from source inputs under baseline or reform policy.
A static formula_owned_excluded list is not enough: the export path should dynamically check the installed PolicyEngine variable registry and drop, or fail on, any formula-owned variable that is present, while preserving structural ID and membership columns.
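A dynamic check along these lines could look roughly like the sketch below. It assumes the registry is a name-to-variable mapping and that a variable counts as formula-owned when its formulas attribute is non-empty; both are assumptions for illustration, and the real criterion may need to cover other ways a variable can be computed. formula_owned_columns, STRUCTURAL_COLUMNS, and the toy registry are hypothetical names.

```python
from types import SimpleNamespace

# Hypothetical structural columns that must always survive export.
STRUCTURAL_COLUMNS = frozenset({"person_id", "household_id", "person_household_id"})


def formula_owned_columns(columns, registry, structural=STRUCTURAL_COLUMNS):
    """Return the exported columns that the engine computes by formula.

    `registry` maps variable names to variable objects. Assumption: a
    variable is formula-owned when its `formulas` attribute is non-empty.
    """
    owned = []
    for name in columns:
        if name in structural:
            continue  # IDs and memberships are exempt
        variable = registry.get(name)
        if variable is not None and getattr(variable, "formulas", None):
            owned.append(name)
    return sorted(owned)


# Toy registry standing in for the installed engine's variables.
toy_registry = {
    "age": SimpleNamespace(formulas={}),
    "ssi": SimpleNamespace(formulas={"2024": object()}),
    "household_size": SimpleNamespace(formulas={"2024": object()}),
    "person_id": SimpleNamespace(formulas={}),
}

flagged = formula_owned_columns(
    ["person_id", "age", "ssi", "household_size", "custom_weight"],
    toy_registry,
)
print(flagged)
```

Because the check reads whatever registry is installed at export time, a variable that gains a formula in a later engine release would be flagged without anyone updating a hand-maintained list.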
Observed affected US columns
In the published populace_us_2024.h5, the following non-structural exported columns are formula-owned under the current PE-US registry:
employment_income_last_year
has_itin
has_tin
household_size
in_nyc
is_adult
is_child
is_senior
spm_unit_capped_work_childcare_expenses
ssi
taxable_unemployment_compensation
traditional_ira_contributions
weeks_worked
The critical regression identified so far is ssi.
eCPS comparison
I checked local eCPS-style files as references. ssi was not present in either checked eCPS input file, so this specific bug is not caused by eCPS parity.
However, the published populace US artifact also contains many extra known PE variables relative to those eCPS files, which suggests the current export path does not enforce a closed eCPS input surface.
Related UK finding
A local UK populace candidate artifact has the same class of problem: non-structural formula-owned PE-UK variables were present in the saved H5. Examples include current_education, state_pension_reported, student_loan_repayments, and rail_subsidy_spending.
Fix direction
Add export-time guards in populace so that:
- formula-owned engine variables are dynamically identified from the live PolicyEngine registry and omitted/rejected before save;
- structural columns such as entity IDs and memberships are exempted;
- contracts can optionally enforce a closed allowed column surface for eCPS-style exports;
- regression tests cover ssi specifically.
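The guards listed above might be sketched as follows. This is one possible shape under stated assumptions, not populace's actual export code: guard_export_columns, ExportGuardError, and the mode and allowed parameters are hypothetical, and the set of formula-owned names is assumed to have been derived from the live registry beforehand.

```python
class ExportGuardError(ValueError):
    """Raised when an export contains columns the guard rejects (hypothetical)."""


def guard_export_columns(columns, formula_owned, structural=(), allowed=None, mode="drop"):
    """Filter or reject columns before saving.

    formula_owned: names the engine computes by formula.
    structural:    entity ID and membership columns, always kept.
    allowed:       optional closed column surface for eCPS-style exports.
    mode:          "drop" removes offending columns, "fail" raises instead.
    """
    structural = set(structural)
    offending = [c for c in columns if c in formula_owned and c not in structural]
    unexpected = []
    if allowed is not None:
        unexpected = [c for c in columns if c not in allowed and c not in structural]
    blocked = set(offending) | set(unexpected)
    if mode == "fail" and blocked:
        raise ExportGuardError(f"rejected columns: {sorted(blocked)}")
    return [c for c in columns if c not in blocked]


def test_ssi_is_dropped_before_save():
    kept = guard_export_columns(
        ["person_id", "age", "ssi"],
        formula_owned={"ssi"},
        structural={"person_id"},
    )
    assert kept == ["person_id", "age"]


def test_fail_mode_rejects_ssi():
    try:
        guard_export_columns(["person_id", "ssi"], formula_owned={"ssi"}, mode="fail")
    except ExportGuardError as error:
        assert "ssi" in str(error)
    else:
        raise AssertionError("expected ExportGuardError")
```

Drop mode suits routine builds, while fail mode suits release pipelines where a formula-owned column reaching the save step should be treated as a bug.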