
Certified populace-us persists 18 formula-owned variables eCPS doesn't (formula_owned_export gate never wired into the build) #24

@MaxGhenis

Summary

The certified US default (populace_us_2024.h5, sha f32c2e5e…, bundle 4.16.2) persists 18 formula-owned variables that enhanced CPS does not store — variables the engine computes via formula. Because PolicyEngine treats every stored column as a simulation input that overrides the formula, these frozen values mask the engine's own computation, including under reform. The formula_owned_export gate that exists precisely to prevent this was never wired into the build.

This is distinct from the parity drift tracked in PolicyEngine/populace-benchmarks#1/#20 (missing layers). It is the opposite failure: over-export that masks formulas.

Evidence

  • formula_owned_export_gate is defined in populace.build.gates but a grep of the build pipeline from git history (build_dataset.py, build_populace_us_dataset.py, enrich_artifact.py, check_parity.py at commit 33ed83d) finds zero invocations. Only exported_nonzero and parity ran.
  • Cross-checking the certified artifact's stored columns against policyengine-us variable definitions: 22 stored columns are formula-owned; 18 of those are stored by populace but NOT by eCPS (eCPS lets the engine compute them). The other 4 (person_id, self_employed_pension_contribution_ald, spm_unit_capped_work_childcare_expenses, traditional_ira_contributions) are also stored in eCPS, so they're parity-consistent and out of scope here.
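The cross-check above reduces to set arithmetic over three lists of variable names. A minimal sketch, assuming those lists have already been extracted from the populace artifact, the eCPS file, and the policyengine-us variable definitions; the function name `triage_formula_owned` and the tiny sample sets are illustrative assumptions, not the actual artifact contents:

```python
def triage_formula_owned(populace_columns, ecps_columns, formula_owned):
    """Split formula-owned stored columns by whether eCPS also stores them.

    Hypothetical helper for illustration; not part of populace or policyengine-us.
    """
    # Columns populace stores even though the engine has a formula for them.
    stored_formula_owned = set(populace_columns) & set(formula_owned)
    # Stored by populace only: these mask the formula and are in scope.
    masked_only_in_populace = stored_formula_owned - set(ecps_columns)
    # Stored by both: parity-consistent, out of scope for this issue.
    parity_consistent = stored_formula_owned & set(ecps_columns)
    return sorted(masked_only_in_populace), sorted(parity_consistent)


# Illustrative inputs only: a small subset of names, not the real column lists.
populace_columns = {"age", "person_id", "ssi", "is_adult", "traditional_ira_contributions"}
ecps_columns = {"age", "person_id", "traditional_ira_contributions"}
formula_owned = {"person_id", "ssi", "is_adult", "traditional_ira_contributions", "income_tax"}

masked, consistent = triage_formula_owned(populace_columns, ecps_columns, formula_owned)
print(masked)      # ['is_adult', 'ssi']
print(consistent)  # ['person_id', 'traditional_ira_contributions']
```

Run against the real column lists, the first result should reproduce the 18 and the second the 4 named above.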

The 18 (formula-owned, stored by populace, not by eCPS)

Clear computed-output masking — engine benefit/tax outputs frozen:
ssi (the gate docstring's own example), social_security, taxable_unemployment_compensation, taxable_pension_income, tax_exempt_pension_income

Income stored under the computed name — should persist the underlying input instead (e.g. employment_income_before_lsr), or the frozen value defeats labor-supply-response in dynamic scoring:
employment_income, self_employment_income, dividend_income, long_term_capital_gains, employment_income_last_year, weeks_worked

Derived demographic/geographic flags — engine derives from age/geo; redundant + masking, lower harm:
is_adult, is_child, is_senior, household_size, in_nyc, has_itin, has_tin

Impact

For each masked variable, a simulation on the certified data uses the frozen value rather than recomputing it. Concretely: an unemployment-insurance (UI) taxability reform won't move taxable_unemployment_compensation; an SSI reform won't move ssi; the income family likely freezes the post-LSR value, so dynamic scoring can't adjust it. Baseline values may also diverge from an eCPS-based run wherever the frozen value differs from what the formula would produce. The exported_nonzero and parity gates cannot catch this; only formula_owned_export can, and it never ran.
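The masking mechanism can be shown with a toy model. This is a sketch under stated assumptions, not PolicyEngine's API: `ToyEngine`, the `taxable_share` parameter, and the dollar amounts are all hypothetical, chosen only to mimic the "stored column overrides formula" behavior described above.

```python
class ToyEngine:
    """Toy stand-in for an engine where stored columns override formulas."""

    def __init__(self, stored, taxable_share):
        self.stored = dict(stored)          # columns persisted in the dataset
        self.taxable_share = taxable_share  # hypothetical reform parameter

    def calculate(self, name):
        # A stored column is treated as an input and returned as-is.
        if name in self.stored:
            return self.stored[name]
        # Otherwise fall back to the formula, which reads the parameter.
        if name == "taxable_unemployment_compensation":
            return self.calculate("unemployment_compensation") * self.taxable_share
        raise KeyError(name)


VAR = "taxable_unemployment_compensation"
inputs_only = {"unemployment_compensation": 1000.0}
with_frozen = {**inputs_only, VAR: 1000.0}  # computed output persisted at baseline

baseline = ToyEngine(inputs_only, taxable_share=1.0).calculate(VAR)
reform_clean = ToyEngine(inputs_only, taxable_share=0.5).calculate(VAR)
reform_masked = ToyEngine(with_frozen, taxable_share=0.5).calculate(VAR)

print(baseline, reform_clean, reform_masked)  # 1000.0 500.0 1000.0
```

With only the input stored, the reform halves the taxable amount; with the computed output also stored, the reform has no effect.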

Fix

  • Wire formula_owned_export_gate into the build as a release-blocking gate (it exists; it was never invoked). Feed it the canonical formula-owned set (the principled definition that reproduces this list: variables with an engine formula that eCPS does not persist) and an explicit, documented structural_columns allow-list for IDs/keys.
  • Drop the masked columns from the export and rebuild — or, for the income family, persist the correct underlying input (*_before_lsr / source) rather than the computed name.
  • Confirm the triage above against the canonical list once the gate is wired; promote/demote the "derived flags" group per the gate's structural allow-list.
  • Relevant to PolicyEngine/populace-benchmarks#2 ("Is the populace-us ↔ current-eCPS parity drift blocking downstream (CRFB taxation-of-benefits, PolicyBench)?"): masking formula-owned variables breaks reform-responsiveness for those variables, which the blocking assessment should weigh alongside the missing-layer gaps.
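A release-blocking check along the lines of the first bullet could look roughly like the following. This is a sketch only: the real `formula_owned_export_gate` signature in populace.build.gates is not shown in this issue, so `check_formula_owned_export` and `run_release_gates` are hypothetical stand-ins, and the sample columns are illustrative.

```python
def check_formula_owned_export(exported_columns, formula_owned, structural_columns=frozenset()):
    """Return exported columns that are formula-owned and not allow-listed.

    Hypothetical stand-in; the actual gate's signature may differ.
    """
    offending = set(exported_columns) & set(formula_owned)
    return sorted(offending - set(structural_columns))


def run_release_gates(exported_columns, formula_owned, structural_columns):
    """Fail the build if any formula-owned column would be exported."""
    violations = check_formula_owned_export(
        exported_columns, formula_owned, structural_columns
    )
    if violations:
        raise RuntimeError(
            "formula-owned columns exported: " + ", ".join(violations)
        )


# Illustrative inputs: person_id is allow-listed as a structural ID/key.
exported = ["person_id", "age", "ssi", "employment_income"]
formula_owned = {"person_id", "ssi", "employment_income"}
structural_columns = {"person_id"}

violations = check_formula_owned_export(exported, formula_owned, structural_columns)
print(violations)  # ['employment_income', 'ssi']
```

Raising on any violation, rather than logging a warning, is what makes the gate release-blocking; the allow-list keeps IDs and keys from tripping it.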

Related: PolicyEngine/populace-benchmarks#1 (pin parity reference + restore the deleted gate runner), PolicyEngine/populace-benchmarks#2 (downstream blocking assessment).
