Summary
The certified populace_us_2024.h5 exports the PolicyEngine-US input variable person.race as raw CPS numeric codes (0, 1, 10, 11, 12, 13, …) for all 160,858 person rows, rather than the Race enum member names PolicyEngine expects (WHITE, BLACK, HISPANIC, OTHER). Every value is out of the enum's domain.
This is the same gate-escape class as the EPIC (#28): exported_nonzero passes (the codes are non-zero) and parity doesn't look at value domains, so nothing in the suite catches that an exported enum variable holds values PolicyEngine-US cannot interpret.
Reproduction
Loading the sha-verified certified artifact and checking every exported column that maps to a PE variable with possible_values:
[person] race: invalid ['0','1','10','11','12','13', …] on 160858/160858 rows;
valid enum names e.g. ['BLACK','HISPANIC','OTHER','WHITE']
race is the only enum-typed input affected (state_code, filing status, etc. are in-domain). The build maps a raw cps_race numeric (packages/populace-build/src/populace/build/us/sources.py) but never converts it to the Race enum on export.
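The check described above can be sketched roughly as follows. This is an illustration, not the script used for this report: it assumes the exported columns are already loaded as NumPy arrays, and the function name find_out_of_domain and the valid_names mapping are hypothetical. In practice the valid names would be taken from each PolicyEngine-US variable's possible_values.

```python
import numpy as np

def find_out_of_domain(columns, valid_names):
    """Return {column: (sorted invalid values, bad row count, total rows)}.

    columns: dict of column name -> array-like of stored values.
    valid_names: dict of column name -> set of allowed enum member names.
    Columns without an entry in valid_names are skipped (not enum-typed).
    """
    report = {}
    for name, values in columns.items():
        allowed = valid_names.get(name)
        if allowed is None:
            continue
        # Compare as text so numeric codes and byte strings are handled alike.
        as_text = np.array(
            [v.decode() if isinstance(v, bytes) else str(v) for v in values],
            dtype=str,
        )
        bad = ~np.isin(as_text, sorted(allowed))
        if bad.any():
            invalid = sorted(set(as_text[bad].tolist()))
            report[name] = (invalid, int(bad.sum()), len(as_text))
    return report

# Toy data standing in for the artifact (not the real 160,858 rows).
columns = {"race": np.array([0, 1, 10, 11]), "age": np.array([30, 41, 5, 67])}
valid_names = {"race": {"WHITE", "BLACK", "HISPANIC", "OTHER"}}
report = find_out_of_domain(columns, valid_names)
```

On the toy data only race is reported, with all four rows flagged; age is skipped because it has no enum domain.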
Impact
Numerical impact is low for race specifically — it is not a federal tax/benefit driver, so the certified aggregates are unaffected.
Interop impact is real: a downstream consumer loading the artifact into Microsimulation gets out-of-domain enum values and must coerce them. The CRFB taxation-of-benefits build had to add a sanitize_enum_inputs step that rewrote all 160,858 race values to the enum default before it could simulate. Any consumer hits this.
The gate gap is the bigger issue: nothing validates exported enum domains, so a future enum input that does matter (anything tax/benefit-relevant) could ship as raw codes and pass certification the same way.
Fix
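Map cps_race to the PolicyEngine-US Race enum at the export step so race ships valid member names.
Add an enum_domain gate to populace.build.gates (and wire it into the build per EPIC #28, "certified populace-us default correctness (gates not fully wired / reference rotted)"): for every exported column whose PE variable has possible_values, assert all stored values are valid enum members. This catches the race defect and any future enum-as-codes regression.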
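The mapping step could look roughly like the sketch below. It rests on stated assumptions: the helper codes_to_race_names is hypothetical, the local Race enum only mirrors the four member names listed in this issue, and the code table and default in the usage lines are placeholders, since this issue does not state how cps_race codes correspond to Race members. The real correspondence has to come from the build.

```python
from enum import Enum

import numpy as np

class Race(Enum):
    # Stand-in mirroring the member names listed in this issue;
    # the real enum lives in PolicyEngine-US.
    WHITE = "WHITE"
    BLACK = "BLACK"
    HISPANIC = "HISPANIC"
    OTHER = "OTHER"

def codes_to_race_names(codes, table, default):
    """Map raw numeric codes to Race member names.

    table: dict of int code -> Race member (supplied by the caller).
    default: Race member used for any code missing from the table.
    """
    return np.array(
        [table.get(int(code), default).name for code in codes],
        dtype=str,
    )

# Placeholder table for illustration only; NOT the real CPS code mapping.
placeholder_table = {1: Race.WHITE, 2: Race.BLACK}
names = codes_to_race_names([1, 2, 10, 0], placeholder_table, default=Race.OTHER)
```

Passing the table and default in explicitly keeps the conversion testable and avoids hard-coding a correspondence that this issue does not specify.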
Relationship to existing issues
Distinct from #27 (drop raw/scratch columns: race is a legitimate PE input to keep, just with correct values) and from #24/#25 (formula-owned over-export: race is a pure input). Belongs under the EPIC #28 "gates not fully wired" theme; the enum_domain gate should run in the re-certification gate suite.