Problem
policyengine_us/tools/geography/county_helpers.py::load_county_fips_dataset() treats the county FIPS table as a live runtime download. When data/county_fips_2020.csv.gz is absent, it downloads county_fips_2020.csv.gz from the policyengine/policyengine-us-data Hugging Face repo.
That makes baseline household tests and county calculations depend on live network access. A fresh CI runner, fresh install, or offline environment can fail on unrelated policy tests if Hugging Face is unavailable, slow, rate-limited, or otherwise unreachable.
PR #8307 surfaced this in Full Suite - Baseline (irs-household) after merging main. The failing policy test was:
policyengine_us/tests/policy/baseline/household/demographic/geographic/county/county.yaml
Failing job from PR #8307: https://github.com/PolicyEngine/policyengine-us/actions/runs/27633741706/job/81715292464
Local reproduction
Running the baseline household batch locally reproduced the failure when the dataset download was unavailable:
PYTHONPATH=. python policyengine_us/tests/test_batched.py policyengine_us/tests/policy/baseline/household --batches 2
The first batch failed in the county FIPS YAML tests because the dataset could not be downloaded.
Preferred fix
Make county FIPS reference data a packaged, versioned resource rather than a default live download.
Suggested implementation:
- Store the county FIPS table inside the package, for example under policyengine_us/tools/geography/data/county_fips_2020.csv.
- Load it with importlib.resources, not Path("data"), so it works from wheels and installed packages.
- Keep the Hugging Face download only as a dev/update fallback or dataset-refresh path, not the ordinary runtime path.
- Update ordinary CI tests to validate the packaged resource and county mapping behavior without network access.
- Move any live Hugging Face download coverage behind an explicit integration/network test flag.
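The steps above could be sketched roughly as follows. This is a minimal illustration, not the existing county_helpers.py code. The package path, the POLICYENGINE_US_REFRESH_COUNTY_FIPS environment variable, the load_county_fips name, and the injected download callable are all hypothetical. The sketch returns plain dict rows using only the standard library, whereas the real helper may return a different type.

```python
import csv
import io
import os
from importlib import resources
from typing import Callable, Optional

# Hypothetical locations and names, chosen only for illustration.
PACKAGE = "policyengine_us.tools.geography.data"
RESOURCE = "county_fips_2020.csv"
REFRESH_ENV_VAR = "POLICYENGINE_US_REFRESH_COUNTY_FIPS"


def load_county_fips(
    package: str = PACKAGE,
    resource: str = RESOURCE,
    download: Optional[Callable[[], str]] = None,
) -> list[dict[str, str]]:
    """Return county FIPS rows, reading the packaged CSV by default.

    The network path runs only when a download callable is supplied
    AND the refresh environment variable is set to "1", so ordinary
    runtime and CI never touch the network.
    """
    if download is not None and os.environ.get(REFRESH_ENV_VAR) == "1":
        # Explicit opt-in refresh path (e.g. a Hugging Face fetch).
        text = download()
    else:
        # Default path: resolve the file relative to the installed
        # package, which also works when installed from a wheel.
        text = (
            resources.files(package)
            .joinpath(resource)
            .read_text(encoding="utf-8")
        )
    return list(csv.DictReader(io.StringIO(text)))
```

With this shape, ordinary tests call load_county_fips() with no arguments and never need network access, while a dataset-refresh script or an opt-in integration test can pass a download callable and set the environment variable.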
Less complete alternative
CI could prefetch data/county_fips_2020.csv.gz before running the baseline household suite, but that only makes CI less flaky. It would still leave installed/runtime behavior dependent on an undeclared external network fetch.
A local vendored CSV copy fixed the failing shard during PR #8307 debugging, but that broader change was intentionally left out of the Tennessee property tax relief PR to keep that PR scoped.