This repository was archived by the owner on Jun 14, 2026. It is now read-only.
A smoke run on a clean-main workstation failed before writing any checkpoint under artifacts/local_us_microplex_smoke/local-smoke-v1/. The pipeline was downloading a SIPP donor asset from Hugging Face via Xet when the local disk filled.
The run had to add --no-include-acs because policyengine-us-data does not provide storage/acs_2022.h5 locally and the ACS source has no download URL.
Failure excerpt:
RuntimeError: Data processing error: File reconstruction error: IO Error: No space left on device (os error 28)
...
File "microplex_us/data_sources/donor_surveys.py", line 676, in _download_policyengine_us_data_file
downloaded = hf_hub_download(
...
Loading processed CPS ASEC 2023 from /Users/administrator/.cache/microplex/cps_asec_2023_processed_v20260601_ecps_spm_takeup_inputs.parquet
Loading PUF from /Users/administrator/.cache/microplex/puf_2015.csv...
Raw records: 207,692
Loading demographics from /Users/administrator/.cache/microplex/demographics_2015.csv...
After demographics merge: 207,692
Expanded 1,000 tax units to 1,921 persons
Observed behavior:
The failure occurs before any durable smoke output/checkpoint appears under artifacts/local_us_microplex_smoke/local-smoke-v1/.
The Xet log showed one donor file reconstructed successfully shortly before the failure; the next donor download then exhausted the disk.
Because no checkpoint exists, the next retry cannot resume from a completed stage and must restart the pre-checkpoint source-loading/imputation work.
Potential improvements:
Preflight available disk space for Hugging Face/cache/download directories before beginning source loading.
Emit which donor source/file is being downloaded before invoking hf_hub_download.
Consider making source-loading checkpointing more granular so large donor downloads do not require restarting the whole pre-checkpoint phase after local environment failures.
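The first two improvements could look roughly like the sketch below. It is only an illustration, not the pipeline's actual code: `check_free_space`, `download_with_announcement`, the `required_bytes` threshold, and the `download_fn` callable are all hypothetical names introduced here. In the real pipeline `download_fn` would presumably wrap the existing `hf_hub_download` call, and the directory to probe would be whichever Hugging Face cache location is in effect on the workstation.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("smoke_preflight")  # hypothetical logger name


def check_free_space(directory: Path, required_bytes: int) -> int:
    """Return free bytes on the filesystem holding `directory`; raise if below the threshold."""
    # Walk up to the nearest existing ancestor so a cache dir that has not
    # been created yet can still be checked.
    probe = directory.expanduser().resolve()
    while not probe.exists():
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    if free < required_bytes:
        raise RuntimeError(
            f"Only {free:,} bytes free under {probe}; need at least "
            f"{required_bytes:,} before downloading donor sources."
        )
    return free


def download_with_announcement(source_name, filename, download_fn):
    """Log which donor file is about to be fetched, then delegate to `download_fn`."""
    logger.info("Downloading donor source %s: %s", source_name, filename)
    path = download_fn(filename)
    logger.info("Finished donor source %s: %s", source_name, path)
    return path
```

Running the space check once before source loading starts would turn the mid-download `os error 28` into an early, explicit failure, and the log line before each download would identify which donor file was in flight when a later failure occurs. The threshold itself would need to be chosen from the actual donor asset sizes, which this report does not state.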