Accept local dataset paths in managed_microsimulation #416

Draft
MaxGhenis wants to merge 1 commit into main from fix/managed-local-dataset

Fixes #415

Summary

managed_microsimulation could not run a local dataset file; it accepted only managed dataset names and remote URIs (hf://, gs://). A local build artifact, such as a downstream pipeline's per-year Stage-output H5 that is not part of any release manifest, raised Unknown dataset. allow_unmanaged=True relaxed the check only for URIs, not for local paths, and a file:// URI failed downstream with FileNotFoundError.

This makes resolve_managed_dataset_reference accept a local filesystem path when allow_unmanaged=True — the same explicit opt-in already required for unmanaged URIs. materialize_dataset_source already passes non-URI paths through unchanged, so the simulation constructs normally and records the provenance bundle (managed_by=policyengine.py). A local path without allow_unmanaged=True now raises an actionable error instead of the generic "Unknown dataset" message.
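The resolution rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the manifest contents, the `resolve_reference` and `looks_like_local_path` names, the path heuristic, and the use of `ValueError` are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical stand-ins: the real release manifest lives in policyengine.py,
# and this dataset name and URI are placeholders.
MANAGED_DATASETS = {
    "example_managed_dataset": "hf://example-org/example-repo/example.h5",
}
REMOTE_SCHEMES = ("hf://", "gs://")


def looks_like_local_path(reference: str) -> bool:
    """Assumed heuristic: contains a path separator or names an existing file."""
    if reference.startswith(REMOTE_SCHEMES):
        return False
    return "/" in reference or "\\" in reference or Path(reference).exists()


def resolve_reference(reference: str, allow_unmanaged: bool = False) -> str:
    """Resolve a dataset reference to something a loader could open."""
    # 1. Managed names always resolve through the manifest.
    if reference in MANAGED_DATASETS:
        return MANAGED_DATASETS[reference]

    is_remote = reference.startswith(REMOTE_SCHEMES)
    is_local = looks_like_local_path(reference)

    # 2. Unmanaged URIs and local paths share the same explicit opt-in.
    if is_remote or is_local:
        if allow_unmanaged:
            return reference  # passed through unchanged
        kind = "URI" if is_remote else "local path"
        raise ValueError(
            f"{reference!r} is an unmanaged {kind}; "
            "pass allow_unmanaged=True to use it."
        )

    # 3. Anything else is an unrecognized name.
    raise ValueError(f"Unknown dataset: {reference!r}")
```

The point of the sketch is the ordering: a local path is classified before the generic "Unknown dataset" branch, so a caller who forgets the opt-in gets a message that names the fix.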

Why

Local build-and-score pipelines — for example projecting the certified base to future years and then scoring reforms on the resulting local H5s — had to construct policyengine_us.Microsimulation directly, bypassing the provenance recording and runtime-model pinning the managed path exists to enforce. They can now stay on the managed wrapper.

Tests

  • test__given_local_path__then_managed_resolution_requires_opt_in
  • test__given_local_path_with_opt_in__then_resolves_to_that_path
  • test__given_unknown_dataset_name__then_raises_unknown_dataset

Verified end-to-end: managed_microsimulation(dataset="<local>.h5", allow_unmanaged=True) constructs a working simulation and records the provenance bundle. The full test_release_manifests.py suite passes (33 tests), and make lint is clean.

🤖 Generated with Claude Code

resolve_managed_dataset_reference now treats a local filesystem path as an
unmanaged dataset when allow_unmanaged=True -- the same explicit opt-in already
required for remote URIs. materialize_dataset_source already passes non-URI
paths through unchanged, so the simulation constructs normally and records the
provenance bundle (managed_by=policyengine.py). Local build-and-score pipelines
can run their own Stage-output H5s through the managed wrapper instead of
constructing policyengine_us simulations directly. A local path without
allow_unmanaged=True now raises an actionable error rather than the generic
"Unknown dataset" message.

Fixes #415

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

managed_microsimulation cannot run a local dataset file