
Populace dashboard: populace-US as the sole dataset, live from HF, with version compare #15

Merged
PavelMakarchuk merged 14 commits into main from populace-mode
Jun 16, 2026

@PavelMakarchuk PavelMakarchuk commented Jun 12, 2026

Contributor

Reframes the dashboard around populace-US as PolicyEngine's main calibrated dataset. Microplex, us-data, and the eCPS comparison are removed; everything reads live from Hugging Face (no committed snapshot, no separate backend); and you can browse any release and diff versions.

What it is now

A single Next.js app over policyengine/populace-us. The current release resolves through latest.json; manifests and per-target calibration_diagnostics.json are fetched on demand (schema v1 and v2).

  • /populace — release summary (calibration loss + convergence, gates, within-10%/tolerance, records kept, per-family fit, worst-fit/biggest-improvement), for the latest or any ?release=.
  • /populace/targets — structured target browser: pick the quantity a constraint measures (e.g. adjusted gross income), drill its breakdown dimensions (income band × return type × filing status, geography, …) as filterable/sortable facets, and open any target's canonical detail (registry fields, source citation, initial→final→target).
  • /populace/compare — version-over-version diff: match targets by name across two releases, fit change per common target, added/removed counts, with a guard when surfaces differ.

API layer (the Next route handlers, all live-from-HF)

  • GET /api/populace/releases
  • GET /api/populace?release=<id>
  • GET /api/populace/target-diagnostics?release=<id>&…
  • GET /api/populace/compare?a=<id>&b=<id>
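
As a rough illustration of how a client might address the routes listed above, here is a minimal sketch of a typed URL builder. Only the four paths and the `release`, `a`, and `b` query parameters come from this PR; `PopulaceRoute`, `withQuery`, and `populaceUrl` are hypothetical names, and the elided target-diagnostics parameters are left out.

```typescript
// Hypothetical client-side helper; only the four paths and the
// release/a/b parameters are taken from the PR description.
type PopulaceRoute =
  | { kind: "releases" }
  | { kind: "summary"; release?: string }
  | { kind: "target-diagnostics"; release?: string }
  | { kind: "compare"; a: string; b: string };

// Join key/value pairs into a query string, skipping undefined values.
function withQuery(path: string, params: Record<string, string | undefined>): string {
  const pairs: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    if (value !== undefined) {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
    }
  }
  return pairs.length > 0 ? path + "?" + pairs.join("&") : path;
}

function populaceUrl(route: PopulaceRoute): string {
  switch (route.kind) {
    case "releases":
      return "/api/populace/releases";
    case "summary":
      // Omitting release asks for the latest release.
      return withQuery("/api/populace", { release: route.release });
    case "target-diagnostics":
      return withQuery("/api/populace/target-diagnostics", { release: route.release });
    case "compare":
      return withQuery("/api/populace/compare", { a: route.a, b: route.b });
  }
}
```

For example, `populaceUrl({ kind: "compare", a: "r1", b: "r2" })` yields `/api/populace/compare?a=r1&b=r2`, where `r1` and `r2` stand in for real release ids.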

Removed

Microplex (mode/pages/components/backend), us-data (summary/analysis/targets/inventory/nodes/weights/pipeline + run/registry infra), the eCPS incumbent comparison, the dashboard mode selector, the committed data snapshot, and the FastAPI backend (the Next routes are the API layer).

Testing

bun test data-layer suite (parsing, v2 metadata, facet model, filtering, version compare) — 7 pass; tsc clean; production build green; live smoke-tested against HF (latest 457-target build and a pinned 3,704-target release; compare finds 289 common targets across surfaces).

🤖 Generated with Claude Code

A fourth dashboard mode reading the populace-US releases published on the
policyengine/populace-us Hugging Face dataset:

- /populace: release summary — loss vs enhanced CPS (train/holdout/full),
  acceptance gates with smoke aggregates, per-family loss breakdown, top
  regressions/improvements, release artifact links.
- /populace/targets: explorer over all 3,704 per-target diagnostics from
  sound_ecps_replacement_comparison.json (family/split/winner filters,
  search, sortable columns, pagination).

Next API routes fetch the manifests live from Hugging Face (release
discovery via the tree API) and serve per-target rows from a committed
snapshot of the 9f1260b release; a FastAPI mirror serves local dev. The
overview flags when the live release is newer than the snapshot.

Producer-side gaps are filed as PolicyEngine/populace#9 (latest.json
pointer), #10 (calibration diagnostics artifact), #11 (release contract),
with PRs 12-14 open against them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk requested a review from MaxGhenis June 12, 2026 05:18
PavelMakarchuk and others added 2 commits June 12, 2026 01:19
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Follows the populace#37 cleanup that retired the eCPS comparison from the
live populace surface (it moved to PolicyEngine/populace-benchmarks) and
added the latest.json pointer and calibration_diagnostics.json artifacts
this branch originally proposed.

- Resolve the current release through latest.json instead of guessing the
  lexicographically-latest release directory.
- Read per-target rows from calibration_diagnostics.json: populace's own
  calibration fit (target, initial vs final estimate, relative error,
  within-tolerance) rather than the populace-vs-enhanced-CPS comparison.
- Release summary: calibration loss + convergence sparkline, within-10%
  and within-tolerance, records kept after L0, acceptance gates, solver
  provenance, per-family fit, worst-fit and biggest-improvement tables,
  and a skipped-targets section.
- Target explorer: initial->final estimate, relative error, improvement,
  within-tolerance, with family/tolerance/direction filters. Derived
  families collapse per-state distribution and per-state-FIPS SNAP targets
  (19 families instead of 119).
- Snapshot refreshed to release populace-us-2024-f32c2e5-20260614.
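
The fit metrics named in the list above (relative error, improvement, within-10%, worst fit) could be derived from per-target rows roughly as follows. This is a sketch under stated assumptions: the field names, the signed `(estimate - target) / target` definition of relative error, and the handling of zero targets are illustrative and not taken from calibration_diagnostics.json.

```typescript
// Hypothetical row shape; the real diagnostics schema may differ.
interface TargetFit {
  name: string;
  target: number;
  initial: number; // estimate before calibration
  final: number;   // estimate after calibration
}

interface ScoredTarget {
  name: string;
  finalAbsErr: number;
  improvement: number; // positive when calibration reduced |relative error|
}

// Assumed definition: signed error relative to the target value.
function relativeError(estimate: number, target: number): number {
  if (target === 0) return estimate === 0 ? 0 : Infinity;
  return (estimate - target) / target;
}

function scoreTargets(rows: TargetFit[]): ScoredTarget[] {
  return rows.map((row) => {
    const initialAbsErr = Math.abs(relativeError(row.initial, row.target));
    const finalAbsErr = Math.abs(relativeError(row.final, row.target));
    return { name: row.name, finalAbsErr, improvement: initialAbsErr - finalAbsErr };
  });
}

// Share of targets whose final |relative error| is at or below the threshold.
function shareWithin(scored: ScoredTarget[], threshold: number): number {
  if (scored.length === 0) return 0;
  return scored.filter((s) => s.finalAbsErr <= threshold).length / scored.length;
}

// The n targets with the largest final |relative error|.
function worstFit(scored: ScoredTarget[], n: number): ScoredTarget[] {
  return scored.slice().sort((x, y) => y.finalAbsErr - x.finalAbsErr).slice(0, n);
}
```

Calling `shareWithin(scored, 0.1)` gives the within-10% share; a biggest-improvement table would sort the same rows by `improvement` instead.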

Tested: 101 backend tests pass, tsc clean, next build green, live smoke
test resolves f32c2e5 via latest.json and serves both pages.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk changed the title Add Populace dashboard mode Add Populace dashboard mode (calibration diagnostics) Jun 15, 2026
Restores the populace-vs-enhanced-CPS comparison the pivot dropped, now as
a dedicated "Incumbent comparison" page sourced from the benchmarks repo's
scorecard rather than the live populace surface.

- /populace/comparison: full/holdout/train loss and unweighted MSRE head to
  head, per-target win/loss/tie, per-family loss breakdown, top regressions
  and improvements. Candidate is populace; baseline is the enhanced CPS.
- Route reads a live scorecard from POPULACE_BENCHMARKS_SCORECARD_URL when
  set, else serves a committed archived snapshot of the 9f1260b scorecard,
  flagged "archived". The normalizer flattens both the archived
  sound_ecps_replacement_comparison shape and the proposed flat benchmarks
  scorecard, so the live artifact drops in unchanged.
- The live scorecard is not published yet; filed as
  PolicyEngine/populace-benchmarks#3 (machine-readable scorecard + latest
  pointer), referenced in the route notes.

Tested: 105 backend tests pass (4 new: archived snapshot, live-prefer,
live-fetch fallback, FIPS family collapse), tsc clean, next build green.
Smoke-tested both the archived path and, with the env var set at build, the
live benchmarks path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk changed the title Add Populace dashboard mode (calibration diagnostics) Add Populace dashboard mode (calibration diagnostics + incumbent comparison) Jun 15, 2026
PavelMakarchuk and others added 6 commits June 15, 2026 11:01
Wires the comparison view to PolicyEngine/populace-benchmarks#3/#4: the route
resolves that repo's benchmarks/us/incumbent-comparison/latest.json pointer by
default, fetches the scorecard it names, and falls back to the committed
archived 9f1260b snapshot until the artifact is reachable. So the comparison
goes live automatically once the benchmarks PR merges, with no redeploy.

- POPULACE_BENCHMARKS_POINTER_URL overrides the pointer (e.g. a branch);
  POPULACE_BENCHMARKS_SCORECARD_URL still points straight at a scorecard.
- scorecard_status surfaces the artifact's own provenance ("archived" for the
  reconstructed 9f1260b scorecard), distinct from whether it was reached live.
- Fix the comparison normalizer to read the flat benchmarks summary
  (candidate_loss/baseline_loss/... directly in summary) as well as the nested
  archived shape — first-key-wins across both, mirroring the Python side. The
  flat scorecard was returning null losses before this.
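
The first-key-wins lookup described above might look like the sketch below. The flat `summary.candidate_loss` / `summary.baseline_loss` keys come from this commit message; the nested layout (`summary.full`) is invented for illustration because the archived shape is not shown here, and `firstNumber` and `normalizeLosses` are hypothetical names.

```typescript
type Json = Record<string, unknown>;

function isRecord(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the first finite number found, trying each source in order and,
// within a source, each key in order. Returns null when nothing matches.
function firstNumber(sources: Json[], keys: string[]): number | null {
  for (const source of sources) {
    for (const key of keys) {
      const value = source[key];
      if (typeof value === "number" && isFinite(value)) return value;
    }
  }
  return null;
}

function normalizeLosses(scorecard: unknown): { candidate: number | null; baseline: number | null } {
  const summary: Json =
    isRecord(scorecard) && isRecord(scorecard.summary) ? scorecard.summary : {};
  // ASSUMPTION: the nested (archived) shape keeps losses under summary.full.
  const nested: Json = isRecord(summary.full) ? summary.full : {};
  const sources: Json[] = [summary, nested];
  return {
    candidate: firstNumber(sources, ["candidate_loss"]),
    baseline: firstNumber(sources, ["baseline_loss"]),
  };
}
```

Checking the flat summary before the nested one means a flat scorecard no longer falls through to null losses, which is the failure this commit describes.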

Tested: 106 backend tests pass (pointer-resolution, direct-URL override, and
unreachable-fallback cases); tsc clean; next build green. Smoke-tested the
default snapshot path and the live pointer path against the pushed benchmarks
branch — both render full loss/win/family data and all three pages return 200.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Decompose target names into geography/level/source/variable/breakdown
  (matching populace.dev's parse_target); browse by variable ("the thing")
  then drill its breakdowns.
- Generic facet model: every axis a variable varies on (geography, level,
  each breakdown dimension) becomes a filterable, sortable column —
  income band x return type x filing status for AGI; geography for the
  state-conditioned real_estate_taxes; etc. Constant axes are dropped.
- Canonical single-target detail card (auto-opens when facets narrow to
  one): structured fields + initial->final->target bars + fit.
- Pull calibration_diagnostics.json live from HF via latest.json in both
  the Next routes and the FastAPI backend, with the committed snapshot as
  fallback. Handles schema v2 (no top-level release_id -> use the pointer).
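
The facet model above (every varying axis becomes a filter, constant axes are dropped) can be sketched as below. `ParsedTarget`, `buildFacets`, and `filterTargets` are hypothetical names, and the axis keys and values in the usage note are made up; the dashboard's actual parser and data shapes are not shown in this PR.

```typescript
// Hypothetical parsed target: a name plus one value per axis
// (geography, level, source, each breakdown dimension, ...).
interface ParsedTarget {
  name: string;
  axes: Record<string, string>;
}

// Collect the distinct values seen on each axis, then keep only the
// axes that actually vary across the given targets.
function buildFacets(targets: ParsedTarget[]): Record<string, string[]> {
  const seen: Record<string, string[]> = Object.create(null);
  for (const target of targets) {
    for (const axis of Object.keys(target.axes)) {
      const values = seen[axis] || (seen[axis] = []);
      const value = target.axes[axis];
      if (values.indexOf(value) === -1) values.push(value);
    }
  }
  const facets: Record<string, string[]> = {};
  for (const axis of Object.keys(seen)) {
    if (seen[axis].length > 1) facets[axis] = seen[axis].slice().sort();
  }
  return facets;
}

// Keep targets that match every selected axis value.
function filterTargets(targets: ParsedTarget[], selection: Record<string, string>): ParsedTarget[] {
  return targets.filter((target) =>
    Object.keys(selection).every((axis) => target.axes[axis] === selection[axis])
  );
}
```

If every target of a variable shares one `source` value, `buildFacets` omits `source` and returns only the axes worth filtering on. When `filterTargets` narrows to a single row, a UI could open that target's detail card.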

113 backend tests pass; tsc clean; production build green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rison

populace-US is now the dashboard's only dataset. Removed:
- microplex mode (pages, components, API routes, backend router)
- us-data mode (summary/analysis/targets/inventory/nodes/weights/pipeline,
  their hooks, backend routes, run/registry/fixtures infrastructure)
- the eCPS incumbent comparison (the /populace/comparison view, the
  benchmarks scorecard wiring, and its backend endpoint) — version-over-
  version comparison replaces it

The dashboard mode selector and run/geo context are gone; populace is the
whole app (home redirects to /populace, nav is release · targets · compare).
The API client is a thin fetch wrapper (no run params, no fixtures). The
FastAPI backend is now just the populace router over Hugging Face.

16 backend tests pass; production build green; live smoke test OK.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Remove the committed snapshot entirely; latest-artifact.ts reads every
  release's manifests + calibration_diagnostics live from Hugging Face,
  resolved via latest.json (current) or by id (any release).
- New API: GET /api/populace/releases (lists releases from the HF tree,
  newest first, flagging which carry per-target diagnostics). /api/populace
  and /api/populace/target-diagnostics accept ?release=<id> to view any
  release; default is the latest.
- Adopt schema v2: surface the published registry metadata — source
  citation, entity, aggregation, measure, period — and parse target_name
  (dropping the @period suffix) instead of guessing from the slash name.
- Client always uses the Next API routes (same origin); the app no longer
  depends on the FastAPI backend.

tsc clean; live smoke test: releases list, latest (457 targets) and a
pinned older release (3704) both load; v2 citations flow through.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
New /populace/compare page and GET /api/populace/compare?a=&b=: load two
releases live from HF, match targets by base_name (period-stripped, so v1
and v2 releases align), and diff them — common targets get a fit change
(|rel err| B minus A; negative = B fits better), targets in only one
release are counted as added/removed. When the two releases cover different
target surfaces, the page flags that their loss values aren't comparable and
steers to the per-target diff.
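
A minimal sketch of the matching and diff described above, assuming each row carries a `target_name` and a signed `rel_err` (both field names are assumptions) and that the period suffix is whatever follows the last `@`:

```typescript
interface FitRow {
  target_name: string;
  rel_err: number;
}

interface TargetChange {
  base_name: string;
  change: number; // |rel err| B minus A; negative means B fits better
}

interface ReleaseDiff {
  common: TargetChange[];
  added: number;   // targets only in B
  removed: number; // targets only in A
}

// Strip an "@period" suffix so names from both schema versions align.
function baseName(targetName: string): string {
  const at = targetName.lastIndexOf("@");
  return at === -1 ? targetName : targetName.slice(0, at);
}

function indexByBaseName(rows: FitRow[]): Record<string, number> {
  const index: Record<string, number> = Object.create(null);
  for (const row of rows) index[baseName(row.target_name)] = Math.abs(row.rel_err);
  return index;
}

function compareReleases(a: FitRow[], b: FitRow[]): ReleaseDiff {
  const errA = indexByBaseName(a);
  const errB = indexByBaseName(b);
  const common: TargetChange[] = [];
  let added = 0;
  for (const name of Object.keys(errB)) {
    if (name in errA) common.push({ base_name: name, change: errB[name] - errA[name] });
    else added += 1;
  }
  const removed = Object.keys(errA).filter((name) => !(name in errB)).length;
  return { common, added, removed };
}
```

Counting `common` entries with a negative `change` gives the improved total, and positive ones the regressed total.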

The compare view defaults B to the latest release and A to the prior one,
with release pickers from the new /api/populace/releases list.

tsc clean; production build green; live smoke test: f32c2e5 (3704) vs
0cdbb27 (457) finds 289 common targets, 89 improved / 200 regressed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Next.js API routes are now the complete, pure-HF API layer the app uses,
so the FastAPI backend (which still read the deleted snapshot files) is
removed along with the Python tooling and stale scope docs. Coverage moves
to a bun:test suite over the data layer — name/v2-metadata parsing, the
facet model, variable/dimension filtering, and the version comparison.

Makefile and README updated for the single-app layout.

bun test: 7 pass; tsc clean; production build green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk changed the title Add Populace dashboard mode (calibration diagnostics + incumbent comparison) Populace dashboard: populace-US as the sole dataset, live from HF, with version compare Jun 16, 2026
PavelMakarchuk and others added 4 commits June 15, 2026 23:05
- The target browser and release summary defaulted to the latest release but
  gave no way to view another. Add a Release dropdown (from /api/populace/
  releases) on both pages, so you can switch to e.g. the 3704-target
  f32c2e5 surface instead of the 457-target latest. The status pill shows
  the active release and its target count.
- Facet dropdowns prepended a clear-filter option labeled "All", which
  collided with IRS filing status's literal "All" value (two "All"s).
  Rename the placeholder to "Any".
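
One way to keep the clear-filter entry from colliding with a real value such as "All" is to give it a sentinel value of its own. The sketch below assumes no real facet value is the empty string; `ANY_VALUE`, `facetOptions`, and `matchesFacet` are hypothetical names, not the dashboard's own.

```typescript
interface FacetOption {
  value: string;
  label: string;
}

// ASSUMPTION: no real facet value is the empty string, so it can act
// as the "no filter" sentinel.
const ANY_VALUE = "";

// Prepend the clear-filter entry, labelled "Any", to the real values.
function facetOptions(values: string[]): FacetOption[] {
  const options: FacetOption[] = [{ value: ANY_VALUE, label: "Any" }];
  for (const value of values) options.push({ value, label: value });
  return options;
}

// A row passes when no filter is selected or its value matches exactly.
function matchesFacet(rowValue: string, selected: string): boolean {
  return selected === ANY_VALUE || rowValue === selected;
}
```

With this, selecting the literal "All" filing status filters to rows whose value is "All", while "Any" clears the filter.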

tsc clean; live: switching release reloads the surface (31 vars/457 vs
90 vars/3704).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The page-header placement was easy to miss, so the release selector now
sits directly above the variable list it controls, with a hint that the
newest build can be a small surface. Clarifies that the variable list is
per-release.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- IRS variables publish both a total (dollar amount) and a count (number
  of returns); the measure was buried as a dropped constant breakdown
  token, so amount and count rows looked like one variable. Fold the
  measure into variable_key so "capital gains gross · total" and "· count"
  are distinct browse entries, each tagged (amount/count) in the list and
  header.
- Humanize snake_case variable names (salt_deduction_expenditure -> SALT
  deduction expenditure) across the browser, table, detail card, and
  compare view, with a small acronym set; names that already use spaces
  pass through.
- Variable rows are now title + subtitle (name wraps as a block, source +
  measure beneath) so long names don't wrap raggedly. The Browse-by-
  variable list expands to fit instead of an inner scroll box.
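
The name humanizing described above could be sketched as follows. The acronym list is illustrative, and leaving non-acronym words lowercase is an assumption; the PR says only that a small acronym set exists and that names already using spaces pass through.

```typescript
// Illustrative acronym set; the dashboard's actual list is not shown.
const ACRONYMS: string[] = ["salt", "agi", "snap", "irs"];

function humanize(name: string): string {
  // Names that already use spaces pass through unchanged.
  if (name.indexOf(" ") !== -1) return name;
  return name
    .split("_")
    .filter((word) => word.length > 0)
    .map((word) => (ACRONYMS.indexOf(word.toLowerCase()) !== -1 ? word.toUpperCase() : word))
    .join(" ");
}
```

Under these assumptions `humanize("salt_deduction_expenditure")` returns "SALT deduction expenditure", matching the example in the commit message.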

bun test: 8 pass; tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Release pickers (compare, summary, targets) show "<date> · <sha>"
  instead of the raw build id; the compare results echo the two releases
  with dates.
- The Browse-by-variable list is a capped (65vh) scroll box with a sticky
  header again, so it doesn't stretch to the height of the target table.
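
A sketch of how a build id could be turned into the "<date> · <sha>" label, assuming ids end in `-<hex sha>-<YYYYMMDD>` as in populace-us-2024-f32c2e5-20260614. The ISO-style date rendering and the fallback to the raw id are assumptions, and `releaseLabel` is a hypothetical name.

```typescript
// Turn "...-<sha>-<YYYYMMDD>" into "<YYYY-MM-DD> · <sha>".
// Ids that don't match the assumed pattern are returned unchanged.
function releaseLabel(releaseId: string): string {
  const match = /-([0-9a-f]{7,40})-(\d{4})(\d{2})(\d{2})$/.exec(releaseId);
  if (match === null) return releaseId;
  const sha = match[1];
  const date = match[2] + "-" + match[3] + "-" + match[4];
  return date + " · " + sha;
}
```

Falling back to the raw id keeps any release whose id doesn't follow this pattern selectable in the pickers.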

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk marked this pull request as ready for review June 16, 2026 04:05
@PavelMakarchuk PavelMakarchuk merged commit 8bc3001 into main Jun 16, 2026


1 participant