Populace dashboard: populace-US as the sole dataset, live from HF, with version compare by PavelMakarchuk · Pull Request #15 · PolicyEngine/calibration-diagnostics

PavelMakarchuk · 2026-06-12T05:18:19Z

Reframes the dashboard around populace-US as PolicyEngine's main calibrated dataset. Microplex, us-data, and the eCPS comparison are removed; everything reads live from Hugging Face (no committed snapshot, no separate backend); and you can browse any release and diff versions.

What it is now

A single Next.js app over policyengine/populace-us. The current release resolves through latest.json; manifests and per-target calibration_diagnostics.json are fetched on demand (schema v1 and v2).
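A minimal sketch of the resolution step described above, under stated assumptions: the `release_id` field on the pointer and the `releases/<id>/` path layout are guesses made for illustration, and the helper names are hypothetical. Only the Hugging Face `resolve` URL pattern and the v2 behaviour (no top-level `release_id` in the diagnostics file) come from outside or from this PR's text.

```typescript
// Assumed shapes: the real latest.json and diagnostics schemas are not shown in this PR.
interface LatestPointer {
  release_id: string; // assumed field name
}

interface DiagnosticsFile {
  release_id?: string; // present in schema v1, absent in v2 per the PR text
}

// Hugging Face "resolve" URL for the dataset; the in-repo path layout below is an assumption.
const HF_BASE =
  "https://huggingface.co/datasets/policyengine/populace-us/resolve/main";

function releaseFileUrl(releaseId: string, file: string): string {
  return `${HF_BASE}/releases/${encodeURIComponent(releaseId)}/${file}`;
}

// Schema v2 has no top-level release_id, so fall back to the id we resolved from.
function effectiveReleaseId(diag: DiagnosticsFile, resolvedId: string): string {
  return diag.release_id ?? resolvedId;
}

// The fetcher is injected so the logic can be exercised without the network.
type FetchJson = (url: string) => Promise<unknown>;

async function loadDiagnostics(
  fetchJson: FetchJson,
  releaseId?: string,
): Promise<{ releaseId: string; diagnostics: DiagnosticsFile }> {
  // No explicit release: resolve the current one through the pointer.
  const resolvedId =
    releaseId ??
    ((await fetchJson(`${HF_BASE}/latest.json`)) as LatestPointer).release_id;
  const diagnostics = (await fetchJson(
    releaseFileUrl(resolvedId, "calibration_diagnostics.json"),
  )) as DiagnosticsFile;
  return { releaseId: effectiveReleaseId(diagnostics, resolvedId), diagnostics };
}
```

Passing an explicit `releaseId` skips the pointer fetch, which is how a `?release=` view could share the same code path as the default latest view.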

/populace — release summary (calibration loss + convergence, gates, within-10%/tolerance, records kept, per-family fit, worst-fit/biggest-improvement), for the latest or any ?release=.
/populace/targets — structured target browser: pick the quantity a constraint measures (e.g. adjusted gross income), drill its breakdown dimensions (income band × return type × filing status, geography, …) as filterable/sortable facets, and open any target's canonical detail (registry fields, source citation, initial→final→target).
/populace/compare — version-over-version diff: match targets by name across two releases, fit change per common target, added/removed counts, with a guard when surfaces differ.

API layer (the Next route handlers, all live-from-HF)

GET /api/populace/releases · GET /api/populace?release=<id> · GET /api/populace/target-diagnostics?release=<id>&… · GET /api/populace/compare?a=<id>&b=<id>
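For orientation, a thin client over these routes might look like the sketch below. The helper names (`apiPath`, `summaryPath`, `comparePath`, `getJson`) are hypothetical and not taken from the repository. The extra query parameters of `target-diagnostics` are elided in the description above, so they are not modelled here.

```typescript
// Build a same-origin path, skipping parameters that are undefined.
function apiPath(
  route: string,
  params: Record<string, string | undefined> = {},
): string {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) query.set(key, value);
  }
  const qs = query.toString();
  return qs ? `${route}?${qs}` : route;
}

// Omitting `release` means "latest", matching the default described in the PR.
const releasesPath = (): string => apiPath("/api/populace/releases");
const summaryPath = (release?: string): string =>
  apiPath("/api/populace", { release });
const comparePath = (a: string, b: string): string =>
  apiPath("/api/populace/compare", { a, b });

// Thin fetch wrapper: throws on non-2xx so callers only handle parsed JSON.
async function getJson<T>(path: string): Promise<T> {
  const res = await fetch(path);
  if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
  return (await res.json()) as T;
}
```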

Removed

Microplex (mode/pages/components/backend), us-data (summary/analysis/targets/inventory/nodes/weights/pipeline + run/registry infra), the eCPS incumbent comparison, the dashboard mode selector, the committed data snapshot, and the FastAPI backend (the Next routes are the API layer).

Testing

bun test data-layer suite (parsing, v2 metadata, facet model, filtering, version compare) — 7 pass; tsc clean; production build green; live smoke-tested against HF (latest 457-target build and a pinned 3,704-target release; compare finds 289 common targets across surfaces).

🤖 Generated with Claude Code

A fourth dashboard mode reading the populace-US releases published on the policyengine/populace-us Hugging Face dataset:

- /populace (release summary): loss vs enhanced CPS (train/holdout/full), acceptance gates with smoke aggregates, per-family loss breakdown, top regressions/improvements, release artifact links.
- /populace/targets: explorer over all 3,704 per-target diagnostics from sound_ecps_replacement_comparison.json (family/split/winner filters, search, sortable columns, pagination).

Next API routes fetch the manifests live from Hugging Face (release discovery via the tree API) and serve per-target rows from a committed snapshot of the 9f1260b release; a FastAPI mirror serves local dev. The overview flags when the live release is newer than the snapshot.

Producer-side gaps are filed as PolicyEngine/populace#9 (latest.json pointer), #10 (calibration diagnostics artifact), #11 (release contract), with PRs 12-14 open against them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Follows the populace#37 cleanup that retired the eCPS comparison from the live populace surface (it moved to PolicyEngine/populace-benchmarks) and added the latest.json pointer and calibration_diagnostics.json artifacts this branch originally proposed.

- Resolve the current release through latest.json instead of guessing the lexicographically-latest release directory.
- Read per-target rows from calibration_diagnostics.json: populace's own calibration fit (target, initial vs final estimate, relative error, within-tolerance) rather than the populace-vs-enhanced-CPS comparison.
- Release summary: calibration loss + convergence sparkline, within-10% and within-tolerance, records kept after L0, acceptance gates, solver provenance, per-family fit, worst-fit and biggest-improvement tables, and a skipped-targets section.
- Target explorer: initial->final estimate, relative error, improvement, within-tolerance, with family/tolerance/direction filters. Derived families collapse per-state distribution and per-state-FIPS SNAP targets (19 families instead of 119).
- Snapshot refreshed to release populace-us-2024-f32c2e5-20260614.

Tested: 101 backend tests pass, tsc clean, next build green, live smoke test resolves f32c2e5 via latest.json and serves both pages.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Restores the populace-vs-enhanced-CPS comparison the pivot dropped, now as a dedicated "Incumbent comparison" page sourced from the benchmarks repo's scorecard rather than the live populace surface.

- /populace/comparison: full/holdout/train loss and unweighted MSRE head to head, per-target win/loss/tie, per-family loss breakdown, top regressions and improvements. Candidate is populace; baseline is the enhanced CPS.
- Route reads a live scorecard from POPULACE_BENCHMARKS_SCORECARD_URL when set, else serves a committed archived snapshot of the 9f1260b scorecard, flagged "archived". The normalizer flattens both the archived sound_ecps_replacement_comparison shape and the proposed flat benchmarks scorecard, so the live artifact drops in unchanged.
- The live scorecard is not published yet; filed as PolicyEngine/populace-benchmarks#3 (machine-readable scorecard + latest pointer), referenced in the route notes.

Tested: 105 backend tests pass (4 new: archived snapshot, live-prefer, live-fetch fallback, FIPS family collapse), tsc clean, next build green. Smoke-tested both the archived path and, with the env var set at build, the live benchmarks path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Wires the comparison view to PolicyEngine/populace-benchmarks#3/#4: the route resolves that repo's benchmarks/us/incumbent-comparison/latest.json pointer by default, fetches the scorecard it names, and falls back to the committed archived 9f1260b snapshot until the artifact is reachable. So the comparison goes live automatically once the benchmarks PR merges, with no redeploy.

- POPULACE_BENCHMARKS_POINTER_URL overrides the pointer (e.g. a branch); POPULACE_BENCHMARKS_SCORECARD_URL still points straight at a scorecard.
- scorecard_status surfaces the artifact's own provenance ("archived" for the reconstructed 9f1260b scorecard), distinct from whether it was reached live.
- Fix the comparison normalizer to read the flat benchmarks summary (candidate_loss/baseline_loss/... directly in summary) as well as the nested archived shape: first-key-wins across both, mirroring the Python side. The flat scorecard was returning null losses before this.

Tested: 106 backend tests pass (pointer-resolution, direct-URL override, and unreachable-fallback cases); tsc clean; next build green. Smoke-tested the default snapshot path and the live pointer path against the pushed benchmarks branch; both render full loss/win/family data and all three pages return 200.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- Decompose target names into geography/level/source/variable/breakdown (matching populace.dev's parse_target); browse by variable ("the thing") then drill its breakdowns.
- Generic facet model: every axis a variable varies on (geography, level, each breakdown dimension) becomes a filterable, sortable column: income band x return type x filing status for AGI; geography for the state-conditioned real_estate_taxes; etc. Constant axes are dropped.
- Canonical single-target detail card (auto-opens when facets narrow to one): structured fields + initial->final->target bars + fit.
- Pull calibration_diagnostics.json live from HF via latest.json in both the Next routes and the FastAPI backend, with the committed snapshot as fallback. Handles schema v2 (no top-level release_id -> use the pointer).

113 backend tests pass; tsc clean; production build green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
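The "constant axes are dropped" rule of the facet model can be sketched as follows. This is an illustration only: the row shape (a flat map from axis name to value) and the function names are assumptions, and the sample axis values used with it are made up.

```typescript
// One target's position on each axis, e.g. { income_band: "...", filing_status: "..." }.
type TargetRow = Record<string, string>;

// Return the axes worth showing as facet columns: those taking more than one
// value, or missing on some rows. Axes that are constant across all rows are dropped.
function varyingAxes(rows: TargetRow[]): string[] {
  const valuesByAxis = new Map<string, Set<string>>();
  const rowsWithAxis = new Map<string, number>();
  for (const row of rows) {
    for (const [axis, value] of Object.entries(row)) {
      if (!valuesByAxis.has(axis)) valuesByAxis.set(axis, new Set());
      valuesByAxis.get(axis)!.add(value);
      rowsWithAxis.set(axis, (rowsWithAxis.get(axis) ?? 0) + 1);
    }
  }
  return Array.from(valuesByAxis.entries())
    .filter(
      ([axis, values]) =>
        values.size > 1 || rowsWithAxis.get(axis)! < rows.length,
    )
    .map(([axis]) => axis);
}

// Keep only rows matching every selected facet value; an empty selection keeps all rows.
function filterRows(
  rows: TargetRow[],
  selection: Record<string, string>,
): TargetRow[] {
  return rows.filter((row) =>
    Object.entries(selection).every(([axis, value]) => row[axis] === value),
  );
}
```

When `filterRows` narrows the set to a single row, a caller could open the single-target detail card, matching the auto-open behaviour described in the commit message.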

…rison

populace-US is now the dashboard's only dataset. Removed:

- microplex mode (pages, components, API routes, backend router)
- us-data mode (summary/analysis/targets/inventory/nodes/weights/pipeline, their hooks, backend routes, run/registry/fixtures infrastructure)
- the eCPS incumbent comparison (the /populace/comparison view, the benchmarks scorecard wiring, and its backend endpoint); version-over-version comparison replaces it

The dashboard mode selector and run/geo context are gone; populace is the whole app (home redirects to /populace, nav is release · targets · compare). The API client is a thin fetch wrapper (no run params, no fixtures). The FastAPI backend is now just the populace router over Hugging Face.

16 backend tests pass; production build green; live smoke test OK.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@period

- Remove the committed snapshot entirely; latest-artifact.ts reads every release's manifests + calibration_diagnostics live from Hugging Face, resolved via latest.json (current) or by id (any release).
- New API: GET /api/populace/releases (lists releases from the HF tree, newest first, flagging which carry per-target diagnostics). /api/populace and /api/populace/target-diagnostics accept ?release=<id> to view any release; default is the latest.
- Adopt schema v2: surface the published registry metadata (source citation, entity, aggregation, measure, period) and parse target_name (dropping the @period suffix) instead of guessing from the slash name.
- Client always uses the Next API routes (same origin); the app no longer depends on the FastAPI backend.

tsc clean; live smoke test: releases list, latest (457 targets) and a pinned older release (3704) both load; v2 citations flow through.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
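Dropping the `@period` suffix from a v2 `target_name` could be done along these lines. The exact name grammar is not shown in this PR, so the sketch assumes only that the period follows the last `@`; the function name is hypothetical and the sample name in the usage note is made up.

```typescript
interface ParsedTargetName {
  baseName: string; // name with the period suffix removed
  period: string | null; // text after the last "@", or null when absent
}

// Split on the last "@" so that names without a suffix pass through unchanged.
function splitTargetName(targetName: string): ParsedTargetName {
  const at = targetName.lastIndexOf("@");
  if (at === -1) return { baseName: targetName, period: null };
  return {
    baseName: targetName.slice(0, at),
    period: targetName.slice(at + 1),
  };
}
```

For a made-up name such as `some/target@2024`, this yields the base name `some/target` and the period `2024`; a name with no `@` keeps its full text as the base name.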

New /populace/compare page and GET /api/populace/compare?a=&b=: load two releases live from HF, match targets by base_name (period-stripped, so v1 and v2 releases align), and diff them. Common targets get a fit change (|rel err| B minus A; negative = B fits better); targets in only one release are counted as added/removed. Surfaces that differ flag that their loss values aren't comparable, steering to the per-target diff. The compare view defaults B to the latest release and A to the prior one, with release pickers from the new /api/populace/releases list.

tsc clean; production build green; live smoke test: f32c2e5 (3704) vs 0cdbb27 (457) finds 289 common targets, 89 improved / 200 regressed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
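The diff rule stated above (fit change = |rel err| of B minus |rel err| of A, negative meaning B fits better) can be sketched as below. `base_name` comes from the commit message; the `rel_error` field name, the result shape, and `diffReleases` itself are assumptions made for illustration.

```typescript
interface TargetFit {
  base_name: string; // period-stripped target name
  rel_error: number; // signed relative error of the final estimate (assumed field name)
}

interface ReleaseDiff {
  common: { base_name: string; fit_change: number }[];
  added: string[]; // present only in B
  removed: string[]; // present only in A
}

function diffReleases(a: TargetFit[], b: TargetFit[]): ReleaseDiff {
  const errorInA = new Map<string, number>();
  for (const t of a) errorInA.set(t.base_name, t.rel_error);

  const diff: ReleaseDiff = { common: [], added: [], removed: [] };
  const seenInB = new Set<string>();
  for (const t of b) {
    seenInB.add(t.base_name);
    const errA = errorInA.get(t.base_name);
    if (errA === undefined) {
      diff.added.push(t.base_name);
    } else {
      // |rel err| B minus |rel err| A: negative means B fits better.
      diff.common.push({
        base_name: t.base_name,
        fit_change: Math.abs(t.rel_error) - Math.abs(errA),
      });
    }
  }
  for (const t of a) {
    if (!seenInB.has(t.base_name)) diff.removed.push(t.base_name);
  }
  return diff;
}
```

Counting entries of `common` with a negative versus positive `fit_change` gives improved/regressed tallies of the kind quoted in the smoke test.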

The Next.js API routes are now the complete, pure-HF API layer the app uses, so the FastAPI backend (which still read the deleted snapshot files) is removed along with the Python tooling and stale scope docs. Coverage moves to a bun:test suite over the data layer: name/v2-metadata parsing, the facet model, variable/dimension filtering, and the version comparison. Makefile and README updated for the single-app layout.

bun test: 7 pass; tsc clean; production build green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- The target browser and release summary default to the latest release but gave no way to view another. Add a Release dropdown (from /api/populace/releases) on both pages, so you can switch to e.g. the 3704-target f32c2e5 surface instead of the 457-target latest. The status pill shows the active release and its target count.
- Facet dropdowns prepended a clear-filter option labeled "All", which collided with IRS filing status's literal "All" value (two "All"s). Rename the placeholder to "Any".

tsc clean; live: switching release reloads the surface (31 vars/457 vs 90 vars/3704).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The page-header placement was easy to miss, so the release selector now sits directly above the variable list it controls, with a hint that the newest build can be a small surface. Clarifies that the variable list is per-release.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- IRS variables publish both a total (dollar amount) and a count (number of returns); the measure was buried as a dropped constant breakdown token, so amount and count rows looked like one variable. Fold the measure into variable_key so "capital gains gross · total" and "· count" are distinct browse entries, each tagged (amount/count) in the list and header.
- Humanize snake_case variable names (salt_deduction_expenditure -> SALT deduction expenditure) across the browser, table, detail card, and compare view, with a small acronym set; names that already use spaces pass through.
- Variable rows are now title + subtitle (name wraps as a block, source + measure beneath) so long names don't wrap raggedly. The Browse-by-variable list expands to fit instead of an inner scroll box.

bun test: 8 pass; tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
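A minimal sketch of the name humanizer, assuming it only swaps underscores for spaces and upper-cases a fixed acronym set. The contents of `ACRONYMS` and the function name are illustrative guesses; the commit message confirms only the `salt_deduction_expenditure` example and the pass-through for names that already contain spaces.

```typescript
// Illustrative acronym set; the repository's actual list is not shown in this PR.
const ACRONYMS = new Set(["salt", "agi", "snap", "irs"]);

function humanizeVariableName(raw: string): string {
  // Names that already use spaces are assumed to be human-written and pass through.
  if (raw.includes(" ")) return raw;
  return raw
    .split("_")
    .filter((word) => word.length > 0)
    .map((word) =>
      ACRONYMS.has(word.toLowerCase()) ? word.toUpperCase() : word,
    )
    .join(" ");
}
```

Under these assumptions `humanizeVariableName("salt_deduction_expenditure")` gives `SALT deduction expenditure`, and `capital gains gross` is returned unchanged.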

- Release pickers (compare, summary, targets) show "<date> · <sha>" instead of the raw build id; the compare results echo the two releases with dates.
- The Browse-by-variable list is a capped (65vh) scroll box with a sticky header again, so it doesn't stretch to the height of the target table.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
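The "<date> · <sha>" label could be derived from a build id as sketched here. It assumes ids end in `-<sha>-<YYYYMMDD>`, as in `populace-us-2024-f32c2e5-20260614` quoted earlier in this PR; the ISO date rendering and the function name are assumptions.

```typescript
// Matches a trailing "-<hex sha>-<YYYYMMDD>"; anything else falls back to the raw id.
const BUILD_ID_TAIL = /-([0-9a-f]{7,40})-(\d{4})(\d{2})(\d{2})$/;

function releaseLabel(buildId: string): string {
  const match = BUILD_ID_TAIL.exec(buildId);
  if (!match) return buildId; // unknown shape: show the id unchanged
  const [, sha, year, month, day] = match;
  return `${year}-${month}-${day} · ${sha}`;
}
```

Falling back to the raw id keeps the picker usable if a release is ever published with a differently shaped name.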

PavelMakarchuk requested a review from MaxGhenis June 12, 2026 05:18

PavelMakarchuk and others added 2 commits June 12, 2026 01:19

Drop build-machine next-env.d.ts churn

d2d4dfa

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PavelMakarchuk changed the title ~~Add Populace dashboard mode~~ Add Populace dashboard mode (calibration diagnostics) Jun 15, 2026

PavelMakarchuk changed the title ~~Add Populace dashboard mode (calibration diagnostics)~~ Add Populace dashboard mode (calibration diagnostics + incumbent comparison) Jun 15, 2026

PavelMakarchuk and others added 6 commits June 15, 2026 11:01

PavelMakarchuk changed the title ~~Add Populace dashboard mode (calibration diagnostics + incumbent comparison)~~ Populace dashboard: populace-US as the sole dataset, live from HF, with version compare Jun 16, 2026

PavelMakarchuk and others added 4 commits June 15, 2026 23:05

PavelMakarchuk marked this pull request as ready for review June 16, 2026 04:05

PavelMakarchuk merged commit 8bc3001 into main Jun 16, 2026

PavelMakarchuk merged 14 commits into main from populace-mode
