populace.calibrate.diagnostics: ship the calibration evidence with the artifact #14

Closed
PavelMakarchuk wants to merge 1 commit into main from calibration-diagnostics

Conversation

@PavelMakarchuk

Contributor

Fixes #10.

What

Adds populace.calibrate.diagnostics:

  • diagnostics_payload(result) renders a CalibrationResult as a JSON-stable dict carrying the full evidence, not summaries: every per-target row (target, initial_estimate, final_estimate, relative_error, within_tolerance), the whole per-epoch loss_trajectory, every skipped target with its reason, and the solver options actually used (including the realized matrix_format and the l0_lambda the budget search settled on).
  • write_calibration_diagnostics(result, path) writes it as calibration_diagnostics.json — the artifact a release directory publishes next to its manifests. Strict JSON: non-finite floats become null in the scrub, and the writer passes allow_nan=False so anything that escapes fails loudly instead of smuggling out NaN tokens.
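
The strict-JSON behavior described above can be illustrated with a minimal sketch. This is not the code from this PR: `scrub_non_finite` and `write_strict_json` are hypothetical names, and the real scrub may handle more types (NumPy scalars, for instance). It only shows the two-layer idea: replace non-finite floats with `None` first, then let `json.dumps(..., allow_nan=False)` raise on anything the scrub missed.

```python
import json
import math
from pathlib import Path
from typing import Any


def scrub_non_finite(value: Any) -> Any:
    """Recursively replace NaN and +/-inf floats with None (JSON null).

    Hypothetical helper for illustration; only handles plain Python
    floats, dicts, lists, and tuples.
    """
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {str(k): scrub_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub_non_finite(v) for v in value]
    return value


def write_strict_json(payload: dict, path: Path) -> Path:
    """Write the scrubbed payload as strict JSON (hypothetical writer)."""
    # allow_nan=False makes json.dumps raise ValueError on any NaN or
    # infinity that escaped the scrub, instead of emitting a bare NaN
    # token that strict JSON parsers reject.
    text = json.dumps(scrub_non_finite(payload), allow_nan=False, indent=2)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

Without the scrub, `json.dumps({"x": float("nan")}, allow_nan=False)` raises `ValueError`, which is the "fails loudly" path; with the scrub, the same value is written as `null`.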

Why

CalibrationResult already computes all of this, but none of it leaves the build machine: build_dataset.py pushes diagnostics to telemetry and drops them, and the published .npz keeps only closing scalars. Two charter promises are only true if the report ships: "skipped and reported, never dropped silently", and "artifacts carry their environment". Downstream, the calibration-diagnostics dashboard's Populace mode can't show convergence or skip reasons until this exists.

The provenance snapshots under packages/*/build are deliberately untouched (they're verbatim audit copies); the next build script calls write_calibration_diagnostics(result, release_dir / "calibration_diagnostics.json") after calibrate(...), and publish_release(..., extra_files=("calibration_diagnostics.json",)) from #9's branch uploads it.

Testing

  • 4 new behavioral tests on real calibrate() runs: full-evidence payload (trajectory length == epochs, tolerance verdicts, options echo), skipped-target reasons ship and never leak into compiled rows, strict-JSON round-trip, file writer round-trip.
  • Full workspace suite on this branch: 360 passed, 4 skipped. ruff check clean. uv build --wheel packages/populace-calibrate succeeds.
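
As a rough illustration of what the behavioral checks listed above assert, here is a self-contained sketch against a stand-in result. `ToyRow`, `ToyResult`, `toy_payload`, and the payload key names (`targets`, `skipped_targets`, `options`) are all assumptions made for this example; the real tests run `calibrate()` and the real `diagnostics_payload`, whose layout may differ. Only the row field names come from the description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyRow:
    # Field names follow the per-target row described in this PR.
    target: str
    initial_estimate: float
    final_estimate: float
    relative_error: float
    within_tolerance: bool


@dataclass
class ToyResult:
    # Hypothetical stand-in for CalibrationResult.
    rows: list
    loss_trajectory: list
    skipped: dict  # target name -> reason it was skipped
    options: dict


def toy_payload(result: ToyResult) -> dict:
    """Build a plain dict from the stand-in result (key names assumed)."""
    return {
        "targets": [asdict(row) for row in result.rows],
        "loss_trajectory": list(result.loss_trajectory),
        "skipped_targets": [
            {"target": name, "reason": reason}
            for name, reason in result.skipped.items()
        ],
        "options": dict(result.options),
    }


epochs = 3
result = ToyResult(
    rows=[ToyRow("population", 90.0, 99.5, 0.005, True)],
    loss_trajectory=[0.5, 0.25, 0.125],
    skipped={"empty_cell": "no matching records"},
    options={"epochs": epochs, "matrix_format": "csr", "l0_lambda": 1e-4},
)
payload = toy_payload(result)

# Full trajectory, not a summary: one loss value per epoch.
assert len(payload["loss_trajectory"]) == epochs
# Skipped targets are reported with a reason and never appear as rows.
row_names = {row["target"] for row in payload["targets"]}
assert all(s["target"] not in row_names for s in payload["skipped_targets"])
# Strict-JSON round trip preserves the payload exactly.
assert json.loads(json.dumps(payload, allow_nan=False)) == payload
```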

🤖 Generated with Claude Code

…he artifact

CalibrationResult already carries everything an auditor needs — per-target
initial/final estimates and tolerance verdicts, the per-epoch loss
trajectory, every skipped target with its reason, and the solver options
actually used — but none of it left the build machine: the build pushed
diagnostics to telemetry and dropped them, and the published .npz kept
only closing scalars. "Skipped and reported, never dropped silently" is
only true if the report ships.

diagnostics_payload() renders the result as strict JSON (non-finite
floats become null; the writer passes allow_nan=False so anything that
escapes the scrub fails loudly), and write_calibration_diagnostics()
writes the calibration_diagnostics.json a release directory publishes
next to its manifests.

Fixes #10.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MaxGhenis

Contributor

Superseded by #37, which merged calibration diagnostics into the release contract and package exports.

@MaxGhenis closed this on Jun 14, 2026
@MaxGhenis deleted the calibration-diagnostics branch on June 14, 2026 at 19:50
Development

Successfully merging this pull request may close these issues.

Publish calibration diagnostics (per-target rows, loss trajectory, skipped targets) with each populace-us release
