populace.calibrate.diagnostics: ship the calibration evidence with the artifact #14
Closed
PavelMakarchuk wants to merge 1 commit into
…he artifact

CalibrationResult already carries everything an auditor needs (per-target initial/final estimates and tolerance verdicts, the per-epoch loss trajectory, every skipped target with its reason, and the solver options actually used), but none of it left the build machine: the build pushed diagnostics to telemetry and dropped them, and the published .npz kept only closing scalars. "Skipped and reported, never dropped silently" is only true if the report ships.

diagnostics_payload() renders the result as strict JSON (non-finite floats become null; the writer passes allow_nan=False so anything that escapes the scrub fails loudly), and write_calibration_diagnostics() writes the calibration_diagnostics.json a release directory publishes next to its manifests.

Fixes #10.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
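The strict-JSON behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual implementation: `scrub_non_finite` and `dumps_strict` are hypothetical helper names, and the sample keys are made up for the example.

```python
import json
import math
from typing import Any


def scrub_non_finite(value: Any) -> Any:
    """Recursively replace NaN and +/-inf floats with None (JSON null).

    Hypothetical helper: the real scrub inside diagnostics_payload() may differ.
    """
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {str(key): scrub_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub_non_finite(item) for item in value]
    return value


def dumps_strict(payload: Any) -> str:
    """Serialize with allow_nan=False so an unscrubbed NaN raises ValueError."""
    return json.dumps(payload, allow_nan=False, indent=2, sort_keys=True)


# Illustrative input only; these keys are assumptions, not the real schema.
raw = {"loss_trajectory": [0.9, float("nan"), 0.1], "final_loss": float("inf")}
clean = scrub_non_finite(raw)
text = dumps_strict(clean)
```

With `allow_nan=False`, the standard `json` module raises `ValueError` on any non-finite float instead of emitting `NaN` or `Infinity` tokens, which is what makes a missed scrub visible at write time rather than in a downstream parser.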
Superseded by #37, which merged calibration diagnostics into the release contract and package exports.
Fixes #10.
What

Adds `populace.calibrate.diagnostics`:

- `diagnostics_payload(result)` renders a `CalibrationResult` as a JSON-stable dict carrying the full evidence, not summaries: every per-target row (`target`, `initial_estimate`, `final_estimate`, `relative_error`, `within_tolerance`), the whole per-epoch `loss_trajectory`, every skipped target with its reason, and the solver `options` actually used (including the realized `matrix_format` and the `l0_lambda` the budget search settled on).
- `write_calibration_diagnostics(result, path)` writes it as `calibration_diagnostics.json`, the artifact a release directory publishes next to its manifests. Strict JSON: non-finite floats become `null` in the scrub, and the writer passes `allow_nan=False` so anything that escapes fails loudly instead of smuggling out `NaN` tokens.

Why
`CalibrationResult` already computes all of this, but none of it leaves the build machine: `build_dataset.py` pushes diagnostics to telemetry and drops them, and the published `.npz` keeps only closing scalars. Two charter promises are only true if the report ships: "skipped and reported, never dropped silently", and "artifacts carry their environment". Downstream, the calibration-diagnostics dashboard's Populace mode can't show convergence or skip reasons until this exists.

The provenance snapshots under `packages/*/build` are deliberately untouched (they're verbatim audit copies); the next build script calls `write_calibration_diagnostics(result, release_dir / "calibration_diagnostics.json")` after `calibrate(...)`, and `publish_release(..., extra_files=("calibration_diagnostics.json",))` from #9's branch uploads it.

Testing
`calibrate()` runs: full-evidence payload (trajectory length == epochs, tolerance verdicts, options echo), skipped-target reasons ship and never leak into compiled rows, strict-JSON round-trip, file writer round-trip. `ruff check` clean. `uv build --wheel packages/populace-calibrate` succeeds.

🤖 Generated with Claude Code
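The checks listed under Testing might look something like the sketch below. It does not use the real package: `StubResult`, `stub_payload`, and `write_stub` are hypothetical stand-ins, the top-level keys (`targets`, `skipped_targets`) and the sample target names and values are assumptions, and only the per-row field names are taken from the description above.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StubResult:
    """Hypothetical stand-in for CalibrationResult; attribute names are assumed."""
    rows: list
    loss_trajectory: list
    skipped: dict
    options: dict


def stub_payload(result: StubResult) -> dict:
    """Assumed payload layout: one top-level key per kind of evidence."""
    return {
        "targets": [dict(row) for row in result.rows],
        "loss_trajectory": list(result.loss_trajectory),
        "skipped_targets": [
            {"target": name, "reason": reason}
            for name, reason in result.skipped.items()
        ],
        "options": dict(result.options),
    }


def write_stub(result: StubResult, path: Path) -> Path:
    """Write the payload as strict JSON (allow_nan=False) and return the path."""
    path.write_text(json.dumps(stub_payload(result), allow_nan=False, indent=2))
    return path


epochs = 3
result = StubResult(
    rows=[{
        "target": "population_total",  # made-up target name
        "initial_estimate": 90.0,
        "final_estimate": 99.5,
        "relative_error": -0.005,
        "within_tolerance": True,
    }],
    loss_trajectory=[0.5, 0.2, 0.1],
    skipped={"empty_cell_target": "no matching records"},  # made-up skip reason
    options={"epochs": epochs, "matrix_format": "csr", "l0_lambda": 1e-4},
)
payload = stub_payload(result)

# Trajectory length matches the number of epochs requested.
assert len(payload["loss_trajectory"]) == epochs

# Skipped targets ship with a reason and never appear among compiled rows.
compiled = {row["target"] for row in payload["targets"]}
skipped = {entry["target"] for entry in payload["skipped_targets"]}
assert compiled.isdisjoint(skipped)

# File writer round-trip: what is read back equals what was rendered.
with tempfile.TemporaryDirectory() as tmp:
    written = write_stub(result, Path(tmp) / "calibration_diagnostics.json")
    reloaded = json.loads(written.read_text())
assert reloaded == payload
```

Running the checks against a stub keeps the example self-contained; per the description, the PR's own tests run against real `calibrate()` output.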