fix: preflight model reachability (#546) + enforce typed response_schema (#547) #548
Merged
…chema (#547)

#546: Ensemble/blend runs silently degraded when a member model's slug was unreachable (a bad OpenRouter slug 404s on every call). Add a reachability pre-flight:

- New synth_panel/preflight.py probes each distinct model in a multi-model run with a 1-token call and classifies the outcome (reachable / unreachable / inconclusive). BAD_REQUEST-class errors (404 / no endpoints) are fail-fast; transient/auth/credential failures are inconclusive and never block.
- panel run probes before spending on BOTH real runs and --dry-run, and aborts naming the bad slug(s). New flags: --skip-preflight (bypass), --require-all-models (default behaviour, explicit), --min-models N (deliberately allow a degraded run).
- --blend: detect members that produced zero usable responses and emit a loud top-level "BLEND DEGRADED" warning stating the surviving N, surfaced in the JSON envelope as blend_degraded + a warnings entry.

#547: A question's typed response_schema (enum/scale) was validated at load but never enforced or checked against output:

- New synth_panel/response_coercion.py coerces a free-text answer to the nearest declared enum option (case-insensitive, punctuation/whitespace stripped, word-boundary substring, ambiguity-safe) or to an in-range integer for scales.
- The orchestrator stores BOTH the raw text (response) and the typed value (response_typed), stamps the schema kind on each response, and flags unmappable answers with schema_unmapped + a per-response warning.
- The saved result now persists per-question response_schema so poll-summary / analyze bucket enum/scale questions instead of falling back to kind=text. poll-summary prefers response_typed and treats an unmapped enum answer as unparseable rather than resurrecting off-schema free text.
- --dry-run no longer implies enforcement: it states typed schemas are coerced post-hoc, not constrained at generation.

Tests: response coercion unit (Blue.->blue, ambiguous/unmappable, scale range), preflight classification + CLI fail-fast/skip/dry-run/min-models, orchestrator end-to-end coercion + persistence, poll-summary enum bucketing, blend-drop detection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Wesley Johnson <wesley@dataviking.tech>
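The pre-flight classification described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in synth_panel/preflight.py: `Reachability`, `classify_probe`, `should_abort`, and every error-category string other than BAD_REQUEST are hypothetical names. The reading of `--min-models N` as "proceed if at least N models are not unreachable" is also an assumption.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Reachability(Enum):
    REACHABLE = "reachable"
    UNREACHABLE = "unreachable"
    INCONCLUSIVE = "inconclusive"


def classify_probe(error_category: Optional[str]) -> Reachability:
    """Classify one 1-token probe.

    error_category is None when the probe call succeeded; otherwise it is
    a (hypothetical) category string describing the failure.
    """
    if error_category is None:
        return Reachability.REACHABLE
    if error_category == "BAD_REQUEST":  # 404 / no endpoints: fail fast
        return Reachability.UNREACHABLE
    # Rate limits, auth, transport, missing credentials: never block.
    return Reachability.INCONCLUSIVE


def should_abort(
    probe_results: Dict[str, Optional[str]],
    min_models: Optional[int] = None,
) -> Tuple[bool, List[str]]:
    """Return (abort, bad_slugs) for a mapping of model slug -> error category."""
    bad = sorted(
        slug
        for slug, category in probe_results.items()
        if classify_probe(category) is Reachability.UNREACHABLE
    )
    if not bad:
        return False, bad
    if min_models is None:
        # Assumed default: every model must be reachable.
        return True, bad
    surviving = len(probe_results) - len(bad)
    return surviving < min_models, bad
```

Inconclusive probes count as surviving here, which mirrors the "never block" rule; a stricter policy would need its own flag.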
Deploying synthpanel with Cloudflare Pages

- Latest commit: cb89814
- Status: ✅ Deploy successful!
- Preview URL: https://159c0baa.synthpanel.pages.dev
- Branch Preview URL: https://fix-546-547-preflight-and-sc.synthpanel.pages.dev
All 9 required status checks are green (test 3.10–3.14, coverage, lint, typecheck, security), plus the non-required jobs (install-smoke, site-cli-sync, CodeQL/Analyze, dependency-review, Cloudflare Pages). This PR is blocked only on a CODEOWNERS approval — the |
claude-dataviking pushed a commit that referenced this pull request on Jun 4, 2026
Pre-bumps the version on this PR branch so auto-tag's "nothing to commit" path fires on merge (auto-tag cannot push the bump to main — the main ruleset rejects the Actions bot with GH013). Ships the #549/#550 fixes as v1.5.7 (the v1.5.7 slot — #548's intended release never tagged, latest tag is still v1.5.6). Artifacts re-rendered via the three canonical scripts (render_site.py, render_site_markdown.py, render_server_card.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
the-data-viking added a commit that referenced this pull request on Jun 5, 2026
…ments (#550) + release bump (#551)

* fix(synthesis,fetch): route openrouter/* synthesis correctly (#549) + fail loudly on empty attachment fetch (#550)

#549: openrouter/anthropic/* synthesis routed to the Anthropic provider:

The structured-output engine's final-strike escalation hard-coded the bare `sonnet` alias, which resolves to the *direct* Anthropic provider and demands ANTHROPIC_API_KEY. For an OpenRouter-only caller synthesizing with `openrouter/anthropic/*`, the first two strikes hit OpenRouter correctly but the escalation crossed providers and failed with "Missing API key for Anthropic". Escalation is now provider-aware (`_escalation_model_for`): an `openrouter/` model escalates to an OpenRouter-served Sonnet, staying on OPENROUTER_API_KEY.

Also: a fallback synthesis (judge exhausted retries / partial schema) now carries an `is_fallback`/`error` marker in to_dict() and fails the run loudly (run_invalid + structured `synthesis_error`, exit 2) in both the in-run and `panel synthesize` paths, instead of being persisted as a silent near-empty success.

#550: attachment fetch failures degraded silently:

A `type: url` attachment that failed to fetch (SSRF/loopback perimeter denial, HTTP error, timeout, empty extraction) was logged as a WARNING and replaced with a placeholder, so personas answered blind and the run reported 0% failure. `lower_url_blocks` now raises `AttachmentFetchError` by default, naming the URL and reason, which the orchestrator records as a failed response (counted in the failure rate and the per-question budget). Added `--allow-empty-attachments` to opt back into best-effort placeholder behaviour. Per-attachment fetch status (`ok`/`failed` + reason) is recorded on each response for auditability. Threaded the flag through run_panel_parallel / run_panel_sync. Documented the loopback/private-address SSRF block (local preview servers can't be URL sources; use inline html/document) in the attachments cookbook.
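The hard-error-by-default behaviour described for #550 might look roughly like the sketch below. Only the names `AttachmentFetchError` and the `ok`/`failed` status values come from the commit message; `resolve_url_attachment`, the injected `fetch` callable, the `status_sink` list shape, and the placeholder text are assumptions made for illustration, not the signature of `lower_url_blocks`.

```python
from typing import Callable, Dict, List, Optional


class AttachmentFetchError(Exception):
    """Raised when a URL attachment cannot be fetched or yields no text."""

    def __init__(self, url: str, reason: str) -> None:
        super().__init__(f"attachment fetch failed for {url}: {reason}")
        self.url = url
        self.reason = reason


PLACEHOLDER = "[attachment unavailable]"  # hypothetical placeholder text


def resolve_url_attachment(
    url: str,
    fetch: Callable[[str], str],
    allow_empty: bool = False,
    status_sink: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Fetch one URL attachment, failing loudly unless allow_empty is set."""
    try:
        text = fetch(url)
        if not text or not text.strip():
            raise ValueError("empty extraction")
    except Exception as exc:
        reason = str(exc)
        # Record the failure for auditability before deciding how to react.
        if status_sink is not None:
            status_sink.append({"url": url, "status": "failed", "reason": reason})
        if allow_empty:
            # Opt-in best-effort mode: fall back to a placeholder.
            return PLACEHOLDER
        raise AttachmentFetchError(url, reason) from exc
    if status_sink is not None:
        status_sink.append({"url": url, "status": "ok"})
    return text
```

Passing the fetcher in as a callable keeps the sketch free of network code; a caller would record the raised error as a failed response so it counts toward the failure rate.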
Tests: provider-preserving escalation for openrouter/anthropic/*; end-to-end synthesis routing asserting no Anthropic-key demand; fallback marker + loud failure (in-run and re-synthesize); hard-error-by-default and opt-out for fetch failures; status_sink recording; orchestrator-level URL-attachment failure counting + status persistence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): bump to 1.5.7 + sync rendered version artifacts

Pre-bumps the version on this PR branch so auto-tag's "nothing to commit" path fires on merge (auto-tag cannot push the bump to main; the main ruleset rejects the Actions bot with GH013). Ships the #549/#550 fixes as v1.5.7 (the v1.5.7 slot: #548's intended release never tagged, latest tag is still v1.5.6). Artifacts re-rendered via the three canonical scripts (render_site.py, render_site_markdown.py, render_server_card.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wesley Johnson <wesley@dataviking.tech>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
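The provider-aware escalation described for #549 in this commit can be illustrated with a small sketch. It is not the body of `_escalation_model_for`: the function names are hypothetical, the OpenRouter Sonnet slug is a deliberate placeholder because the commit message does not give it, and the key lookup only covers the two cases discussed here.

```python
# Placeholder only: the real OpenRouter-served Sonnet slug is not stated in the source.
OPENROUTER_SONNET = "openrouter/anthropic/SONNET-SLUG-PLACEHOLDER"
DIRECT_SONNET = "sonnet"  # bare alias that resolves to the direct Anthropic provider


def escalation_model_for(model: str) -> str:
    """Pick the final-strike model without crossing providers."""
    if model.startswith("openrouter/"):
        return OPENROUTER_SONNET
    return DIRECT_SONNET


def required_api_key(model: str) -> str:
    """Name the credential a model needs (simplified to the two cases above)."""
    if model.startswith("openrouter/"):
        return "OPENROUTER_API_KEY"
    return "ANTHROPIC_API_KEY"
```

With this shape, an OpenRouter-only caller never triggers a demand for ANTHROPIC_API_KEY, because the escalation target keeps the `openrouter/` prefix.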
Fixes #546 and #547.
#546: Ensemble/blend runs silently degrade on an unreachable member slug

A bad OpenRouter slug (e.g. `openrouter/google/gemini-2.0-flash-001`) 404s on every call; the run completed anyway and a `--blend` quietly dropped to fewer models.

- `synth_panel/preflight.py` probes each distinct model in a multi-model run with a 1-token call and classifies it: `reachable` / `unreachable` (404 / no-endpoints / model-not-found → `BAD_REQUEST`) / `inconclusive` (rate-limit / auth / transport / missing-credential; never blocks).
- `panel run` probes before spending on both real runs and `--dry-run`, aborting with a message naming the bad slug(s). New flags: `--skip-preflight`, `--require-all-models`, `--min-models N` (deliberately allow a degraded run).
- With `--blend`, members that produced zero usable responses trigger a loud top-level `BLEND DEGRADED` warning stating the surviving N, also surfaced in the JSON envelope (`blend_degraded` + a `warnings` entry).

#547: `response_schema` (enum/scale) validated at load but never enforced

- `synth_panel/response_coercion.py` maps a free-text answer to the nearest declared enum option (case-insensitive, punctuation/whitespace-stripped, word-boundary substring, ambiguity-safe) or to an in-range integer for scales. The `"Blue."` → `"blue"` repro is covered.
- The orchestrator stores both the raw text (`response`) and the typed value (`response_typed`), stamps the schema kind on each response, and flags unmappable answers with `schema_unmapped` + a per-response warning.
- The saved result now persists per-question `response_schema`, so `poll-summary` / `analyze` bucket enum/scale questions instead of falling back to `kind=text`. `poll-summary` prefers `response_typed` and treats an unmapped enum answer as unparseable (no off-schema free text resurrected).
- `--dry-run` no longer implies enforcement: it states typed schemas are coerced post-hoc, not constrained at generation.

Tests

New: `test_response_coercion.py`, `test_preflight.py`, `test_preflight_cli.py`, `test_response_schema_enforcement.py`. Full suite: 3093 passed, coverage 87% (gate 80%). ruff + mypy clean.

🤖 Generated with Claude Code
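As an illustration of the coercion rules listed for #547, here is a minimal sketch. It is not the code in `synth_panel/response_coercion.py`: `coerce_enum`, `coerce_scale`, and `typed_record` are hypothetical names, and treating an answer with more than one integer as ambiguous is an assumption. Only the field names `response`, `response_typed`, and `schema_unmapped` come from the description.

```python
import re
import string
from typing import Dict, List, Optional

_STRIP_CHARS = string.punctuation + string.whitespace


def _normalize(text: str) -> str:
    """Lowercase and strip leading/trailing punctuation and whitespace."""
    return text.strip(_STRIP_CHARS).lower()


def coerce_enum(answer: str, options: List[str]) -> Optional[str]:
    """Return the one option the answer maps to, or None if none or several match."""
    norm = _normalize(answer)
    exact = [opt for opt in options if _normalize(opt) == norm]
    if len(exact) == 1:
        return exact[0]
    # Word-boundary substring match, so "bluest" does not match "blue".
    hits = [
        opt
        for opt in options
        if _normalize(opt)
        and re.search(r"\b" + re.escape(_normalize(opt)) + r"\b", norm)
    ]
    return hits[0] if len(hits) == 1 else None


def coerce_scale(answer: str, minimum: int, maximum: int) -> Optional[int]:
    """Return the single in-range integer in the answer, else None."""
    numbers = re.findall(r"-?\d+", answer)
    if len(numbers) != 1:
        return None  # assumed: several numbers are treated as ambiguous
    value = int(numbers[0])
    return value if minimum <= value <= maximum else None


def typed_record(answer: str, options: List[str]) -> Dict[str, object]:
    """Keep the raw text alongside the typed value and an unmapped flag."""
    typed = coerce_enum(answer, options)
    return {
        "response": answer,
        "response_typed": typed,
        "schema_unmapped": typed is None,
    }
```

Returning `None` for ambiguous answers, rather than guessing the first match, is what lets a summary step count them as unparseable instead of resurrecting off-schema text.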