fix: preflight model reachability (#546) + enforce typed response_schema (#547) by claude-dataviking · Pull Request #548 · DataViking-Tech/SynthPanel

claude-dataviking · 2026-06-04T20:37:42Z

Fixes #546 and #547.

#546 — Ensemble/blend runs silently degrade on an unreachable member slug

A bad OpenRouter slug (e.g. openrouter/google/gemini-2.0-flash-001) 404s on every call; the run completed anyway and a --blend quietly dropped to fewer models.

- synth_panel/preflight.py probes each distinct model in a multi-model run with a 1-token call and classifies it as reachable, unreachable (404 / no-endpoints / model-not-found → BAD_REQUEST), or inconclusive (rate-limit / auth / transport / missing-credential). An inconclusive result never blocks the run.
- panel run probes before spending, on both real runs and --dry-run, and aborts with a message naming the bad slug(s). New flags: --skip-preflight (bypass the probe), --require-all-models (the default behaviour, made explicit), --min-models N (deliberately allow a degraded run).
- For --blend, members that produced zero usable responses trigger a loud top-level BLEND DEGRADED warning stating the surviving N, also surfaced in the JSON envelope (blend_degraded + a warnings entry).
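
The three-way classification described above can be sketched roughly as follows. This is an illustration only, not the code in synth_panel/preflight.py: `ProbeOutcome`, `classify_probe`, `models_to_abort_on`, and the marker strings are hypothetical names chosen for the example, and the sketch assumes the probe result has already been reduced to an HTTP status plus an error message.

```python
from enum import Enum
from typing import Optional


class ProbeOutcome(Enum):
    REACHABLE = "reachable"
    UNREACHABLE = "unreachable"
    INCONCLUSIVE = "inconclusive"


# Hypothetical substrings that indicate "this slug does not exist".
_NOT_FOUND_MARKERS = ("no endpoints", "model not found", "model_not_found")


def classify_probe(status: Optional[int], message: str = "") -> ProbeOutcome:
    """Classify the result of a 1-token probe call.

    `status` is the HTTP status of the probe, or None when the call never
    produced a response (transport error, missing credential).
    """
    text = message.lower()
    if status is not None and 200 <= status < 300:
        return ProbeOutcome.REACHABLE
    if status == 404 or any(marker in text for marker in _NOT_FOUND_MARKERS):
        return ProbeOutcome.UNREACHABLE
    # Rate limits, auth failures, transport errors, missing credentials:
    # none of these prove the slug is bad, so they never block the run.
    return ProbeOutcome.INCONCLUSIVE


def models_to_abort_on(results: dict[str, ProbeOutcome]) -> list[str]:
    """Return the slugs that should be named in the abort message."""
    return sorted(
        slug for slug, outcome in results.items()
        if outcome is ProbeOutcome.UNREACHABLE
    )
```

Treating everything that is not a clear "model does not exist" signal as inconclusive keeps a flaky network or an expired key from blocking a run that would otherwise succeed.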

#547 — `response_schema` (enum/scale) validated at load but never enforced

- synth_panel/response_coercion.py maps a free-text answer to the nearest declared enum option (case-insensitive, punctuation/whitespace-stripped, word-boundary substring, ambiguity-safe) or to an in-range integer for scales. The "Blue." → "blue" repro is covered.
- The orchestrator stores both the raw text (response) and the typed value (response_typed), stamps the schema kind on each response, and flags unmappable answers with schema_unmapped + a per-response warning.
- The saved result now persists per-question response_schema, so poll-summary/analyze bucket enum/scale questions instead of falling back to kind=text. poll-summary prefers response_typed and treats an unmapped enum answer as unparseable rather than resurrecting off-schema free text.
- --dry-run no longer implies enforcement: it states that typed schemas are coerced post-hoc, not constrained at generation.
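
A minimal sketch of the coercion rules listed above, under stated assumptions: `coerce_enum`, `coerce_scale`, and `_normalize` are hypothetical names, not the functions in synth_panel/response_coercion.py, and the scale helper assumes a non-negative integer range. A return value of None stands in for the unmapped case that the orchestrator flags with schema_unmapped.

```python
import re
from typing import Optional, Sequence


def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def coerce_enum(answer: str, options: Sequence[str]) -> Optional[str]:
    """Return the single declared option the answer maps to, else None."""
    norm = _normalize(answer)
    keyed = [(opt, _normalize(opt)) for opt in options if _normalize(opt)]
    # An exact match after normalization wins outright ("Blue." -> "blue").
    for opt, key in keyed:
        if key == norm:
            return opt
    # Otherwise accept an option only if it appears as whole words.
    hits = [
        opt for opt, key in keyed
        if re.search(rf"\b{re.escape(key)}\b", norm)
    ]
    # Ambiguity-safe: zero hits and several hits both count as unmapped.
    return hits[0] if len(hits) == 1 else None


def coerce_scale(answer: str, low: int, high: int) -> Optional[int]:
    """Return the only in-range integer mentioned in the answer, else None."""
    values = {int(token) for token in re.findall(r"\d+", answer)}
    in_range = [value for value in values if low <= value <= high]
    return in_range[0] if len(in_range) == 1 else None
```

Returning None instead of guessing is what keeps off-schema free text out of the typed value: the raw answer is still available in response, while response_typed stays empty.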

Tests

New: test_response_coercion.py, test_preflight.py, test_preflight_cli.py, test_response_schema_enforcement.py. Full suite: 3093 passed, coverage 87% (gate 80%). ruff + mypy clean.

🤖 Generated with Claude Code

…chema (#547)

#546: Ensemble/blend runs silently degraded when a member model's slug was unreachable (a bad OpenRouter slug 404s on every call). Add a reachability pre-flight:

- New synth_panel/preflight.py probes each distinct model in a multi-model run with a 1-token call and classifies the outcome (reachable / unreachable / inconclusive). BAD_REQUEST-class errors (404 / no endpoints) are fail-fast; transient/auth/credential failures are inconclusive and never block.
- panel run probes before spending on BOTH real runs and --dry-run, and aborts naming the bad slug(s). New flags: --skip-preflight (bypass), --require-all-models (default behaviour, explicit), --min-models N (deliberately allow a degraded run).
- --blend: detect members that produced zero usable responses and emit a loud top-level "BLEND DEGRADED" warning stating the surviving N, surfaced in the JSON envelope as blend_degraded + a warnings entry.

#547: A question's typed response_schema (enum/scale) was validated at load but never enforced or checked against output:

- New synth_panel/response_coercion.py coerces a free-text answer to the nearest declared enum option (case-insensitive, punctuation/whitespace stripped, word-boundary substring, ambiguity-safe) or to an in-range integer for scales.
- The orchestrator stores BOTH the raw text (response) and the typed value (response_typed), stamps the schema kind on each response, and flags unmappable answers with schema_unmapped + a per-response warning.
- The saved result now persists per-question response_schema so poll-summary / analyze bucket enum/scale questions instead of falling back to kind=text. poll-summary prefers response_typed and treats an unmapped enum answer as unparseable rather than resurrecting off-schema free text.
- --dry-run no longer implies enforcement: it states typed schemas are coerced post-hoc, not constrained at generation.

Tests: response coercion unit (Blue.->blue, ambiguous/unmappable, scale range), preflight classification + CLI fail-fast/skip/dry-run/min-models, orchestrator end-to-end coercion + persistence, poll-summary enum bucketing, blend-drop detection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Wesley Johnson <wesley@dataviking.tech>
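
The blend-drop detection mentioned in the commit message could look something like the sketch below. The envelope keys blend_degraded and warnings come from the description; everything else is an assumption made for illustration, including the function name `blend_envelope_fields`, the boolean type of blend_degraded, and the exact wording of the warning.

```python
from typing import Any


def blend_envelope_fields(usable_by_model: dict[str, int]) -> dict[str, Any]:
    """Build the degradation fields for a --blend result envelope.

    `usable_by_model` maps each blend member's slug to the number of
    usable responses it produced during the run.
    """
    dropped = sorted(slug for slug, count in usable_by_model.items() if count == 0)
    surviving = len(usable_by_model) - len(dropped)
    fields: dict[str, Any] = {"blend_degraded": bool(dropped), "warnings": []}
    if dropped:
        fields["warnings"].append(
            f"BLEND DEGRADED: {len(dropped)} of {len(usable_by_model)} members "
            f"produced no usable responses ({', '.join(dropped)}); "
            f"blend computed over N={surviving}"
        )
    return fields
```

For example, `blend_envelope_fields({"a": 10, "b": 0, "c": 7})` marks the blend as degraded and reports N=2 in the warning text.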

cloudflare-workers-and-pages · 2026-06-04T20:37:45Z

Deploying synthpanel with Cloudflare Pages

Latest commit:	`cb89814`
Status:	✅ Deploy successful!
Preview URL:	https://159c0baa.synthpanel.pages.dev
Branch Preview URL:	https://fix-546-547-preflight-and-sc.synthpanel.pages.dev



claude-dataviking · 2026-06-04T20:41:16Z

All 9 required status checks are green (test 3.10–3.14, coverage, lint, typecheck, security), plus the non-required jobs (install-smoke, site-cli-sync, CodeQL/Analyze, dependency-review, Cloudflare Pages). semver:patch is applied for the auto-tag → publish flow to cut v1.5.7 on merge.

This PR is blocked only on a CODEOWNERS approval — the main ruleset requires one approving review from a CODEOWNER (@the-data-viking / @openclaw-dv) with require_last_push_approval, and the ruleset has no bypass actors. As the PR author I can't self-approve. Ready to merge (squash) once a CODEOWNER approves.


…ments (#550) + release bump (#551)

* fix(synthesis,fetch): route openrouter/* synthesis correctly (#549) + fail loudly on empty attachment fetch (#550)

#549: openrouter/anthropic/* synthesis routed to the Anthropic provider:

The structured-output engine's final-strike escalation hard-coded the bare `sonnet` alias, which resolves to the *direct* Anthropic provider and demands ANTHROPIC_API_KEY. For an OpenRouter-only caller synthesizing with `openrouter/anthropic/*`, the first two strikes hit OpenRouter correctly but the escalation crossed providers and failed with "Missing API key for Anthropic". Escalation is now provider-aware (`_escalation_model_for`): an `openrouter/` model escalates to an OpenRouter-served Sonnet, staying on OPENROUTER_API_KEY.

Also: a fallback synthesis (judge exhausted retries / partial schema) now carries an `is_fallback`/`error` marker in to_dict() and fails the run loudly (run_invalid + structured `synthesis_error`, exit 2) in both the in-run and `panel synthesize` paths, instead of being persisted as a silent near-empty success.

#550: attachment fetch failures degraded silently:

A `type: url` attachment that failed to fetch (SSRF/loopback perimeter denial, HTTP error, timeout, empty extraction) was logged as a WARNING and replaced with a placeholder, so personas answered blind and the run reported 0% failure. `lower_url_blocks` now raises `AttachmentFetchError` by default, naming the URL and reason, which the orchestrator records as a failed response (counted in the failure rate and the per-question budget). Added `--allow-empty-attachments` to opt back into best-effort placeholder behaviour. Per-attachment fetch status (`ok`/`failed` + reason) is recorded on each response for auditability. Threaded the flag through run_panel_parallel / run_panel_sync. Documented the loopback/private-address SSRF block (local preview servers can't be URL sources; use inline html/document) in the attachments cookbook.

Tests: provider-preserving escalation for openrouter/anthropic/*; end-to-end synthesis routing asserting no Anthropic-key demand; fallback marker + loud failure (in-run and re-synthesize); hard-error-by-default and opt-out for fetch failures; status_sink recording; orchestrator-level URL-attachment failure counting + status persistence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): bump to 1.5.7 + sync rendered version artifacts

Pre-bumps the version on this PR branch so auto-tag's "nothing to commit" path fires on merge (auto-tag cannot push the bump to main because the main ruleset rejects the Actions bot with GH013). Ships the #549/#550 fixes as v1.5.7 (the v1.5.7 slot: #548's intended release never tagged, latest tag is still v1.5.6). Artifacts re-rendered via the three canonical scripts (render_site.py, render_site_markdown.py, render_server_card.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wesley Johnson <wesley@dataviking.tech>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
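
The provider-aware escalation described for #549 can be illustrated with a small sketch. Only the `openrouter/` prefix rule and the bare `sonnet` alias come from the commit message; the function name `escalation_model_for` (the real helper is `_escalation_model_for`, whose signature is not shown in this thread) and the placeholder slug are assumptions, and the placeholder is deliberately not a real model identifier.

```python
# Bare alias that resolves to the direct Anthropic provider.
DIRECT_ESCALATION_ALIAS = "sonnet"

# Placeholder only: the actual OpenRouter-served Sonnet slug used by the
# project is not stated in this thread.
OPENROUTER_ESCALATION_SLUG = "openrouter/<openrouter-served-sonnet>"


def escalation_model_for(model: str) -> str:
    """Pick the final-strike escalation model without crossing providers."""
    if model.startswith("openrouter/"):
        # Stay on OpenRouter so only OPENROUTER_API_KEY is required.
        return OPENROUTER_ESCALATION_SLUG
    return DIRECT_ESCALATION_ALIAS
```

Keying the decision on the slug prefix means an OpenRouter-only caller never triggers a request that needs ANTHROPIC_API_KEY, which was the failure mode reported in #549.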

claude-dataviking added the semver:patch label (Bump patch version on merge) Jun 4, 2026

claude-dataviking requested review from openclaw-dv and the-data-viking as code owners June 4, 2026 20:37

style: apply ruff format to new preflight code

cb89814

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Wesley Johnson <wesley@dataviking.tech>

the-data-viking merged commit 51e1af0 into main Jun 4, 2026
19 checks passed

the-data-viking deleted the fix/546-547-preflight-and-schema-coercion branch June 4, 2026 20:48

claude-dataviking mentioned this pull request Jun 4, 2026

v1.5.7: synthesis provider routing (#549) + fail-loud on empty attachments (#550) + release bump #551

Merged

the-data-viking merged 2 commits into main from fix/546-547-preflight-and-schema-coercion
