
fix: preflight model reachability (#546) + enforce typed response_schema (#547)#548

Merged
the-data-viking merged 2 commits into main from fix/546-547-preflight-and-schema-coercion
Jun 4, 2026

@claude-dataviking

Contributor

Fixes #546 and #547.

#546 — Ensemble/blend runs silently degrade on an unreachable member slug

A bad OpenRouter slug (e.g. openrouter/google/gemini-2.0-flash-001) 404s on every call, yet the run still completed and a --blend quietly dropped to fewer models.

  • synth_panel/preflight.py probes each distinct model in a multi-model run with a 1-token call and classifies it: reachable / unreachable (404 / no-endpoints / model-not-found → BAD_REQUEST) / inconclusive (rate-limit / auth / transport / missing-credential — never blocks).
  • panel run probes before spending on both real runs and --dry-run, aborting with a message naming the bad slug(s). New flags: --skip-preflight, --require-all-models, --min-models N (deliberately allow a degraded run).
  • For --blend, members that produced zero usable responses trigger a loud top-level BLEND DEGRADED warning stating the surviving N, also surfaced in the JSON envelope (blend_degraded + a warnings entry).

#547: response_schema (enum/scale) validated at load but never enforced

  • synth_panel/response_coercion.py maps a free-text answer to the nearest declared enum option (case-insensitive, punctuation/whitespace-stripped, word-boundary substring, ambiguity-safe) or to an in-range integer for scales. The "Blue." → "blue" repro is covered.
  • The orchestrator stores both the raw text (response) and the typed value (response_typed), stamps the schema kind on each response, and flags unmappable answers with schema_unmapped + a per-response warning.
  • The saved result now persists per-question response_schema, so poll-summary/analyze bucket enum/scale questions instead of falling back to kind=text. poll-summary prefers response_typed and treats an unmapped enum answer as unparseable (off-schema free text is never resurrected).
  • --dry-run no longer implies enforcement — it states typed schemas are coerced post-hoc, not constrained at generation.

Tests

New: test_response_coercion.py, test_preflight.py, test_preflight_cli.py, test_response_schema_enforcement.py. Full suite: 3093 passed, coverage 87% (gate 80%). ruff + mypy clean.

🤖 Generated with Claude Code

…chema (#547)

#546 — Ensemble/blend runs silently degraded when a member model's slug was
unreachable (a bad OpenRouter slug 404s on every call). Add a reachability
pre-flight:
- New synth_panel/preflight.py probes each distinct model in a multi-model
  run with a 1-token call and classifies the outcome (reachable /
  unreachable / inconclusive). BAD_REQUEST-class errors (404 / no endpoints)
  are fail-fast; transient/auth/credential failures are inconclusive and
  never block.
- panel run probes before spending on BOTH real runs and --dry-run, and
  aborts naming the bad slug(s). New flags: --skip-preflight (bypass),
  --require-all-models (default behaviour, explicit), --min-models N
  (deliberately allow a degraded run).
- --blend: detect members that produced zero usable responses and emit a
  loud top-level "BLEND DEGRADED" warning stating the surviving N, surfaced
  in the JSON envelope as blend_degraded + a warnings entry.
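A minimal sketch of the blend-drop detection described in the commit message, assuming each response is a dict carrying hypothetical "model", "response" and "error" keys. Only blend_degraded, the warnings list and the "BLEND DEGRADED" text come from the PR; `detect_blend_degradation` is an illustrative name.

```python
from collections import Counter


def detect_blend_degradation(members: list[str], responses: list[dict]) -> dict:
    """Flag blend members that produced zero usable responses."""
    # A response counts as usable if it has non-empty text and no error.
    usable = Counter(
        r["model"] for r in responses if not r.get("error") and r.get("response")
    )
    dropped = [m for m in members if usable[m] == 0]
    envelope: dict = {"blend_degraded": bool(dropped), "warnings": []}
    if dropped:
        surviving = len(members) - len(dropped)
        envelope["warnings"].append(
            f"BLEND DEGRADED: {len(dropped)} of {len(members)} members produced no usable "
            f"responses ({', '.join(dropped)}); blend uses {surviving} model(s)."
        )
    return envelope
```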

#547 — A question's typed response_schema (enum/scale) was validated at load
but never enforced or checked against output:
- New synth_panel/response_coercion.py coerces a free-text answer to the
  nearest declared enum option (case-insensitive, punctuation/whitespace
  stripped, word-boundary substring, ambiguity-safe) or to an in-range
  integer for scales.
- The orchestrator stores BOTH the raw text (response) and the typed value
  (response_typed), stamps the schema kind on each response, and flags
  unmappable answers with schema_unmapped + a per-response warning.
- The saved result now persists per-question response_schema so poll-summary
  / analyze bucket enum/scale questions instead of falling back to
  kind=text. poll-summary prefers response_typed and treats an unmapped
  enum answer as unparseable rather than resurrecting off-schema free text.
- --dry-run no longer implies enforcement: it states typed schemas are
  coerced post-hoc, not constrained at generation.
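The poll-summary behaviour for enum questions can be sketched as below, assuming responses are dicts with the response, response_typed and schema_unmapped fields named in the PR; `bucket_enum_answers` itself is a hypothetical helper, not the project's function.

```python
from collections import Counter


def bucket_enum_answers(responses: list[dict], options: list[str]) -> dict:
    """Tally an enum question, preferring the typed value over the raw text."""
    counts = Counter({opt: 0 for opt in options})
    unparseable = 0
    for r in responses:
        typed = r.get("response_typed")
        if r.get("schema_unmapped") or typed not in options:
            # Never fall back to the raw free text for an off-schema answer.
            unparseable += 1
        else:
            counts[typed] += 1
    return {"counts": dict(counts), "unparseable": unparseable}
```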

Tests: response coercion unit (Blue.->blue, ambiguous/unmappable, scale
range), preflight classification + CLI fail-fast/skip/dry-run/min-models,
orchestrator end-to-end coercion + persistence, poll-summary enum bucketing,
blend-drop detection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Wesley Johnson <wesley@dataviking.tech>
@claude-dataviking claude-dataviking added the semver:patch Bump patch version on merge label Jun 4, 2026
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 4, 2026


Deploying synthpanel with Cloudflare Pages

Latest commit: cb89814
Status: ✅  Deploy successful!
Preview URL: https://159c0baa.synthpanel.pages.dev
Branch Preview URL: https://fix-546-547-preflight-and-sc.synthpanel.pages.dev


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Wesley Johnson <wesley@dataviking.tech>
@claude-dataviking

Contributor Author

All 9 required status checks are green (test 3.10–3.14, coverage, lint, typecheck, security), plus the non-required jobs (install-smoke, site-cli-sync, CodeQL/Analyze, dependency-review, Cloudflare Pages). semver:patch is applied for the auto-tag → publish flow to cut v1.5.7 on merge.

This PR is blocked only on a CODEOWNERS approval — the main ruleset requires one approving review from a CODEOWNER (@the-data-viking / @openclaw-dv) with require_last_push_approval, and the ruleset has no bypass actors. As the PR author I can't self-approve. Ready to merge (squash) once a CODEOWNER approves.

@the-data-viking the-data-viking merged commit 51e1af0 into main Jun 4, 2026
19 checks passed
@the-data-viking the-data-viking deleted the fix/546-547-preflight-and-schema-coercion branch June 4, 2026 20:48
claude-dataviking pushed a commit that referenced this pull request Jun 4, 2026
Pre-bumps the version on this PR branch so auto-tag's "nothing to commit"
path fires on merge (auto-tag cannot push the bump to main — the main
ruleset rejects the Actions bot with GH013). Ships the #549/#550 fixes as
v1.5.7, reusing the v1.5.7 slot: #548's intended release was never tagged,
so the latest tag is still v1.5.6.

Artifacts re-rendered via the three canonical scripts (render_site.py,
render_site_markdown.py, render_server_card.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
the-data-viking added a commit that referenced this pull request Jun 5, 2026
…ments (#550) + release bump (#551)

* fix(synthesis,fetch): route openrouter/* synthesis correctly (#549) + fail loudly on empty attachment fetch (#550)

#549 — openrouter/anthropic/* synthesis routed to the Anthropic provider:
The structured-output engine's final-strike escalation hard-coded the bare
`sonnet` alias, which resolves to the *direct* Anthropic provider and demands
ANTHROPIC_API_KEY. For an OpenRouter-only caller synthesizing with
`openrouter/anthropic/*`, the first two strikes hit OpenRouter correctly but
the escalation crossed providers and failed with "Missing API key for
Anthropic". Escalation is now provider-aware (`_escalation_model_for`): an
`openrouter/` model escalates to an OpenRouter-served Sonnet, staying on
OPENROUTER_API_KEY. Also: a fallback synthesis (judge exhausted retries /
partial schema) now carries an `is_fallback`/`error` marker in to_dict() and
fails the run loudly (run_invalid + structured `synthesis_error`, exit 2) in
both the in-run and `panel synthesize` paths, instead of being persisted as a
silent near-empty success.
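A sketch of the two behaviours described above: provider-preserving escalation and failing loudly on a fallback synthesis. `pick_escalation_model` is a hypothetical stand-in for `_escalation_model_for`, and the OpenRouter Sonnet slug is a placeholder assumption, since the PR does not name the exact slug it escalates to.

```python
# Placeholder slug (assumption): the PR does not state which OpenRouter Sonnet slug is used.
OPENROUTER_SONNET = "openrouter/anthropic/claude-sonnet-4"


def pick_escalation_model(model: str) -> str:
    """Choose the final-strike escalation model without crossing providers."""
    if model.startswith("openrouter/"):
        # Stay on OPENROUTER_API_KEY by escalating to an OpenRouter-served Sonnet.
        return OPENROUTER_SONNET
    # Other callers keep the bare alias, which resolves to the direct Anthropic provider.
    return "sonnet"


def synthesis_exit_code(synthesis: dict) -> int:
    """Exit 2 (run_invalid) when the synthesis is a fallback rather than a real result."""
    if synthesis.get("is_fallback"):
        return 2
    return 0
```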

#550 — attachment fetch failures degraded silently:
A `type: url` attachment that failed to fetch (SSRF/loopback perimeter denial,
HTTP error, timeout, empty extraction) was logged as a WARNING and replaced
with a placeholder, so personas answered blind and the run reported 0% failure.
`lower_url_blocks` now raises `AttachmentFetchError` by default — naming the URL
and reason — which the orchestrator records as a failed response (counted in the
failure rate and the per-question budget). Added `--allow-empty-attachments` to
opt back into best-effort placeholder behaviour. Per-attachment fetch status
(`ok`/`failed` + reason) is recorded on each response for auditability. Threaded
the flag through run_panel_parallel / run_panel_sync. Documented the
loopback/private-address SSRF block (local preview servers can't be URL sources;
use inline html/document) in the attachments cookbook.

Tests: provider-preserving escalation for openrouter/anthropic/*; end-to-end
synthesis routing asserting no Anthropic-key demand; fallback marker + loud
failure (in-run and re-synthesize); hard-error-by-default and opt-out for
fetch failures; status_sink recording; orchestrator-level URL-attachment
failure counting + status persistence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): bump to 1.5.7 + sync rendered version artifacts

Pre-bumps the version on this PR branch so auto-tag's "nothing to commit"
path fires on merge (auto-tag cannot push the bump to main — the main
ruleset rejects the Actions bot with GH013). Ships the #549/#550 fixes as
v1.5.7, reusing the v1.5.7 slot: #548's intended release was never tagged,
so the latest tag is still v1.5.6.

Artifacts re-rendered via the three canonical scripts (render_site.py,
render_site_markdown.py, render_server_card.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wesley Johnson <wesley@dataviking.tech>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

semver:patch Bump patch version on merge


Development

Successfully merging this pull request may close these issues.

Ensemble run silently degrades when a member model's slug is unreachable (no fail-fast)
