
Feat/cesar watchdog pipeline#203

Merged
cukas merged 2 commits into main from feat/cesar-watchdog-pipeline
Jun 11, 2026

@cukas cukas commented Jun 11, 2026

Contributor

No description provided.

cukas and others added 2 commits June 11, 2026 16:34
4-layer pipeline (6-engine brainstorm-validated; two single-mechanism fixes
nero-verdicted FLAWED first): stripNonAssertionSpans removes quoted/fenced/
enumerated text so demonstrated text can't read as a claim; the mutation-stall
and fabricated-delegation detectors now require claim shapes (first-person/
result-first/past-done/passive-dispatch) instead of bare vocabulary — a
description OF the harness no longer fires them. shouldDeescalateGuard
converts residual matches on conversational turns into warn-only (never
suppression, fail-open on missing signals). Regression corpus: the
2026-06-11 kern-game-studio incident text, codex's result-first stall and
passive-dispatch lies, quoted-failure-text debug cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
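
The first two layers described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: only the name `stripNonAssertionSpans` and the four claim-shape labels come from the commit message. The regexes, the `CLAIM_SHAPES` table, and the `matchClaimShape` helper are hypothetical, and the reading of each shape label is an assumption.

```typescript
// Hypothetical sketch. The real implementation is not shown on this page.

type ClaimShape = "first-person" | "result-first" | "past-done" | "passive-dispatch";

/** Blank out spans that demonstrate text rather than assert it. */
function stripNonAssertionSpans(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, " ")                 // fenced blocks
    .replace(/`[^`\n]*`/g, " ")                        // inline code
    .replace(/"[^"\n]*"|“[^”\n]*”/g, " ")              // double-quoted spans
    .replace(/^[ \t]*(?:[-*•]|\d+[.)])\s.*$/gm, " ");  // enumerated list items
}

// Assumed interpretation of the four shapes named in the commit message.
const CLAIM_SHAPES: ReadonlyArray<{ shape: ClaimShape; pattern: RegExp }> = [
  {
    shape: "first-person",
    pattern: /\bI(?:'ve| have)?\s+(?:just\s+)?(?:applied|edited|updated|wrote|written|dispatched|delegated)\b/i,
  },
  { shape: "result-first", pattern: /^[ \t]*(?:done|fixed|applied|updated)\b/im },
  { shape: "past-done", pattern: /\b(?:has|have) been (?:applied|edited|updated|written)\b/i },
  {
    shape: "passive-dispatch",
    pattern: /\b(?:was|were|has been|have been) (?:dispatched|delegated|handed off)\b/i,
  },
];

/** Return the first claim shape found in the asserted text, or null if none. */
function matchClaimShape(text: string): ClaimShape | null {
  const asserted = stripNonAssertionSpans(text);
  for (const entry of CLAIM_SHAPES) {
    if (entry.pattern.test(asserted)) {
      return entry.shape;
    }
  }
  return null;
}
```

Under these assumptions, a sentence that merely describes the harness, or that quotes a trigger phrase, yields `null`, while a sentence such as "The task was dispatched to the reviewer agent." yields `"passive-dispatch"`.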
…n-only)

Both guard call sites consult shouldDeescalateGuard (router intake+flow,
mutating-tool record): a match on a chat/answer or exploration turn with no
Write/Edit/Bash use warns visibly but injects NO corrective nudge — a misfire
can no longer derail an informational answer with 'apply it directly' or a
grounding turn. Task-shaped turns keep the full nudge path unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
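
The warn-only behaviour described in this commit message might look something like the sketch below. Only `shouldDeescalateGuard` and the Write/Edit/Bash tool names come from the PR text. The `TurnSignals` shape, the `TurnKind` and `GuardAction` types, and `resolveGuardAction` are hypothetical. The sketch also assumes that "fail-open on missing signals" means the full nudge path is kept whenever the turn kind or tool list is unknown.

```typescript
// Hypothetical sketch. Types and helper names are illustrative assumptions.

type TurnKind = "chat" | "answer" | "exploration" | "task";
type GuardAction = "none" | "warn" | "warn-and-nudge";

interface TurnSignals {
  kind?: TurnKind;        // undefined when the turn could not be classified
  toolsUsed?: string[];   // undefined when tool usage is unknown
}

const MUTATING_TOOLS: readonly string[] = ["Write", "Edit", "Bash"];

/** True only for a conversational turn known to have used no mutating tool. */
function shouldDeescalateGuard(signals: TurnSignals): boolean {
  // Assumption: missing signals never de-escalate.
  if (signals.kind === undefined || signals.toolsUsed === undefined) {
    return false;
  }
  if (signals.kind === "task") {
    return false; // task-shaped turns keep the full nudge path
  }
  return !signals.toolsUsed.some((tool) => MUTATING_TOOLS.indexOf(tool) !== -1);
}

/** A detector match is never suppressed; de-escalation only drops the nudge. */
function resolveGuardAction(matched: boolean, signals: TurnSignals): GuardAction {
  if (!matched) {
    return "none";
  }
  return shouldDeescalateGuard(signals) ? "warn" : "warn-and-nudge";
}
```

For example, `resolveGuardAction(true, { kind: "answer", toolsUsed: [] })` gives `"warn"`, while the same match on a `"task"` turn, or on a turn with no signals at all, gives `"warn-and-nudge"`.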
@cukas cukas merged commit e8d557d into main Jun 11, 2026
2 checks passed
@cukas cukas deleted the feat/cesar-watchdog-pipeline branch June 11, 2026 14:42
