cyber: bind the consequence verdict to the default path, un-stick the soften, surface difficulty #355
Merged
… soften, surface difficulty

The verifier is meant to set the agent's ceiling, but it did not bind the default training path and the curriculum's soften was dead on the worlds that matter. This makes the reward signal trustworthy before a GPU run, and makes the pool legible.

Honest worlds (A):
- A benign db request no longer serves a guarded (HIDDEN) value; only a vuln's own handler leaks it. verify.verdict now sweeps every reachable benign endpoint and rejects any that leaks, so a sibling endpoint can't make a world winnable without the intended exploit (closes the sibling-db key-lookup leak).
- The consequence gate binds the default path: a task-less world FAILS instead of passing blind, seed-gate rejections are logged, and the seed gate is wired into the notebook's pool + held-out set (the evolve gate was already wired).
- The chain-collapse soften is reachable again. Its relevance now comes from the agent's engagement with the public SSRF foothold -- the one signal that survives the in-process proxy -- floored above decoy-removal, so a chain-stuck agent eases the chain instead of a cosmetic decoy. First loop-reachability test for it.

Legible pool (C):
- The pool stamps each evolved world's solve-cost onto its lineage, and the dashboard reads + shows it (it was reading an unrelated, always-empty key).

5 new regression tests; full suite green (984 passed, 17 skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
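The benign-endpoint sweep and the task-less failure described in this commit message can be illustrated with a minimal sketch. It is not the repository's code: `Endpoint`, `World`, `benign_leaks`, the probe list, and the shape of `verdict` are all hypothetical stand-ins chosen to show the idea that a world is rejected when any non-vulnerable endpoint returns a guarded value, and that a world without a task fails instead of passing blind.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Endpoint:
    """Hypothetical endpoint model: a path, a handler, and a vuln flag."""
    path: str
    handler: Callable[[Dict[str, str]], str]
    is_vuln: bool = False


@dataclass
class World:
    """Hypothetical world model; the real one is not shown in the PR text."""
    endpoints: List[Endpoint]
    hidden_values: Set[str]
    task: Optional[str] = None


def benign_leaks(world: World, probes: List[Dict[str, str]]) -> List[str]:
    """Return the paths of benign endpoints whose response contains a guarded value."""
    leaking = []
    for endpoint in world.endpoints:
        if endpoint.is_vuln:
            continue  # a vuln's own handler is allowed to leak
        for params in probes:
            body = endpoint.handler(params)
            if any(secret in body for secret in world.hidden_values):
                leaking.append(endpoint.path)
                break
    return leaking


def verdict(world: World, probes: List[Dict[str, str]]) -> Tuple[bool, str]:
    """Assumed verdict shape: (passed, reason)."""
    if world.task is None:
        return False, "no task: cannot verify a consequence"
    leaking = benign_leaks(world, probes)
    if leaking:
        return False, f"benign endpoint leaks guarded value: {leaking}"
    return True, "ok"


# Toy data: the benign db default shares a table with a guarded row.
TABLE = {"motd": "hello", "flag": "HIDDEN-123"}
HIDDEN = {"HIDDEN-123"}


def leaky_db(params: Dict[str, str]) -> str:
    return TABLE.get(params.get("key", ""), "")


def filtered_db(params: Dict[str, str]) -> str:
    value = TABLE.get(params.get("key", ""), "")
    return "" if value in HIDDEN else value


PROBES = [{"key": key} for key in TABLE]
leaky_world = World([Endpoint("/db", leaky_db)], HIDDEN, task="exfiltrate flag")
honest_world = World([Endpoint("/db", filtered_db)], HIDDEN, task="exfiltrate flag")
taskless_world = World([Endpoint("/db", filtered_db)], HIDDEN)
```

In this sketch the sweep and the handler filter are deliberately redundant: the filter removes the leak, and the sweep catches any sibling endpoint that reintroduces it.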
An adversarial audit (one reader + one skeptic per file) checked every comment,
docstring, and __all__ this PR added against .rules, whose default is no comments.
The verdict, applied here: the WHY behind each change is real, but the prose
narrated WHAT the code already says or referenced the fix ("used to leak", "now
enforces"). So each surviving comment is cut to the one non-obvious WHY a reader
couldn't derive — the hidden runtime contract that internal chain hops never reach
requests_made, that the db default shares a table with a vuln sibling, that lineage
sits outside the graph hash so re-stamping difficulty keeps the snapshot id stable —
and the rest is deleted. The _benign_sweep helper docstring becomes a one-line
inline comment (underscore helpers carry no docstring). Most test narration goes
outright; names and crafted args already carry it.
The skeptics also tested the "refactor so no comment is needed" path on each keeper
(split the shared table, extract helpers, assert the ordering in code) and rejected
it — every option adds indirection .rules forbids. No behavior changes; 77 impacted
tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
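The runtime contract mentioned above (internal chain hops never reach requests_made) is why the soften's relevance has to come from the public foothold. A minimal sketch of that idea follows, under stated assumptions: the function names, the relevance scale, the constants, and the `/fetch` path are hypothetical, and the real curriculum code may compute relevance quite differently.

```python
from typing import List

# Hypothetical relevance assigned to the cosmetic decoy-removal soften.
DECOY_REMOVAL_RELEVANCE = 0.2
# Hypothetical floor: any foothold engagement must outrank decoy removal.
CHAIN_COLLAPSE_FLOOR = DECOY_REMOVAL_RELEVANCE + 0.05


def chain_collapse_relevance(requests_made: List[str], foothold_path: str) -> float:
    """Score the chain-collapse soften from public foothold engagement only.

    Internal hops are assumed to be invisible here, so the only usable
    signal is how often the agent hit the public SSRF foothold.
    """
    if not requests_made:
        return 0.0
    hits = sum(1 for path in requests_made if path.startswith(foothold_path))
    if hits == 0:
        return 0.0  # agent never engaged the foothold: the chain is not the blocker
    engagement = hits / len(requests_made)
    return max(engagement, CHAIN_COLLAPSE_FLOOR)


def pick_soften(requests_made: List[str], foothold_path: str) -> str:
    """Pick the soften with the highest relevance."""
    candidates = {
        "remove_decoy": DECOY_REMOVAL_RELEVANCE,
        "collapse_chain": chain_collapse_relevance(requests_made, foothold_path),
    }
    return max(candidates, key=candidates.get)


# A chain-stuck agent: one foothold hit among many unrelated requests.
stuck_agent = ["/index"] * 19 + ["/fetch?url=http://internal/"]
# An agent that never found the foothold.
lost_agent = ["/index"] * 20
```

Without the floor, the stuck agent's raw engagement (1 of 20 requests) would score below decoy removal and the chain would never be eased, which is the "dead soften" failure this PR describes.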
Why
A rigorous review of the cyber worldgen / evolution / curriculum / dashboard turned up four problems that make the training signal untrustworthy or the pool illegible. The verifier is supposed to set the agent's ceiling, but it didn't actually bind the default training path, and the curriculum's "soften" mutation was dead code on the exact worlds it's meant to help. This PR fixes them before any GPU run, so the reward we'd be training against is honest, and makes the evolving pool readable in the dashboard.
This is the A + thin C slice from that review (honesty fixes + difficulty made visible). No new capability — it makes the existing one trustworthy.
What
Honest worlds (A)
- … ?key, including a guarded (HIDDEN) one, so a sibling endpoint could make a world "winnable" without the intended exploit. The handler now filters guarded values, and verify.verdict sweeps every reachable benign endpoint and rejects the world if any of them leaks. (Closes the sibling-db key-lookup leak.)
- … requests_made). It now derives relevance from the agent's engagement with the public SSRF foothold (the one signal that survives the in-process proxy), floored above decoy-removal, so a chain-stuck agent actually eases the chain instead of cosmetically dropping a decoy. First loop-reachability test for it.

Legible pool (C)
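The commit notes in this PR say the pool stamps each evolved world's solve-cost onto its lineage, and that lineage sits outside the graph hash so re-stamping difficulty keeps the snapshot id stable. A minimal sketch of that property, with hypothetical names throughout: `Snapshot`, `snapshot_id`, `stamp_difficulty`, and the use of `world_difficulty` as the lineage key are assumptions for illustration, not the project's actual API.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Snapshot:
    """Hypothetical world snapshot: a hashed graph plus unhashed lineage metadata."""
    graph: Dict[str, Any]
    lineage: Dict[str, Any] = field(default_factory=dict)

    @property
    def snapshot_id(self) -> str:
        # Only the graph feeds the hash; lineage is deliberately excluded.
        canonical = json.dumps(self.graph, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()[:12]


def stamp_difficulty(snapshot: Snapshot, solve_cost: float) -> None:
    """Record the solve-cost on the lineage (key name is an assumption)."""
    snapshot.lineage["world_difficulty"] = solve_cost


snap = Snapshot(graph={"nodes": ["web", "db"], "edges": [["web", "db"]]})
id_before = snap.snapshot_id
stamp_difficulty(snap, 3.0)
stamp_difficulty(snap, 4.5)  # re-stamping after another evaluation
id_after = snap.snapshot_id
```

Because the dashboard would read the same lineage key the pool writes, a mismatch between the two (the "unrelated, always-empty key" the commit message mentions) shows up as a blank column, not as an error.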
Testing
world_difficulty … world_difficulty to the dashboard
🤖 Generated with Claude Code