cyber: bind the consequence verdict to the default path, un-stick the soften, surface difficulty #355

Merged: larstalian merged 2 commits into main from cyber/honest-signal on Jun 25, 2026
Conversation

@larstalian

Collaborator

Why

A rigorous review of the cyber worldgen / evolution / curriculum / dashboard turned up four problems that make the training signal untrustworthy or the pool illegible. The verifier is supposed to set the agent's ceiling, but it didn't actually bind the default training path, and the curriculum's "soften" mutation was dead code on the exact worlds it's meant to help. This PR fixes them before any GPU run, so the reward we'd be training against is honest, and makes the evolving pool readable in the dashboard.

This is the A + thin C slice from that review (honesty fixes + difficulty made visible). No new capability — it makes the existing one trustworthy.

What

Honest worlds (A)

  • A benign request can no longer leak the flag. The default db handler used to serve any value by ?key, including a guarded (HIDDEN) one — so a sibling endpoint could make a world "winnable" without the intended exploit. The handler now filters guarded values, and verify.verdict sweeps every reachable benign endpoint and rejects the world if any of them leaks. (Closes the sibling-db key-lookup leak.)
  • The consequence gate binds the default path. A task-less world now fails instead of passing blind; seed-gate rejections are logged; and the seed gate is wired into the notebook's training pool + held-out set (the evolve-time gate was already wired).
  • The chain-collapse soften is reachable again. On networked worlds its relevance was always 0, because it keyed on in-process paths that never appear in requests_made. It now derives relevance from the agent's engagement with the public SSRF foothold (the one signal that survives the in-process proxy) and is floored above decoy-removal, so when an agent is stuck on the chain the curriculum actually eases the chain instead of cosmetically dropping a decoy. This PR also adds the first loop-reachability test for it.
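
A minimal sketch of the leak fix and the whole-world sweep from the first bullet, not the repository's actual code. `Endpoint`, `benign_sweep`, `filters_guarded`, and the tuple layout of `table` are hypothetical names introduced for illustration; only the HIDDEN guard and the "sweep every reachable benign endpoint" rule come from this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HIDDEN = "HIDDEN"  # guard marker; the exact representation is an assumption


@dataclass
class Endpoint:
    path: str
    # key -> (value, guard); guard is HIDDEN for guarded values, else None
    table: Dict[str, Tuple[str, Optional[str]]]
    is_vuln: bool = False          # True only for the intended exploit's handler
    filters_guarded: bool = True   # False reproduces the pre-fix behaviour

    def handle(self, key: str) -> Optional[str]:
        entry = self.table.get(key)
        if entry is None:
            return None
        value, guard = entry
        # A benign handler must not serve guarded values, even when it
        # shares its table with a vulnerable sibling.
        if guard == HIDDEN and self.filters_guarded and not self.is_vuln:
            return None
        return value


def benign_sweep(endpoints: List[Endpoint], flag: str) -> List[str]:
    # Paths of benign endpoints that return the flag for any key they know.
    leaks = []
    for ep in endpoints:
        if ep.is_vuln:
            continue  # the vuln handler is allowed to leak by design
        if any(ep.handle(key) == flag for key in ep.table):
            leaks.append(ep.path)
    return leaks
```

Under these assumptions a world would be rejected whenever `benign_sweep` returns a non-empty list, which is what keeps a sibling endpoint from making the world winnable without the intended exploit.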

Legible pool (C)

  • The pool now stamps each evolved world's solve-cost onto its lineage, and the dashboard reads + displays it. (It was reading an unrelated key that was always empty, so the panel never showed a difficulty.)
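
As a rough illustration of the stamping, here is a sketch that assumes the snapshot id is a hash of the world graph alone, with lineage kept outside that hash. `Snapshot`, `stamp_difficulty`, and `lineage_node` are hypothetical names; `world_difficulty` is the key named in the regression tests.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    graph: dict                                   # world definition; sole input to the id
    lineage: dict = field(default_factory=dict)   # provenance metadata, not hashed

    @property
    def snapshot_id(self) -> str:
        # Canonical JSON so key order cannot change the hash.
        canonical = json.dumps(self.graph, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()[:12]


def stamp_difficulty(snap: Snapshot, solve_cost: float) -> None:
    # Record the solve-cost on the lineage; the graph is left untouched.
    snap.lineage["world_difficulty"] = solve_cost


def lineage_node(snap: Snapshot) -> dict:
    # The fields a dashboard panel might read for one lineage node.
    return {
        "id": snap.snapshot_id,
        "world_difficulty": snap.lineage.get("world_difficulty"),
    }
```

Because the difficulty lives only in `lineage`, re-stamping a world leaves its `snapshot_id` unchanged in this sketch.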

Testing

  • 5 new regression tests:
    • benign db endpoint never serves the flag
    • whole-world verdict rejects a sibling leak
    • evolved snapshot persists world_difficulty
    • chain-soften is reachable from real agent reports
    • lineage node surfaces world_difficulty to the dashboard
  • Full suite green: 984 passed, 17 skipped, 0 failed.

🤖 Generated with Claude Code

larstalian and others added 2 commits June 25, 2026 17:18
… soften, surface difficulty

The verifier is meant to set the agent's ceiling, but it did not bind the default
training path and the curriculum's soften was dead on the worlds that matter. This
makes the reward signal trustworthy before a GPU run, and makes the pool legible.

Honest worlds (A):
- A benign db request no longer serves a guarded (HIDDEN) value; only a vuln's own
  handler leaks it. verify.verdict now sweeps every reachable benign endpoint and
  rejects any that leaks, so a sibling endpoint can't make a world winnable without
  the intended exploit (closes the sibling-db key-lookup leak).
- The consequence gate binds the default path: a task-less world FAILS instead of
  passing blind, seed-gate rejections are logged, and the seed gate is wired into the
  notebook's pool + held-out set (the evolve gate was already wired).
- The chain-collapse soften is reachable again. Its relevance now comes from the
  agent's engagement with the public SSRF foothold -- the one signal that survives
  the in-process proxy -- floored above decoy-removal, so a chain-stuck agent eases
  the chain instead of a cosmetic decoy. First loop-reachability test for it.
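
The relevance rule in the last bullet might look roughly like the sketch below. The constants, function names, and the choice to score engagement as a fraction of requests are all assumptions made for illustration; the PR only states that relevance comes from engagement with the public SSRF foothold and is floored above decoy-removal.

```python
from typing import List

DECOY_REMOVAL_RELEVANCE = 0.1  # hypothetical score of the decoy-removal mutation
CHAIN_SOFTEN_FLOOR = 0.2       # hypothetical floor, chosen only to exceed the decoy score


def chain_soften_relevance(requests_made: List[str], foothold_path: str) -> float:
    # Only public requests appear in requests_made; internal chain hops go
    # through the in-process proxy and are never recorded there.
    if not requests_made:
        return 0.0
    hits = sum(1 for path in requests_made if path.startswith(foothold_path))
    if hits == 0:
        return 0.0  # assumption: no foothold engagement means the chain is not the blocker
    engagement = hits / len(requests_made)
    return max(CHAIN_SOFTEN_FLOOR, engagement)


def pick_mutation(requests_made: List[str], foothold_path: str) -> str:
    scores = {
        "remove_decoy": DECOY_REMOVAL_RELEVANCE,
        "soften_chain": chain_soften_relevance(requests_made, foothold_path),
    }
    return max(scores, key=scores.get)
```

With the floor in place, even a single request to the foothold is enough for the soften to outrank decoy removal in this sketch.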

Legible pool (C):
- The pool stamps each evolved world's solve-cost onto its lineage, and the dashboard
  reads + shows it (it was reading an unrelated, always-empty key).

5 new regression tests; full suite green (984 passed, 17 skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An adversarial audit (one reader + one skeptic per file) checked every comment,
docstring, and __all__ this PR added against .rules, whose default is no comments.

The verdict, applied here: the WHY behind each change is real, but the prose
narrated WHAT the code already says or referenced the fix ("used to leak", "now
enforces"). So each surviving comment is cut to the one non-obvious WHY a reader
couldn't derive — the hidden runtime contract that internal chain hops never reach
requests_made, that the db default shares a table with a vuln sibling, that lineage
sits outside the graph hash so re-stamping difficulty keeps the snapshot id stable —
and the rest is deleted. The _benign_sweep helper docstring becomes a one-line
inline comment (underscore helpers carry no docstring). Most test narration goes
outright; names and crafted args already carry it.

The skeptics also tested the "refactor so no comment is needed" path on each keeper
(split the shared table, extract helpers, assert the ordering in code) and rejected
it — every option adds indirection .rules forbids. No behavior changes; 77 impacted
tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@larstalian larstalian merged commit 0707fda into main Jun 25, 2026
2 checks passed
@larstalian larstalian deleted the cyber/honest-signal branch June 25, 2026 22:50
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 25, 2026
