cyber: bind the consequence verdict to the default path, un-stick the soften, surface difficulty by larstalian · Pull Request #355 · vecna-labs/open-range

larstalian · 2026-06-25T22:19:17Z

Why

A rigorous review of the cyber worldgen / evolution / curriculum / dashboard turned up four problems that make the training signal untrustworthy or the pool illegible. The verifier is supposed to set the agent's ceiling, but it didn't actually bind the default training path, and the curriculum's "soften" mutation was dead code on the exact worlds it's meant to help. This PR fixes them before any GPU run, so the reward we'd be training against is honest, and makes the evolving pool readable in the dashboard.

This is the A + thin C slice from that review (honesty fixes + difficulty made visible). No new capability — it makes the existing one trustworthy.

What

Honest worlds (A)

- A benign request can no longer leak the flag. The default db handler used to serve any value by ?key, including a guarded (HIDDEN) one, so a sibling endpoint could make a world "winnable" without the intended exploit. The handler now filters guarded values, and verify.verdict sweeps every reachable benign endpoint and rejects the world if any of them leaks. (Closes the sibling-db key-lookup leak.)
- The consequence gate binds the default path. A task-less world now fails instead of passing blind; seed-gate rejections are logged; and the seed gate is wired into the notebook's training pool + held-out set (the evolve-time gate was already wired).
- The chain-collapse soften is reachable again. On networked worlds its relevance was always 0 (it keyed on in-process paths that never appear in requests_made). It now derives relevance from the agent's engagement with the public SSRF foothold (the one signal that survives the in-process proxy), floored above decoy-removal, so a chain-stuck agent actually eases the chain instead of cosmetically dropping a decoy. First loop-reachability test for it.
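
For reviewers, a minimal sketch of the guarded-value filter, the benign sweep, and the task-less failure described above. It is not the code in this PR: `World`, `db_handler`, the `HIDDEN` constant, the tuple layout of `table`, and the reason strings are hypothetical stand-ins. Only the idea of a verdict that sweeps benign endpoints and the ?key lookup come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

HIDDEN = "HIDDEN"  # hypothetical guard marker for values only an exploit may reveal


@dataclass
class World:
    # key -> (value, guard); guard == HIDDEN marks a guarded value
    table: Dict[str, Tuple[str, Optional[str]]]
    flag: str  # assumed non-empty
    # benign endpoints: path -> handler that takes a query key and returns a body
    benign: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    tasks: List[str] = field(default_factory=list)


def db_handler(world: World, key: str) -> str:
    """Default lookup: serve a value by key, but never a guarded one."""
    value, guard = world.table.get(key, ("", None))
    return "" if guard == HIDDEN else value


def verdict(world: World) -> Tuple[bool, str]:
    """Fail task-less worlds, then sweep every benign endpoint for a flag leak."""
    if not world.tasks:
        return False, "no task"
    for path, handler in world.benign.items():
        for key in world.table:
            if world.flag in handler(key):
                return False, f"benign leak at {path}?key={key}"
    return True, "ok"


table = {"motd": ("hello", None), "flag": ("FLAG{x}", HIDDEN)}

# Unfiltered sibling endpoint: serves whatever the key points at.
leaky = World(table, "FLAG{x}", tasks=["read the flag"])
leaky.benign["/db"] = lambda key: table[key][0]

# Filtered endpoint: goes through the guarded-value check.
safe = World(table, "FLAG{x}", tasks=["read the flag"])
safe.benign["/db"] = lambda key: db_handler(safe, key)

print(verdict(leaky))  # (False, 'benign leak at /db?key=flag')
print(verdict(safe))   # (True, 'ok')
```

The sweep is deliberately exhaustive over keys rather than sampling: a single leaking key is enough to make the world winnable without the intended exploit.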

Legible pool (C)

The pool now stamps each evolved world's solve-cost onto its lineage, and the dashboard reads + displays it. (It was reading an unrelated key that was always empty, so the panel never showed a difficulty.)
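
A rough sketch of the stamping idea, assuming a snapshot is a plain dict with `graph` and `lineage` keys and that the snapshot id is a hash over the graph only. `snapshot_id`, `stamp_difficulty`, and `lineage_node` are hypothetical names; `world_difficulty` is the key named in the tests listed under Testing.

```python
import hashlib
import json
from typing import Any, Dict


def snapshot_id(snapshot: Dict[str, Any]) -> str:
    """Hash only the graph; lineage metadata is left out of the id on purpose."""
    payload = json.dumps(snapshot["graph"], sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def stamp_difficulty(snapshot: Dict[str, Any], solve_cost: float) -> Dict[str, Any]:
    """Return a copy of the snapshot with the solve cost recorded on its lineage."""
    lineage = {**snapshot.get("lineage", {}), "world_difficulty": solve_cost}
    return {**snapshot, "lineage": lineage}


def lineage_node(snapshot: Dict[str, Any]) -> Dict[str, Any]:
    """What a dashboard panel might read for one world."""
    return {
        "id": snapshot_id(snapshot),
        "difficulty": snapshot.get("lineage", {}).get("world_difficulty"),
    }


snap = {"graph": {"nodes": ["web", "db"]}, "lineage": {"parent": None}}
stamped = stamp_difficulty(snap, 3.5)

# Stamping touches lineage only, so the id stays the same.
print(snapshot_id(snap) == snapshot_id(stamped))
print(lineage_node(stamped)["difficulty"])
```

Reading the same key that the pool writes is the whole fix on the dashboard side; under this sketch an unstamped snapshot simply shows no difficulty.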

Testing

5 new regression tests:
- benign db endpoint never serves the flag
- whole-world verdict rejects a sibling leak
- evolved snapshot persists world_difficulty
- chain-soften is reachable from real agent reports
- lineage node surfaces world_difficulty to the dashboard
Full suite green: 984 passed, 17 skipped, 0 failed.
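
For context on the chain-soften test, a hedged sketch of the relevance rule it exercises. The function names, the engagement measure (share of requests that hit the foothold), the `margin` value, and the mutation labels are all assumptions for illustration; the PR only states that relevance comes from engagement with the public SSRF foothold and is floored above decoy-removal.

```python
from typing import Dict, List


def foothold_engagement(requests_made: List[str], foothold: str) -> float:
    """Share of the agent's requests that hit the public SSRF foothold."""
    if not requests_made:
        return 0.0
    hits = sum(1 for path in requests_made if path.startswith(foothold))
    return hits / len(requests_made)


def soften_relevance(
    requests_made: List[str],
    foothold: str,
    decoy_relevance: float,
    margin: float = 0.05,  # assumed gap that keeps the soften above decoy-removal
) -> float:
    """Relevance of easing the chain, floored above decoy-removal once engaged."""
    engagement = foothold_engagement(requests_made, foothold)
    if engagement == 0.0:
        return 0.0  # agent never touched the foothold; nothing to ease
    return max(engagement, decoy_relevance + margin)


def pick_mutation(relevance: Dict[str, float]) -> str:
    """Choose the mutation with the highest relevance."""
    return max(relevance, key=relevance.get)


# Internal chain hops never show up here; only the public foothold request does.
report = ["/", "/login", "/fetch?url=http://10.0.0.2/admin", "/", "/about"]
decoy = 0.4
scores = {
    "remove_decoy": decoy,
    "soften_chain": soften_relevance(report, "/fetch", decoy),
}
print(pick_mutation(scores))  # soften_chain
```

Without the floor, one foothold hit out of five requests would score 0.2 and lose to decoy-removal, which is the cosmetic outcome the PR is trying to avoid.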

🤖 Generated with Claude Code

… soften, surface difficulty

The verifier is meant to set the agent's ceiling, but it did not bind the default training path and the curriculum's soften was dead on the worlds that matter. This makes the reward signal trustworthy before a GPU run, and makes the pool legible.

Honest worlds (A):
- A benign db request no longer serves a guarded (HIDDEN) value; only a vuln's own handler leaks it. verify.verdict now sweeps every reachable benign endpoint and rejects any that leaks, so a sibling endpoint can't make a world winnable without the intended exploit (closes the sibling-db key-lookup leak).
- The consequence gate binds the default path: a task-less world FAILS instead of passing blind, seed-gate rejections are logged, and the seed gate is wired into the notebook's pool + held-out set (the evolve gate was already wired).
- The chain-collapse soften is reachable again. Its relevance now comes from the agent's engagement with the public SSRF foothold -- the one signal that survives the in-process proxy -- floored above decoy-removal, so a chain-stuck agent eases the chain instead of a cosmetic decoy. First loop-reachability test for it.

Legible pool (C):
- The pool stamps each evolved world's solve-cost onto its lineage, and the dashboard reads + shows it (it was reading an unrelated, always-empty key).

5 new regression tests; full suite green (984 passed, 17 skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

An adversarial audit (one reader + one skeptic per file) checked every comment, docstring, and __all__ this PR added against .rules, whose default is no comments. The verdict, applied here: the WHY behind each change is real, but the prose narrated WHAT the code already says or referenced the fix ("used to leak", "now enforces"). So each surviving comment is cut to the one non-obvious WHY a reader couldn't derive, and the rest is deleted. The surviving WHYs are:
- the hidden runtime contract that internal chain hops never reach requests_made;
- that the db default shares a table with a vuln sibling;
- that lineage sits outside the graph hash, so re-stamping difficulty keeps the snapshot id stable.

The _benign_sweep helper docstring becomes a one-line inline comment (underscore helpers carry no docstring). Most test narration is deleted outright; names and crafted args already carry it. The skeptics also tested the "refactor so no comment is needed" path on each keeper (split the shared table, extract helpers, assert the ordering in code) and rejected it: every option adds indirection that .rules forbids.

No behavior changes; 77 impacted tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

larstalian and others added 2 commits June 25, 2026 17:18

larstalian merged commit 0707fda into main Jun 25, 2026
2 checks passed

larstalian deleted the cyber/honest-signal branch June 25, 2026 22:50

github-actions Bot locked and limited conversation to collaborators Jun 25, 2026

larstalian merged 2 commits into main from cyber/honest-signal
