cyber: make generated worlds honest; harden evolution + the training substrate by larstalian · Pull Request #351 · vecna-labs/open-range

larstalian · 2026-06-24T05:42:13Z

What & why

This started as "make the cyber gym's generated worlds honest, and make sure world evolution actually makes sense" — and, through a couple of multi-agent audits that each caught real problems, grew into one coherent pass over worldgen, the curriculum, and the training substrate.

Every change keeps worlds solvable-by-construction, deterministic, and PROCESS↔CONTAINER parity-safe. Full suite green (955 passed); ruff / mypy / pack-core boundary / coverage all pass.

🔴 The headline: an honesty bug that was corrupting training reward

Under the PROCESS backing (what training and eval default to), the internal cloud-metadata endpoint was a directly-servable HTTP route — GET /svc/<host>/latest/meta-data/credential handed back the flag with zero exploitation, in 13 of 13 company worlds. An agent could score reward 1.0 without ever performing the SSRF pivot the world exists to teach.

Fix: internal /svc/<name> endpoints now live in a dispatch-only table. They are reachable only through the in-process SSRF pivot, never by a direct request (a direct GET now returns 404), which mirrors the network segmentation CONTAINER gets for free from real per-service hosts. The change is gated to networked worlds, so flat worlds (where internal vulns are solved by direct request, by design) are untouched.
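A minimal sketch of the dispatch-only idea, for readers who want the shape of the fix without opening the diff. Everything here is illustrative: the `World` class, the `/fetch?url=` route, the `metadata` host name, and the flag value are assumptions made for the example, not the pack's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import parse_qs, urlsplit

Response = tuple[int, str]
NOT_FOUND: Response = (404, "not found")


def metadata(path: str) -> Response:
    """Hypothetical internal cloud-metadata service."""
    if path == "/latest/meta-data/credential":
        return 200, "FLAG{demo}"
    return NOT_FOUND


@dataclass
class World:
    networked: bool
    # Internal services live in a table that is never mounted as a route.
    internal: dict[str, Callable[[str], Response]] = field(default_factory=dict)

    def handle(self, raw_path: str) -> Response:
        """Entry point for requests arriving directly from the agent."""
        parts = urlsplit(raw_path)
        if parts.path.startswith("/svc/"):
            if self.networked:
                # Networked worlds: a direct request never reaches an internal host.
                return NOT_FOUND
            # Flat worlds keep direct access, by design.
            name, _, rest = parts.path[len("/svc/"):].partition("/")
            return self._dispatch(name, "/" + rest)
        if parts.path == "/fetch":
            # Hypothetical SSRF-vulnerable public endpoint: fetches a caller-supplied URL.
            target = parse_qs(parts.query).get("url", [""])[0]
            return self._pivot(target)
        return NOT_FOUND

    def _pivot(self, url: str) -> Response:
        """In-process SSRF pivot: the only path into the internal table."""
        target = urlsplit(url)
        return self._dispatch(target.hostname or "", target.path)

    def _dispatch(self, name: str, path: str) -> Response:
        handler = self.internal.get(name)
        return handler(path) if handler else NOT_FOUND
```

In this sketch a networked world answers 404 to `GET /svc/metadata/latest/meta-data/credential` but still yields the credential through `/fetch?url=http://metadata/latest/meta-data/credential`, while a flat world keeps the direct route.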

The rest of the pass

Difficulty metric. A 2-service toy SSRF world was rated ~40% harder than a full 8-service company breach. Added a capped internal-host fan-out term so a wide estate registers (company difficulty went from 5 → 15 distinct values) and removed the cost-neutral POST "body bump." The inversion is gone.
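To make the capped fan-out term concrete, here is a rough sketch. The weights, the cap value, and the function signature are all assumptions chosen for illustration; the real metric has more inputs and different constants.

```python
FANOUT_CAP = 6          # hypothetical cap on how many internal hosts count
HOP_WEIGHT = 1.0        # hypothetical weight per credential/pivot hop
FANOUT_WEIGHT = 0.5     # hypothetical weight per counted internal host


def difficulty(vuln_weight: float, chain_hops: int, internal_hosts: int) -> float:
    """Score a world; a wider estate scores higher, but only up to the cap."""
    fanout = min(internal_hosts, FANOUT_CAP)
    return vuln_weight + HOP_WEIGHT * chain_hops + FANOUT_WEIGHT * fanout


# Same vuln class and chain length, different estate width.
toy = difficulty(vuln_weight=3.0, chain_hops=1, internal_hosts=1)
company = difficulty(vuln_weight=3.0, chain_hops=1, internal_hosts=7)
```

With a term like this the company world can no longer score below the toy world on estate width alone, and the cap keeps an arbitrarily wide estate from dominating the score.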

Realism. The recon page disclosed the exact internal inventory with zero decoys (a perfect oracle) — now padded with believable decoy hostnames the agent has to triage. Decoy files are per-world-unique (were 5 byte-identical literals). Grew the most-memorizable pools (flag key, SSRF targets, service names).
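One way the decoy padding could work while staying deterministic per world is sketched below. The pool contents, the function name, and the default decoy count are hypothetical; the only properties taken from the description above are that decoys are mixed in with the real inventory and that generation stays deterministic.

```python
import random

# Hypothetical pool of plausible-looking internal hostnames.
DECOY_POOL = (
    "billing-api", "ldap-sync", "build-cache", "hr-portal",
    "metrics-gw", "legacy-crm", "print-svc", "vault-proxy",
)


def recon_inventory(real_hosts: list[str], seed: int, n_decoys: int = 3) -> list[str]:
    """Return the real hosts mixed with seeded decoys, in a seeded order."""
    rng = random.Random(seed)  # per-world seed keeps the listing reproducible
    candidates = [h for h in DECOY_POOL if h not in real_hosts]
    decoys = rng.sample(candidates, k=min(n_decoys, len(candidates)))
    listing = list(real_hosts) + decoys
    rng.shuffle(listing)  # position must not reveal which entries are real
    return listing
```

Because the generator is seeded, the same world always renders the same recon page, so the listing stays reproducible across runs.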

Curriculum / evolution.

- RoundMetrics.difficulty_gain: the honest "is the frontier still moving" read, distinct from frontier_capped (a run can keep admitting children while only creeping on cosmetic decoys).
- A consequence seed-gate so a structurally-admitted-but-unsolvable world can't silently seed the pool.
- Soften can now shorten a chain: it collapses the last credential hop to rescue an agent stuck on the chain (decoy-removal couldn't); validated through the real re-admit path (parity-safe, deterministic, difficulty strictly drops).
- Diversify now rotates the on-path technique: it swaps the flag-reading oracle's class within loot-compatible shapes (was: only off-path decoys), without stranding the flag.
- Fixed a harden-path crash (a curriculum-introduced vuln rendered on undefined params for 6 of 9 classes).
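The distinction behind difficulty_gain can be illustrated with a small stand-in. `RoundSummary` below is a hypothetical class, not the real RoundMetrics, and the definition of gain (increase in the hardest admitted difficulty over one round) and the `min_gain` threshold are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundSummary:
    """Hypothetical per-round record; field names are illustrative."""
    admitted: int        # children admitted to the pool this round
    best_before: float   # hardest difficulty in the pool before the round
    best_after: float    # hardest difficulty in the pool after the round

    @property
    def difficulty_gain(self) -> float:
        # Admissions alone do not count; only a harder frontier does.
        return max(0.0, self.best_after - self.best_before)


def frontier_is_moving(rounds: list[RoundSummary], min_gain: float = 0.5) -> bool:
    """True if any round pushed the hardest difficulty up by at least min_gain."""
    return any(r.difficulty_gain >= min_gain for r in rounds)


cosmetic = RoundSummary(admitted=4, best_before=12.0, best_after=12.0)
real = RoundSummary(admitted=1, best_before=12.0, best_after=14.5)
```

Here `cosmetic` admits four children yet reports zero gain, which is the "creeping on cosmetic decoys" case, while `real` admits one child and moves the frontier.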

Training substrate. The structural answer to "PROCESS feels toyish": CONTAINER is now the floor for the training/eval measurement. A file-read or sandboxed world escalates to the real container (where reward is genuine, not silently 0); in-band worlds keep the cheap ~50 ms PROCESS path; and it fails loud if that escalation needs Docker that's absent — it never silently trains on an emulation the agent can't exploit. New Pack.minimum_backing SDK hook (default PROCESS, so the SWE/trading packs inherit it untouched).
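The escalation rule can be sketched as follows. Only the names Pack.minimum_backing, PROCESS, and CONTAINER come from the description above; the method signature, the `needs_container` flag, `resolve_backing`, and the use of an IntEnum (so that max() is well defined) are assumptions for illustration.

```python
from enum import IntEnum


class Backing(IntEnum):
    # Explicit ordering so max() picks the stronger backing.
    PROCESS = 0
    CONTAINER = 1


class Pack:
    def minimum_backing(self, world: dict) -> Backing:
        """Default hook: any world may run on the cheap PROCESS path."""
        return Backing.PROCESS


class CyberPackSketch(Pack):
    def minimum_backing(self, world: dict) -> Backing:
        # Hypothetical flag: file-read or sandboxed worlds need the real container.
        return Backing.CONTAINER if world.get("needs_container") else Backing.PROCESS


def resolve_backing(requested: Backing, pack: Pack, world: dict,
                    docker_available: bool) -> Backing:
    """Never go below the pack's floor, and fail loud if the floor is unreachable."""
    chosen = max(requested, pack.minimum_backing(world))
    if chosen is Backing.CONTAINER and not docker_available:
        raise RuntimeError("world requires CONTAINER backing but Docker is unavailable")
    return chosen
```

A pack that does not override the hook keeps PROCESS for every world, which is how other packs would inherit the default untouched under this sketch.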

Process note

Two multi-agent audits gated this work and earned their cost. The plan audit caught three flaws in my own proposals before I built them (a max() over an unordered enum that picked the wrong backing; a chain conversion that would have stranded the flag; a threshold that would have put domain knowledge in core). The implementation/gap audits caught a real regression in already-"done" work (a docker-gated test left red by the recon change) and a test gap that is closed here (the fail-loud safety path was untested).

Deferred

Turning a recon→pivot company world into a deeper credential chain is the one large, higher-risk item left — tracked with its full implementation recipe in #343.

🤖 Generated with Claude Code

Drop the comments that just restate what the code/identifier already says (BootEpisode's signature, the dial test's body, the boot helper's prose); keep the WHY ones (the boundary-injection and fresh-runtime rationale).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…substrate

Headline fix: under the PROCESS backing (the training/eval default), the internal cloud-metadata endpoint was a directly-servable route, leaking the flag with zero exploitation in 13/13 company worlds. Internal /svc/<name> endpoints are now dispatch-only -- reachable only via the in-process SSRF pivot, as CONTAINER's real per-service network enforces -- gated to networked worlds so flat worlds are untouched.

Also in this pass (each verified live + tested; gated by two multi-agent audits):
- difficulty: the 2-service-toy > 8-service-company inversion is gone (a capped internal-host fan-out term; dropped the cost-neutral POST body bump)
- realism: recon no longer leaks the exact internal inventory (decoy-host chaff); per-world-unique decoy files; larger flag-key / host / service-name pools
- curriculum: a RoundMetrics.difficulty_gain "is the frontier moving" signal; a consequence seed-gate so an unsolvable world can't seed the pool; soften can now shorten a chain to rescue a chain-stuck agent; diversify rotates the on-path technique within loot-compatible classes; fixed a harden-path render crash
- training substrate: CONTAINER is now the floor for the training/eval measurement (file-read / sandboxed worlds escalate to the real container; in-band worlds keep the cheap PROCESS path; fails loud rather than silently 0-reward an emulation the agent can't exploit). New Pack.minimum_backing SDK hook, default PROCESS.

Full suite green (959); ruff / mypy / boundary / coverage pass. Deferred company chain-deepening tracked in #343.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A discipline pass over the comments this branch added, per .rules (default no comments; add one only when the WHY is non-obvious). Several blocks were defending against a gotcha or restating a constraint in more than one place, so the real fix was in the code, not the prose:
- diversify method-flip: guard it to sole-vuln endpoints so it can never disturb a co-located decoy, dropping the "this is safe because..." defense. The co-located case (db seed 11) stays solvable.
- pool seed: extract _gated_members so WorldPool.seed and EvalPool.seed share one loop, removing a comment that was duplicated at both sites (and that the SeedGate alias already documents).
- difficulty_gain: explain the metric once, on the public RoundMetrics field, not also (with a typo) inside _grow.
- loot-shape constraint: one source of truth (_ORACLE_SHAPES_FOR_LOOT), not restated across three comments.
- correct a false claim: a curriculum-added vuln draws from the same pools as a sampled one; it is not "byte-identical".
- drop history/version references and an underscore-helper docstring.

Behavior is unchanged except the method-flip guard. ruff, mypy, and the impacted tests are green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Second discipline pass after the diff still read as comment-heavy. Ran a root-fix-or-delete sweep over every comment in the changed files (not just the new ones), taking the strongest move per comment:
- root fix: removed a dead ``ONTOLOGY_ID`` keep-alive import in mutation.py; renamed difficulty's ``internal - chain_hops - 1`` to a self-documenting ``walk_and_target_hosts`` so the comment is unnecessary.
- delete: dropped section headers, pool-convention WHAT comments, restated step labels, and four docstrings that just restated name + signature.
- trim: shortened the rest to the one non-obvious WHY, dropping task/version refs and clauses already carried by names or a single source of truth.

~76 inline-comment lines removed (sampling.py 142 -> 94). Behavior is unchanged apart from the inert rename and dead-import removal. ruff, mypy, and the impacted tests (330) are green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Integrate the commits main gained since this branch forked: #341 (dashboard), #342 (the comment trim that is also this branch's first commit), #345/#348 (custom reward_fn reaches the pool + training-path robustness), and #350 (the self-verifying LLM-Builder seam).

Two conflicts, both resolved toward main's newer code:
- openrange-trl _report_scalar: keep main's reward_fn(report) (#345), with this branch's trimmed comment.
- cyber llm_realize: #350 replaced the inline realize_world loop with the _HandlerAuthor / realize_verified seam; took main's refactor wholesale (this branch's only edit there was a now-moot comment deletion).

ruff, mypy, boundary, and 311 impacted tests are green on the merged tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

larstalian and others added 3 commits June 22, 2026 21:08

larstalian closed this Jun 24, 2026

larstalian reopened this Jun 24, 2026

github-actions Bot locked and limited conversation to collaborators Jun 24, 2026

larstalian force-pushed the cyber/honest-worldgen-evolution branch from 7a68840 to e50e58d Compare June 24, 2026 17:15

larstalian and others added 2 commits June 24, 2026 12:24

Merge branch 'main' into cyber/honest-worldgen-evolution

20f4b6f

larstalian marked this pull request as ready for review June 25, 2026 04:53

larstalian merged commit 4ce54b2 into main Jun 25, 2026
2 checks passed

larstalian deleted the cyber/honest-worldgen-evolution branch June 25, 2026 05:28

larstalian merged 6 commits into main from cyber/honest-worldgen-evolution