cyber: make generated worlds honest; harden evolution + the training substrate #351
Merged
Drop the comments that just restate what the code/identifier already says (BootEpisode's signature, the dial test's body, the boot helper's prose); keep the WHY ones (the boundary-injection and fresh-runtime rationale).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…substrate
Headline fix: under the PROCESS backing (the training/eval default), the internal cloud-metadata endpoint was a directly-servable route, leaking the flag with zero exploitation in 13/13 company worlds. Internal /svc/<name> endpoints are now dispatch-only -- reachable only via the in-process SSRF pivot, as CONTAINER's real per-service network enforces -- gated to networked worlds so flat worlds are untouched.
Also in this pass (each verified live + tested; gated by two multi-agent audits):
- difficulty: the 2-service-toy > 8-service-company inversion is gone (a capped internal-host fan-out term; dropped the cost-neutral POST body bump)
- realism: recon no longer leaks the exact internal inventory (decoy-host chaff); per-world-unique decoy files; larger flag-key / host / service-name pools
- curriculum: a RoundMetrics.difficulty_gain "is the frontier moving" signal; a consequence seed-gate so an unsolvable world can't seed the pool; soften can now shorten a chain to rescue a chain-stuck agent; diversify rotates the on-path technique within loot-compatible classes; fixed a harden-path render crash
- training substrate: CONTAINER is now the floor for the training/eval measurement (file-read / sandboxed worlds escalate to the real container; in-band worlds keep the cheap PROCESS path; fails loud rather than silently 0-reward an emulation the agent can't exploit). New Pack.minimum_backing SDK hook, default PROCESS.
Full suite green (959); ruff / mypy / boundary / coverage pass. Deferred company chain-deepening tracked in #343.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A discipline pass over the comments this branch added, per .rules (default no comments; add one only when the WHY is non-obvious). Several blocks were defending against a gotcha or restating a constraint in more than one place, so the real fix was in the code, not the prose:
- diversify method-flip: guard it to sole-vuln endpoints so it can never disturb a co-located decoy, dropping the "this is safe because..." defense. The co-located case (db seed 11) stays solvable.
- pool seed: extract _gated_members so WorldPool.seed and EvalPool.seed share one loop, removing a comment that was duplicated at both sites (and that the SeedGate alias already documents).
- difficulty_gain: explain the metric once, on the public RoundMetrics field, not also (with a typo) inside _grow.
- loot-shape constraint: one source of truth (_ORACLE_SHAPES_FOR_LOOT), not restated across three comments.
- correct a false claim: a curriculum-added vuln draws from the same pools as a sampled one; it is not "byte-identical".
- drop history/version references and an underscore-helper docstring.
Behavior is unchanged except the method-flip guard. ruff, mypy, and the impacted tests are green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Second discipline pass after the diff still read as comment-heavy. Ran a root-fix-or-delete sweep over every comment in the changed files (not just the new ones), taking the strongest move per comment:
- root fix: removed a dead ``ONTOLOGY_ID`` keep-alive import in mutation.py; renamed difficulty's ``internal - chain_hops - 1`` to a self-documenting ``walk_and_target_hosts`` so the comment is unnecessary.
- delete: dropped section headers, pool-convention WHAT comments, restated step labels, and four docstrings that just restated name + signature.
- trim: shortened the rest to the one non-obvious WHY, dropping task/version refs and clauses already carried by names or a single source of truth.
~76 inline-comment lines removed (sampling.py 142 -> 94). Behavior is unchanged apart from the inert rename and dead-import removal. ruff, mypy, and the impacted tests (330) are green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
7a68840 to e50e58d
Integrate the commits main gained since this branch forked: #341 (dashboard), #342 (the comment trim that is also this branch's first commit), #345/#348 (custom reward_fn reaches the pool + training-path robustness), and #350 (the self-verifying LLM-Builder seam). Two conflicts, both resolved toward main's newer code:
- openrange-trl _report_scalar: keep main's reward_fn(report) (#345), with this branch's trimmed comment.
- cyber llm_realize: #350 replaced the inline realize_world loop with the _HandlerAuthor / realize_verified seam; took main's refactor wholesale (this branch's only edit there was a now-moot comment deletion).
ruff, mypy, boundary, and 311 impacted tests are green on the merged tree.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What & why
This started as "make the cyber gym's generated worlds honest, and make sure world evolution actually makes sense" — and, through a couple of multi-agent audits that each caught real problems, grew into one coherent pass over worldgen, the curriculum, and the training substrate.
Every change keeps worlds solvable-by-construction, deterministic, and PROCESS↔CONTAINER parity-safe. Full suite green (955 passed);
ruff / mypy / pack-core boundary / coverage all pass.
🔴 The headline: an honesty bug that was corrupting training reward
Under the PROCESS backing (what training and eval default to), the internal cloud-metadata endpoint was a directly-servable HTTP route: GET /svc/<host>/latest/meta-data/credential handed back the flag with zero exploitation, in 13 of 13 company worlds. An agent could score reward 1.0 without ever performing the SSRF pivot the world exists to teach.
Fix: internal /svc/<name> endpoints are now a dispatch-only table, reachable only by the in-process SSRF pivot and never by a direct request (a direct GET now 404s). This is exactly the network segmentation CONTAINER gets for free from real per-service hosts. Gated to networked worlds, so flat worlds (where internal vulns are solved by direct request, by design) are untouched.
The rest of the pass
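As a reference point for the items below, the dispatch-only routing from the headline fix can be pictured with a minimal sketch. Everything here is hypothetical and written for illustration only: the `World` class, the `/fetch` route, the `metadata` host name, and the placeholder flag are assumptions, not the project's real types or routes.

```python
from dataclasses import dataclass, field
from typing import Callable

Response = tuple[int, str]
Handler = Callable[["World", str], Response]

# Hypothetical internal path; the host name "metadata" is an assumption.
METADATA_PATH = "/svc/metadata/latest/meta-data/credential"


@dataclass
class World:
    """Illustrative stand-in for a generated world, not the real type."""

    networked: bool
    public: dict[str, Handler] = field(default_factory=dict)
    internal: dict[str, Handler] = field(default_factory=dict)

    def serve(self, path: str, arg: str = "") -> Response:
        """Handle a request arriving from outside the world."""
        if path in self.public:
            return self.public[path](self, arg)
        # Flat worlds keep internal endpoints directly reachable, by design.
        if not self.networked and path in self.internal:
            return self.internal[path](self, arg)
        return 404, "not found"

    def dispatch_internal(self, path: str) -> Response:
        """In-process hop, only ever called from inside a handler."""
        handler = self.internal.get(path)
        if handler is None:
            return 404, "not found"
        return handler(self, "")


def fetch_url(world: World, target: str) -> Response:
    """Vulnerable public handler: fetches whatever internal path it is given."""
    return world.dispatch_internal(target)


def metadata_credential(world: World, _arg: str) -> Response:
    """Internal endpoint that holds the (placeholder) flag."""
    return 200, "FLAG{example}"


def build_world(networked: bool) -> World:
    return World(
        networked=networked,
        public={"/fetch": fetch_url},
        internal={METADATA_PATH: metadata_credential},
    )
```

In a networked world built this way, `serve(METADATA_PATH)` returns a 404, while `serve("/fetch", METADATA_PATH)` reaches the credential through the pivot. A flat world still answers the direct request.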
Difficulty metric. A 2-service toy SSRF world was rated ~40% harder than a full 8-service company breach. Added a capped internal-host fan-out term so a wide estate registers (company difficulty went from 5 → 15 distinct values) and removed the cost-neutral POST "body bump." The inversion is gone.
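A capped fan-out term of the kind described above might look like the following sketch. The function name, weights, and cap value are all assumptions chosen for illustration; the real metric's terms and constants are not shown in this PR text.

```python
def difficulty_score(
    chain_hops: int,
    internal_hosts: int,
    *,
    hop_weight: float = 1.0,      # hypothetical weight per exploitation hop
    fanout_weight: float = 0.25,  # hypothetical weight per internal host
    fanout_cap: int = 6,          # hypothetical cap on the fan-out credit
) -> float:
    """Toy difficulty: chain depth plus a capped credit for estate width."""
    # Cap the fan-out so a very wide estate cannot dominate the score.
    fanout = min(internal_hosts, fanout_cap)
    return hop_weight * chain_hops + fanout_weight * fanout


toy = difficulty_score(chain_hops=1, internal_hosts=1)
company = difficulty_score(chain_hops=1, internal_hosts=7)
```

With these illustrative numbers the wider estate scores higher than the toy at equal chain depth, and hosts beyond the cap add nothing further.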
Realism. The recon page disclosed the exact internal inventory with zero decoys (a perfect oracle) — now padded with believable decoy hostnames the agent has to triage. Decoy files are per-world-unique (were 5 byte-identical literals). Grew the most-memorizable pools (flag key, SSRF targets, service names).
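The decoy-host padding can be sketched as below, assuming a per-world seed so the listing stays deterministic. The pool contents, the function name, and the decoy count are hypothetical.

```python
import random

# Hypothetical decoy pool; the project's real pools are not shown here.
DECOY_POOL = (
    "billing", "ldap", "jenkins", "grafana",
    "vault", "wiki", "mail", "backup",
)


def recon_inventory(
    real_hosts: list[str], world_seed: int, n_decoys: int = 3
) -> list[str]:
    """Return the real hosts mixed with seeded decoys, in shuffled order."""
    rng = random.Random(world_seed)  # same seed, same listing
    candidates = [h for h in DECOY_POOL if h not in real_hosts]
    decoys = rng.sample(candidates, k=min(n_decoys, len(candidates)))
    listing = list(real_hosts) + decoys
    rng.shuffle(listing)
    return listing
```

Because the generator is seeded per world, the same world always renders the same recon page, while the agent can no longer read the exact internal inventory straight off it.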
Curriculum / evolution.
RoundMetrics.difficulty_gain: the honest "is the frontier still moving" read, distinct from frontier_capped (a run can keep admitting children while only creeping on cosmetic decoys).
Training substrate. The structural answer to "PROCESS feels toyish": CONTAINER is now the floor for the training/eval measurement. A file-read or sandboxed world escalates to the real container (where reward is genuine, not silently 0); in-band worlds keep the cheap ~50 ms PROCESS path; and it fails loud if that escalation needs Docker that's absent, so it never silently trains on an emulation the agent can't exploit. New Pack.minimum_backing SDK hook (default PROCESS, so the SWE/trading packs inherit it untouched).
Process note
Two multi-agent audits gated this work and earned their cost. The plan audit caught three flaws in my own proposals before I built them (a max() over an unordered enum that picked the wrong backing; a chain conversion that would have stranded the flag; a threshold that would have put domain knowledge in core). The implementation/gap audits caught a real regression in already-"done" work (a docker-gated test left red by the recon change) and the missing tests fixed here (the untested fail-loud safety path).
Deferred
Turning a recon→pivot company world into a deeper credential chain is the one large, higher-risk item left — tracked with its full implementation recipe in #343.
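For the training-substrate floor described earlier, a minimal sketch of resolving the backing with an explicit ordering and a fail-loud path might look like this. The `Backing` enum, the rank table, `resolve_backing`, and the exception type are assumptions for illustration; only the PROCESS and CONTAINER names and the idea of a minimum backing come from the description.

```python
from enum import Enum


class Backing(Enum):
    PROCESS = "process"
    CONTAINER = "container"


# Explicit rank, so escalation never depends on how enum values compare.
_RANK = {Backing.PROCESS: 0, Backing.CONTAINER: 1}


class BackingUnavailable(RuntimeError):
    """Raised when the required backing cannot be provided."""


def resolve_backing(
    requested: Backing, minimum: Backing, docker_available: bool
) -> Backing:
    """Pick the stronger of the requested and minimum backing, or fail loud."""
    chosen = max(requested, minimum, key=_RANK.__getitem__)
    if chosen is Backing.CONTAINER and not docker_available:
        # Refuse rather than silently fall back to an unexploitable emulation.
        raise BackingUnavailable("CONTAINER backing required but Docker is absent")
    return chosen
```

The explicit rank table is the design point: comparing the raw string values would order "process" after "container", which is the opposite of the intended escalation.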
🤖 Generated with Claude Code