Release v0.4.0 #107

Merged
slowdini merged 33 commits into main from dev
Jun 21, 2026
@slowdini

Owner

Release notes

Replace this paragraph with a short narrative for the release. If left unchanged, GitHub's auto-generated notes will be used instead.

slowdini and others added 30 commits June 18, 2026 20:43
Resolves issue #77 (design spike for the isolated-run epic #74): fixes the
one decision that blocks the rest of the epic and documents it at
docs/isolated-run.md.

Decision (option a): one isolated session runs the whole loop with a
switch-condition barrier between condition batches. This preserves
in-session transcript resolution via CLAUDE_CODE_SESSION_ID (the #79
invariant) and the singular env/ layout, while delivering real read
isolation by removing/swapping the staged skill so the control arm has
nothing to read. Rejects (b) separate sessions (reintroduces the
cross-session --subagents-dir dance) and (c) one shared env
(detect-stray-writes is blind to staged-copy reads inside env/).

The doc fixes the env/ vs iteration-N/ layout, the condition/dispatch
model under Claude's subagent cwd inheritance, where eval-magic meta vs
the clean env live, how benchmark.json is written above env/ (trusted
binary within guard allowedRoots), and the switch-condition barrier
contract. The "validate with one real Claude-interactive run" requirement
is captured as a checklist for #78/#79 to execute once env/ exists.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
docs(isolated-run): resolve design spike & add env/dispatch design note (#77)
feat(isolated-runs): runbook artifact
feat(isolated-runs): create isolated env for eval runs
Three mechanical changes that let the isolated session (cwd = iteration-N/env/)
drive the dispatch -> ingest -> finalize loop, per #79 (epic #74).

- command_target_args now threads an absolute --workspace-dir so ingest/finalize
  (and the upcoming switch-condition) resolve the iteration tree above the env
  instead of defaulting workspace_root to <cwd>/skills-workspace and bailing.
- Per-task dispatch outputs move into env/.eval-magic/outputs/eval-<id>/<cond>[/run-k]/,
  so the agent-under-test never writes above its cwd (docs/isolated-run.md §8);
  run.json/timing.json stay above the env in iteration-N/. record_runs is
  unchanged — it keys off task.outputs_dir.
- dispatch.json tasks[] are grouped by condition (all cond-A, then all cond-B) so
  the runbook's per-condition batch flow maps to a top-to-bottom walk of tasks[].
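
The condition grouping of tasks[] described in the last bullet can be pictured as a stable sort on the condition's position in a fixed order. The sketch below is illustrative only: `Task`, `group_by_condition`, and the `cond-A`/`cond-B` labels are hypothetical stand-ins, not the project's actual types or ordering code.

```rust
/// Hypothetical, simplified dispatch task; the real dispatch.json entries
/// carry more fields (for example an outputs directory).
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub eval_id: String,
    pub condition: String,
}

impl Task {
    pub fn new(eval_id: &str, condition: &str) -> Self {
        Task {
            eval_id: eval_id.to_string(),
            condition: condition.to_string(),
        }
    }
}

/// Reorders tasks so that every task of the first condition precedes every
/// task of the second, and so on. `sort_by_key` is a stable sort, so tasks
/// within one condition keep their original relative order.
/// Conditions missing from `condition_order` sort last (an assumption).
pub fn group_by_condition(mut tasks: Vec<Task>, condition_order: &[&str]) -> Vec<Task> {
    tasks.sort_by_key(|t| {
        condition_order
            .iter()
            .position(|c| *c == t.condition)
            .unwrap_or(condition_order.len())
    });
    tasks
}
```

With tasks interleaved per eval (eval 1 cond-B, eval 1 cond-A, eval 2 cond-B, eval 2 cond-A) and the order `["cond-A", "cond-B"]`, the result is both cond-A tasks followed by both cond-B tasks, which is the shape a top-to-bottom walk needs.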

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements the per-condition read-isolation barrier and reframes the interactive
runbook to drive the whole loop from the isolated session, per #79 (epic #74).

- New `eval-magic switch-condition --condition <keep>` subcommand: removes the
  off-condition's staged skill from env/.claude/skills/ between dispatch batches so
  the next batch cannot read it. Surgical (only the one slug from conditions.json,
  not cleanup_staged_skills), idempotent, and leaves the sibling guard marker (and
  an armed guard) intact. Resolves the iteration from --workspace-dir so it runs
  from cwd=env/. Uniform across new-skill (remove with_skill) and revision (remove
  old_skill) since both arms stage at run time.
- Refactor the in-session dispatch guidance into shared per-condition fragments
  (insession_dispatch_batch / insession_switch_command / insession_ingest_command)
  used by both the runbook tokens and the post-run "Next:" message, so they cannot
  drift.
- Reframe profiles/claude-code/runbook.md into the batch loop: dispatch cond-A →
  join → switch-condition → dispatch cond-B → join → ingest → judges → finalize →
  teardown.
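
The "surgical, idempotent" removal described for switch-condition could look roughly like the sketch below. It assumes the staged skill is a directory named after its slug under the skills dir; `remove_staged_skill` and its return convention are hypothetical and are not taken from the eval-magic source.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path};

/// Removes exactly one staged skill directory, `<skills_dir>/<slug>`.
/// Returns Ok(true) if something was removed and Ok(false) if the slug was
/// already absent, so a repeated call is a harmless no-op (idempotent).
/// Siblings in `skills_dir` (other skills, marker files) are never touched.
pub fn remove_staged_skill(skills_dir: &Path, slug: &str) -> io::Result<bool> {
    // Accept only a single plain path component, so a slug such as
    // "../x" cannot reach outside the skills dir.
    let mut parts = Path::new(slug).components();
    let is_plain = matches!(
        (parts.next(), parts.next()),
        (Some(Component::Normal(_)), None)
    );
    if !is_plain {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "slug must be a single path component",
        ));
    }

    match fs::remove_dir_all(skills_dir.join(slug)) {
        Ok(()) => Ok(true),
        // Already gone: treat as success so the barrier can be re-run.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

Treating "already absent" as success is what makes the barrier safe to run twice between batches; any other I/O error is still surfaced to the caller.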

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keep docs/isolated-run.md evergreen with #79:
- §2 Status: #79 has landed — the isolated session drives the whole loop, dispatch
  outputs live in env/.eval-magic/outputs/, and commands carry an absolute
  --workspace-dir to resolve from cwd=env/.
- §2 layout table: add the env/.eval-magic/outputs/ row.
- §4: switch-condition removes the off-condition slug (uniform across modes; both
  arms stage at run time), does not call cleanup_staged_skills, and does not re-arm
  the guard — superseding the earlier "in-place content swap" sketch.
- §6: the design assumptions are confirmed by the dogfood run.

Holistic README/--help pass for the isolated-run UX is deferred to #84.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the historical "used to materialize as a side effect" narrative and the
duplicated meta/env-split note; keep the evergreen rationale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
feat(isolated-runs): full-loop handoff for the isolated session (#79)
The env (incl. env/.claude/skills/) is now built before the isolated
session starts in it, so Claude Code's file-watcher discovery hazard is
gone structurally. Remove the workarounds it spawned:

- Drop staging_discovery_warning and staging_plugin_shadow_action (and the
  skills_dir_preexisted plumbing they were the only consumers of). The plain
  plugin-shadow banner stays — plugin shadowing is independent of staging.
- Replace the printed dispatch loop (insession_dispatch_next_steps) with a
  clean handoff (insession_isolated_handoff): the run summary now points the
  user to cd into env/, start a fresh session, and "Read and follow
  RUNBOOK.md". The full dispatch → switch-condition → ingest → finalize loop
  lives only in RUNBOOK.md now.
- Reconcile --no-stage: clarify it still runs inside the isolated env/ and
  only skips populating the harness skills dir (no behavior change).
- Reframe docs (docs/isolated-run.md, README Claude Code section, --help) to
  the isolated-env flow; delete the obsolete "Same-session staging gotcha".

Verified: cargo test (0 failures), fmt, clippy -D warnings, zero dead refs.

Closes #80

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-juggling

feat(isolated-runs): retire the session-juggling apparatus (#80)
…uard

feat(isolated-runs): focus guard flag for new isolated envs
feat(claude): headless run mode support
docs(isolated-runs): encapsulation bug fixes and docs update
Decide at run-setup which evals can share an environment and which need isolation, and write the batching plan into dispatch.json so the executing session does no isolation reasoning itself.

Grouping (src/cli/run/grouping.rs) is deterministic greedy first-fit: fixture conflicts auto-split into separate groups, an eval's new `isolation: isolated` hint forces its own singleton, and everything else shares one group. Realization is split by dispatch mechanism: in-session keeps one env/ and swaps groups via a new `reset-batch` barrier (full wipe + re-seed) with conditions still on switch-condition; Cli (hybrid/headless) materializes one env per (group, condition), which also closes a latent gap where the control arm's Cli env physically contained the staged skill.
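
A minimal sketch of the greedy first-fit idea, for illustration only. It assumes a "fixture conflict" means two evals claim the same fixture path; `EvalSpec`, `Group`, and `group_evals` are hypothetical names, and the real logic in src/cli/run/grouping.rs may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of an eval for grouping purposes.
#[derive(Debug, Clone)]
pub struct EvalSpec {
    pub id: String,
    /// Fixture paths the eval places in the env (assumed conflict key).
    pub fixture_paths: BTreeSet<String>,
    /// True when the eval carries the `isolation: isolated` hint.
    pub isolated: bool,
}

impl EvalSpec {
    pub fn new(id: &str, paths: &[&str], isolated: bool) -> Self {
        EvalSpec {
            id: id.to_string(),
            fixture_paths: paths.iter().map(|p| p.to_string()).collect(),
            isolated,
        }
    }
}

#[derive(Debug, Default)]
pub struct Group {
    pub eval_ids: Vec<String>,
    claimed_paths: BTreeSet<String>,
    singleton: bool,
}

/// Greedy first-fit: walk evals in input order and place each into the first
/// existing group it does not conflict with, opening a new group otherwise.
/// The result is deterministic because both evals and groups are scanned in
/// a fixed order.
pub fn group_evals(evals: &[EvalSpec]) -> Vec<Group> {
    let mut groups: Vec<Group> = Vec::new();
    for eval in evals {
        // An isolated eval always gets its own group and accepts no others.
        let slot = if eval.isolated {
            None
        } else {
            groups
                .iter_mut()
                .find(|g| !g.singleton && g.claimed_paths.is_disjoint(&eval.fixture_paths))
        };
        match slot {
            Some(group) => {
                group.eval_ids.push(eval.id.clone());
                group.claimed_paths.extend(eval.fixture_paths.iter().cloned());
            }
            None => groups.push(Group {
                eval_ids: vec![eval.id.clone()],
                claimed_paths: eval.fixture_paths.clone(),
                singleton: eval.isolated,
            }),
        }
    }
    groups
}
```

Under these assumptions, evals `a` and `b` that both claim fixture `x` land in separate groups, an eval `c` that claims only `y` joins `a`'s group, and an eval marked isolated forms its own singleton.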

Single-group in-session runs stay byte-identical to the pre-grouping shape (no groups/group/eval_root keys, bare env/, unchanged runbook). dispatch.json gains a groups[] summary plus per-task group/eval_root; the Cli recipes read a per-task eval_root. docs/isolated-run.md (§1-§4, §8), README, and --help updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
feat(run): setup-time isolation grouping for multi-run batches (#90)
…ow (#99)

The isolation-grouping feature (#90/#98) moved the Cli (hybrid/headless)
dispatch path from a single `env/` to one env per `(group, condition)`, but
left two best-effort mechanisms assuming the old single `env/`. Both are now
scoped to the Cli mechanism; the in-session single-`env/` path is unchanged.

- Plugin-shadow preflight: `post_build` scanned `ctx.stage_root` (`env/`), which
  Cli never creates, so project-local `.claude/settings.json` enabledPlugins
  were unseen. Hoist the existing `env_targets` and scan the first staged env
  instead (in-session's first target is `env/`, so byte-identical there).

- teardown/finalize guard checks were cwd-scoped. Add `staged_env_roots` (a
  directory scan over the iteration dir) and, under Cli, have `teardown` disarm
  each per-env guard before reclaim and `finalize` detect an armed env guard.
  The reminder now points at `eval-magic teardown` (cwd-only `teardown-guard`
  can't reach per-env markers).

Updates the `--guard`/`finalize` --help text and docs/isolated-run.md §4.

Tests: cli_plugin_shadow_preflight_reads_per_env_project_settings,
teardown_disarms_per_group_condition_cli_guards,
finalize_warns_about_armed_cli_per_env_guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fix(run): walk per-(group,condition) Cli envs for guard + plugin-shadow (#99)
The isolated-runs epic (#74) has shipped — env builder (#78), full-loop
handoff (#79), session-juggling retirement (#80), guard re-evaluation
(#81), isolation grouping (#90), and Cli multi-env follow-ups (#99) are
all closed. The doc was development guidance; its evergreen operating
contract already lives in README.md (the run loop, Isolation grouping,
the "Discovery is structural now" watcher note, and guard behavior).

- Remove the file's 8 references (README ×3, the switch-condition /
  reset-batch --help text in args.rs, the runbook.rs module note, and
  two build.rs comments) so deletion leaves no dangling pointers.
- File the doc's only untracked future-work item — filesystem-level
  isolation (mount namespaces / overlay / chroot) — as #102, linked to
  epic #74.
- Design rationale (spike alternatives, validation checklist) stays in
  the closed spikes #77/#90, git history, and code/tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@slowdini slowdini merged commit 5e5aba7 into main Jun 21, 2026
6 checks passed
slowdini added a commit that referenced this pull request Jun 21, 2026
Merge pull request #107 from slowdini/dev