diff --git a/.gitignore b/.gitignore
index 0592392..24b97b6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,6 @@
 /target
 .DS_Store
+
+# eval-magic run artifacts (workspace root + per-env outputs) — churn every run
+.eval-magic/
+.eval-magic-outputs/
diff --git a/README.md b/README.md
index 4522da0..4d8ec07 100644
--- a/README.md
+++ b/README.md
@@ -86,7 +86,7 @@ environment.
 
 ```bash
 # 1. Build the iteration's isolated env (arm --guard — see Cost & confirmation).
-# run stages skills into skills-workspace/my-skill/iteration-1/env/, copies
+# run stages skills into .eval-magic/my-skill/iteration-1/env/, copies
 # fixtures in, and writes RUNBOOK.md. It does NOT dispatch — it prints a handoff.
 # Add --runs to dispatch every eval N times per condition for variance
 # reduction (a per-eval "runs" field in evals.json overrides the flag).
@@ -112,7 +112,7 @@ eval-magic ingest
 # armed, finalize reminds you to run teardown-guard before editing source.
 eval-magic finalize
 
-# 5. Read skills-workspace/my-skill/iteration-1/benchmark.json (the prep session
+# 5. Read .eval-magic/my-skill/iteration-1/benchmark.json (the prep session
 # resumes here), then clean up:
 eval-magic teardown
 ```
@@ -201,7 +201,7 @@ Read `validity_warnings` **before** trusting any delta — a low skill-invocatio
 
 Per skill being evaluated, the runner produces this tree (everything but `evals/evals.json` is generated):
 ```
-skills-workspace//   # outside the skill directory, gitignore it
+.eval-magic//   # outside the skill directory, gitignore it
   snapshots/   # Mode B baselines, persist across iterations