Motivation
This is a building-Tilth tool, not a user-facing feature — it speeds the Demo-run protocol per phase (proposals/v1-implementation-plan.md) where we re-run the demo over and over while iterating on the harness itself. The docs should present it as contributor/dev tooling, not part of the getting-started flow.
Today the only reset is a full teardown (tilth reset → workspace.reset_session_state: git worktree remove --force + git branch -D + rm -rf sessions/<id>/), so each test cycle forces a fresh tilth prep-feature — an interactive interview (TTYFrontend.ask_user), the slowest and least-automatable part of the loop.
When we're not testing the seed workflow — only the worker/evaluator/ledger/case mechanics of a phase — the seed is reusable. We want to re-run tilth run against the same committed seed without re-interviewing.
Proposal
tilth reset --to-seed [<session_id>] — a flag on the existing reset subcommand. It rewinds time to the instant prep-feature finished: the seed is intact, and every trace of post-seed activity is gone from both the code and the logs, as if the run never happened.
This is deliberately destructive and lossy — that's the feature. We are not preserving the prior run for comparison (if you want that, run separate sessions). The point is a pristine slate so the next run's logs and worktree carry zero noise from the last attempt. A [y/N] confirm is the safety gate. Hard-delete, no archive, no tip tag — intentional.
Anchor: the seed_committed event records the seed commit (payload.sha + payload.branch) — a clean HEAD on session/<id> before any task work. That sha is the rewind target.
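A minimal sketch of recovering that anchor, assuming events.jsonl holds one JSON object per line and that the event name is stored under an "event" key (the key name is a guess; only payload.sha and payload.branch come from the description above). find_seed_anchor is a hypothetical helper, not an existing Tilth function.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def find_seed_anchor(events_path: Path) -> Optional[Tuple[str, str]]:
    """Return (seed_sha, branch) from the first seed_committed event, or None.

    Assumption: each line is a JSON object with an "event" name and a
    "payload" dict; only payload.sha / payload.branch come from the proposal.
    """
    with events_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("event") == "seed_committed":
                payload = record.get("payload", {})
                return payload["sha"], payload["branch"]
    return None  # prep never finished, so there is no rewind target
```

Returning None rather than raising lets the caller turn "no anchor" into the clean pre-flight error instead of a traceback.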
Init sanity check (all must pass before the destructive confirm)
Run these as a pre-flight; on any failure, no-op with a clean error pointing at full tilth reset + tilth prep-feature:
- Session dir exists under sessions/<id>/.
- A seed_committed event exists → recover seed_sha + branch. A session that never finished prep has no rewind target → clean error.
- A session_prepared event exists and carries tokens_used → recover the post-interview token total (see checkpoint step below). (seed_committed implies session_prepared, but read it explicitly rather than assume.)
- The worktree exists on disk: checkpoint.workspace is set and the directory is present. The whole op runs git reset/git clean inside the worktree; if it's gone there's nowhere to rewind → clean error (parallel to the existing "session has no worktree recorded" message in do_resume_cmd).
- prd.json passes the shape check (below).
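The ordering of these checks could be sketched as follows. This is an illustration under assumptions, not the actual implementation: PreflightError and preflight are hypothetical names, events.jsonl and checkpoint.json are assumed to sit directly in the session directory, the event name is assumed to live under an "event" key, and tokens_used is assumed to sit in the event payload. The prd.json shape check is omitted here.

```python
import json
from pathlib import Path


class PreflightError(Exception):
    """Raised before any destructive step; the message is shown to the user."""


HINT = "run a full `tilth reset` followed by `tilth prep-feature` instead"


def _read_json_lines(path: Path) -> list:
    if not path.is_file():
        return []
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def _first_event(events: list, name: str):
    # Assumption: the event name is stored under an "event" key.
    return next((e for e in events if e.get("event") == name), None)


def preflight(session_dir: Path) -> dict:
    """Run the read-only checks in order; return what the rewind needs."""
    if not session_dir.is_dir():
        raise PreflightError(f"no session directory at {session_dir}; {HINT}")

    events = _read_json_lines(session_dir / "events.jsonl")
    seed = _first_event(events, "seed_committed")
    if seed is None:
        raise PreflightError(f"session never finished prep (no seed_committed); {HINT}")

    prepared = _first_event(events, "session_prepared")
    if prepared is None or "tokens_used" not in prepared.get("payload", {}):
        raise PreflightError(f"no session_prepared event with tokens_used; {HINT}")

    checkpoint_file = session_dir / "checkpoint.json"
    checkpoint = {}
    if checkpoint_file.is_file():
        checkpoint = json.loads(checkpoint_file.read_text(encoding="utf-8"))
    worktree = checkpoint.get("workspace")
    if not worktree or not Path(worktree).is_dir():
        raise PreflightError(f"session has no worktree on disk; {HINT}")

    return {
        "seed_sha": seed["payload"]["sha"],
        "branch": seed["payload"]["branch"],
        "tokens_used": prepared["payload"]["tokens_used"],
        "worktree": Path(worktree),
    }
```

Because every check is read-only and raises before anything is modified, a failure at any point leaves the session untouched, which is the no-op behaviour asked for above.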
Shape check (specifics)
There is no seed/prd version stamp today, so cross-phase incompatibility cannot be detected by version. The shape check is therefore a best-effort structural sanity check that the committed prd.json still matches the contract the runner expects, reusing the seed-writer's rules (tilth/seed/sink.py: REQUIRED_PRD_KEYS, TASK_ID_RE). Concretely, prd.json must:
- parse as a non-empty JSON list;
- have every entry be a dict containing all REQUIRED_PRD_KEYS (id, title, description, acceptance_criteria);
- have every id match T-NNN (TASK_ID_RE, 3+ digits) and be unique;
- have every acceptance_criteria be a non-empty list.
This mirrors sink._validate minus its test_files coupling (the tests are already committed in the seed, so they aren't re-derived here). It catches gross shape drift — a seed written under an incompatible seeder — not subtle semantic drift. Future tightening: if we later add a SEED_VERSION stamp, this check should also assert version compatibility and reject seeds written under an older phase's seeder. Until then, --to-seed is for iterating within a phase; a structurally-valid but semantically-stale seed is the user's responsibility.
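A sketch of what such a structural check might look like. The constants below are re-declared from the description above rather than imported from tilth/seed/sink.py, so their exact values (in particular the regex) are assumptions; check_prd_shape is a hypothetical name.

```python
import json
import re

# Assumptions mirroring the description above, not copied from tilth/seed/sink.py.
REQUIRED_PRD_KEYS = ("id", "title", "description", "acceptance_criteria")
TASK_ID_RE = re.compile(r"T-\d{3,}")


def check_prd_shape(raw: str) -> list:
    """Return human-readable problems; an empty list means the shape is OK."""
    try:
        prd = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"prd.json is not valid JSON: {exc}"]
    if not isinstance(prd, list) or not prd:
        return ["prd.json must be a non-empty JSON list"]

    problems = []
    seen_ids = set()
    for index, task in enumerate(prd):
        if not isinstance(task, dict):
            problems.append(f"entry {index} is not an object")
            continue
        missing = [key for key in REQUIRED_PRD_KEYS if key not in task]
        if missing:
            problems.append(f"entry {index} is missing keys: {', '.join(missing)}")
            continue
        task_id = task["id"]
        if not isinstance(task_id, str) or not TASK_ID_RE.fullmatch(task_id):
            problems.append(f"entry {index} has a malformed id: {task_id!r}")
        elif task_id in seen_ids:
            problems.append(f"duplicate task id: {task_id}")
        else:
            seen_ids.add(task_id)
        criteria = task["acceptance_criteria"]
        if not isinstance(criteria, list) or not criteria:
            problems.append(f"entry {index} needs a non-empty acceptance_criteria list")
    return problems
```

Collecting every problem instead of stopping at the first keeps the error message useful when a seed has drifted in more than one place.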
Rewind actions
- Worktree + branch: git reset --hard <seed_sha> + git clean -fd in the worktree. Drops all post-seed commits (per-task commits and any FAILED (...) placeholder commit from the loop.py failure path) and untracked task files. The orphaned commits become unreachable and are left to git's normal garbage collection; no tip tag is kept (no trace, by design). Use -fd, not -fdx: gitignored cruft (__pycache__/, .pytest_cache/) is intentionally left in place. It's harmless to a fresh run, and -x would risk nuking a worktree-local .venv/.env. (Documented caveat: a fresh run is byte-equivalent to post-prep modulo this gitignored cruft.)
- prd.json: reset every task status back to pending.
- checkpoint.json: status → prepared; tokens_used → the post-interview total from the session_prepared event. This is a "make it never happened" tool, so the run's entire token spend is erased: tokens_used returns to exactly what it was when the seed completed. (There is no last-completed-task field to clear; next-pending is derived from prd.json statuses, so resetting those is the mechanism.)
- events.jsonl: truncate to keep every line up to and including the seed_committed event, drop everything after. This preserves the full prep trail: session_start[phase=prep-feature], the interview's intermediate events (model_call, tool_call, memory_load, prompt_assembled, …), session_prepared, and seed_committed. Everything from the run is erased. Lossy and intentional; no archive.
- Delete outright: ledger/, progress.txt, proposed-learnings.md, summary.json, chat.html. All regenerated on the next run. seed-meta.json is preserved because it's part of the seed.
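The rewind steps above can be sketched as a handful of small helpers. All function names here are hypothetical, file paths are passed in rather than assumed, and the "event" key used to spot seed_committed is a guess at the event schema. The git step is expressed as argument lists so it can be inspected before being run inside the worktree.

```python
import json
import shutil
from pathlib import Path

# Names taken from the list above; seed-meta.json is deliberately absent.
RUN_ARTIFACTS = ("ledger", "progress.txt", "proposed-learnings.md", "summary.json", "chat.html")


def git_rewind_commands(seed_sha: str) -> list:
    """Commands to run in the worktree, e.g. subprocess.run(cmd, cwd=worktree, check=True)."""
    return [["git", "reset", "--hard", seed_sha], ["git", "clean", "-fd"]]  # -fd, never -fdx


def truncate_events(events_path: Path) -> int:
    """Keep every line up to and including seed_committed; return lines kept."""
    lines = events_path.read_text(encoding="utf-8").splitlines(keepends=True)
    for index, line in enumerate(lines):
        # Assumption: the event name is stored under an "event" key.
        if line.strip() and json.loads(line).get("event") == "seed_committed":
            kept = lines[: index + 1]
            events_path.write_text("".join(kept), encoding="utf-8")
            return len(kept)
    raise ValueError("no seed_committed event; nothing to rewind to")


def reset_prd_statuses(prd_path: Path) -> None:
    """Mark every task pending so the runner starts from the first one."""
    prd = json.loads(prd_path.read_text(encoding="utf-8"))
    for task in prd:
        task["status"] = "pending"
    prd_path.write_text(json.dumps(prd, indent=2) + "\n", encoding="utf-8")


def rewind_checkpoint(checkpoint_path: Path, tokens_used: int) -> None:
    """Restore the prepared status and the post-interview token total."""
    checkpoint = json.loads(checkpoint_path.read_text(encoding="utf-8"))
    checkpoint["status"] = "prepared"
    checkpoint["tokens_used"] = tokens_used
    checkpoint_path.write_text(json.dumps(checkpoint, indent=2) + "\n", encoding="utf-8")


def delete_run_artifacts(session_dir: Path) -> None:
    """Remove run outputs; missing files are ignored."""
    for name in RUN_ARTIFACTS:
        target = session_dir / name
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink(missing_ok=True)  # requires Python 3.8+
```

Keeping each step in its own function makes the order explicit at the call site and lets each one be exercised against a throwaway directory without touching a real session.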
Postcondition
sessions/<id>/ is equivalent to its state the moment prep-feature returned (modulo timestamps and gitignored worktree cruft). The session's checkpoint status is prepared, so tilth run <workspace> picks it up via the prepared-session path (_find_prepared_sessions, source-matched) and starts as a clean first attempt — no re-interview.
Caveats to encode
- Destructive, irreversible. The confirm prompt must say so plainly: the prior run's code and logs are unrecoverable. Contrast with full tilth reset (also destructive, but obviously so); --to-seed looks gentler, so the warning matters more.
- Phase-boundary compatibility. Per the plan's "no backwards-compat across phase boundaries," a seed written under one phase's seeder may not be re-runnable under a later phase if the prd.json/seed shape changed. The shape check above catches structural drift; semantic drift is out of detection range until a SEED_VERSION stamp exists.
- Lossy by design. No archive of the rewound run. This is the intended trade-off for a dev tool whose whole purpose is a pristine next-run slate — accepted explicitly (it diverges from Tilth's general "every run is inspectable" stance, which is why it's a building-Tilth tool, not a user feature).
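For the first caveat, a confirm gate might look roughly like this. The wording is illustrative only, and confirm_rewind is a hypothetical helper; the ask parameter exists so the prompt can be driven without a TTY.

```python
from typing import Callable


def confirm_rewind(session_id: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only on an explicit yes; anything else (including Enter) aborts."""
    prompt = (
        f"Rewind session {session_id} to its seed?\n"
        "This permanently deletes every post-seed commit, untracked file, and run log.\n"
        "Nothing is archived and it cannot be undone. Continue? [y/N] "
    )
    return ask(prompt).strip().lower() in ("y", "yes")
```

Defaulting to "no" on empty input matches the [y/N] convention named in the proposal.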
Out of scope
- pyproject.toml pollution (separate; Research: Docker Sandboxes (sbx) as opt-in process isolation #13).
Related
- proposals/v1-implementation-plan.md: Demo-run protocol per phase (this directly speeds that loop)
- workspace.reset_session_state, loop._do_reset, ws.commit_task/commit_seed, the seed_committed / session_prepared / commit events
- Implementation footprint: --to-seed flag on reset_p (cli.py); a do_reset_to_seed sibling to _do_reset (loop.py) with distinct confirm copy; a ws.rewind_to_seed(worktree, seed_sha) in workspace.py; small helpers to truncate events.jsonl at the boundary and reset prd statuses. Legacy single-dash flag surface is not extended (new feature, subcommand-only).
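As a rough illustration of the CLI part of that footprint, here is how the flag could hang off a reset subparser. This assumes cli.py is built on argparse and that reset takes an optional positional session id; neither is confirmed here. The parser below is a stand-in, and dispatch only returns the name of the handler that would be called.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tilth")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Stand-in for the existing reset subparser (reset_p in cli.py).
    reset_p = subparsers.add_parser("reset", help="tear down a session")
    reset_p.add_argument("session_id", nargs="?", default=None)
    reset_p.add_argument(
        "--to-seed",
        action="store_true",
        help="rewind to the committed seed instead of a full teardown (destructive)",
    )
    return parser


def dispatch(argv: list) -> str:
    """Return the name of the handler that would run for these arguments."""
    args = build_parser().parse_args(argv)
    if args.command == "reset":
        return "do_reset_to_seed" if args.to_seed else "_do_reset"
    return "unknown"
```

For example, dispatch(["reset", "--to-seed", "abc"]) selects the rewind handler, while dispatch(["reset", "abc"]) keeps the existing full teardown path.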