This is a hands-on tour. In ~10 minutes you'll scaffold a loop, see loopforge grade it, break it on purpose to watch each of the six blocks get flagged, wire in a real verify step, run it safely, and gate it in CI. Every command here is real — run them as you read.
pip install git+https://github.com/yingchen-coding/loopforge
loopforge --versionZero dependencies, Python ≥ 3.11.
loopforge init ci-greenYou get a directory with all six blocks already wired:
ci-green/
loop.toml # the six blocks + a cost brake + a human brake
skills/project.md # durable project knowledge
memory/ledger.md # what the repo remembers across iterations
prompts/act.md # the per-iteration instruction
Confirm it's complete:
loopforge lint ci-green --score
# ✓ complete — all six blocks present
# Loop completeness: A (100/100)That's the baseline: a loop that answers all six questions.
Open ci-green/loop.toml. Each table is one block:
| Block | Question | Why a loop dies without it |
|---|---|---|
[trigger] |
Who wakes it up? | A loop you start by hand is a manual run, not a loop. |
[isolation] |
Parallel-safe workspace? | Two agents editing the same file clobber each other. |
[skills] |
How does it know your conventions? | Every iteration restarts as a new hire who doesn't know the codebase. |
[act] |
What does it do? | No command, no work. |
[verify] |
Who checks it — not itself? | The author is the worst reviewer of their own work; in a loop the error compounds. |
[memory] |
How does it remember? | The model forgets; without a ledger it re-litigates settled questions. |
Plus two disciplines the loop dies expensively without:
[budget]— a cost brake, so it can't burn tokens forever.[handback]— a human brake, so judgment, acceptance, and the stop button stay with you.
Delete the [verify] table from loop.toml and re-lint:
loopforge lint ci-green
# ✖ major L004 No verification step — the agent grades its own homework.Now make verify.command identical to act.command (e.g. both claude -p {prompt}):
# ✖ major L005 Self-review — the verify step runs the exact same command as act.Delete every limit — remove trigger.max_iterations and the whole [budget] table:
# ✖ critical L002 No hard stop — a goal alone can loop forever; it can burn budget with no cap.L002 is the only critical rule, on purpose: a loop that can't stop is the one failure that
turns "unattended" into "expensive." Note a goal (until) does not satisfy it — only a hard cap
(iterations / time / cost) guarantees the loop stops. Run loopforge list-rules for the full catalog.
The scaffold uses pytest -q as the independent check. Point act at your agent and verify at
whatever actually proves the work — a test suite, a build, a typecheck, or a different model
reviewing the diff:
[act]
command = "claude -p {prompt}"
prompt_file = "prompts/act.md"
[verify]
command = "pytest -q" # or: reviewer_command = "claude --model <other> -p {review}"
require = "exit-zero"The rule of thumb the article borrows from code review: the writer shouldn't review themselves.
See exactly what it would do, without executing:
loopforge run ci-green/loop.toml --dry-runWhen you run for real, loopforge:
- Refuses to start if the loop has a
criticalfinding (no brake). - Builds each iteration's prompt from
skills+ thememoryledger +prompts/act.md. - Runs
act, then the independentverify. - If
isolation = "worktree"and you're in a git repo, runs act/verify in an isolated git worktree (its path is printed) so the original tree is untouched until you review and merge. - Appends a row to
memory/ledger.md— the repo remembers even after the process exits. The ledger is written to the original repo, not the worktree, so the record survives. - Stops on: goal reached, verify failed twice, budget/iteration cap, or the agent saying
NEEDS-HUMAN, and hands back to you.
Honesty note: loopforge enforces the limits it can measure — iteration count and wall-clock time. It passes your
max_tokens/max_cost_usdthrough to your agent and shows them in the plan, but it does not pretend to count tokens it never sees.
Keep a loop from regressing into a runaway by linting it on every PR:
# .github/workflows/loops.yml
name: loops
on: [push, pull_request]
jobs:
loopforge:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: yingchen-coding/loopforge@v0.5.0
with:
path: loops/
fail-at: major
score: "true"loopforge list-rules— the full rule catalog.examples/ci-green/— the complete reference loop.examples/half-baked/— what most "loops" actually look like (and what loopforge says about them).- The six blocks come from Addy Osmani's Loop Engineering writeup; loopforge just makes them checkable.