Prompt engineering is one good instruction. Loop Engineering is the system around it. loopforge tells you which of the six parts your "loop" is missing — then scaffolds and runs one that isn't.
loopforge is a linter, scaffolder, and runner for agent loops — the autonomous systems that wake up, do work, verify it, remember it, and decide whether to continue. The catch (per Google's Addy Osmani, who named the practice): a real loop is not a cron job that calls an agent. It has six moving parts, and most homemade loops are missing the pieces that stop budget burn, duplicated work, and unverified changes. loopforge makes those parts checkable. Deterministic, zero-dependency, no API key, no LLM call.
- You are about to let an agent run unattended and want brakes before budget burns.
- You want missing verify, memory, isolation, and handback caught in CI.
- You want a complete loop scaffold instead of another cron job with a prompt.
- You're building an autonomous loop (a CI-fixer, a content-curation bot, a nightly refactor). Lint it first: loopforge tells you it has no brake, no memory, and reviews its own work, before you let it run unattended. → `loopforge lint .`
- You want a correct loop to start from. Scaffold one with all six blocks already wired, so you fill in the work instead of rediscovering the architecture. → `loopforge init my-loop`
- You want to run a loop safely. The runner enforces the limits it can actually measure (iterations, wall-clock), records every step to a ledger, and refuses to even start a loop that has no way to stop. → `loopforge run my-loop/loop.toml`
A loop has to answer six questions. Miss one and it quietly stops being a loop:
| # | Block | The question it answers | loopforge checks |
|---|---|---|---|
| 1 | trigger | Who wakes it up? (schedule / event / until-goal) | not self-starting → L001 |
| 2 | isolation | Can parallel agents avoid clobbering each other? (worktrees) | parallel, no isolation → L006 |
| 3 | skills | How does it know your conventions? (durable project knowledge) | no skills → L007 |
| 4 | act | What does it actually do? (the agent invocation + its tools) | no command → L010 |
| 5 | verify | Who checks it — not itself? (a second command / different model) | missing or self-review → L004 / L005 |
| 6 | memory | How does it remember? (the model forgets; the repo doesn't) | no ledger → L003 |
…plus the two disciplines the article keeps hammering: a cost brake (L002/L008) so it can't
run away, and a human brake (L009) so judgment, acceptance, and the stop button stay with you.
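To make the table concrete, here is a minimal sketch of what "checkable" can mean for a parsed `loop.toml`. It is an illustration under assumptions, not loopforge's actual implementation: the function names `missing_blocks` and `has_hard_stop` are hypothetical, and the block-to-rule mapping is copied from the table above (with a missing `verify` mapped to L004).

```python
# Hypothetical sketch, not loopforge's real code.
# Block names and rule codes are taken from the table above.
REQUIRED_BLOCKS = {
    "trigger": "L001",
    "isolation": "L006",
    "skills": "L007",
    "act": "L010",
    "verify": "L004",
    "memory": "L003",
}


def missing_blocks(loop: dict) -> list[tuple[str, str]]:
    """Return (block, rule code) for every block that is absent or empty."""
    return [(name, code) for name, code in REQUIRED_BLOCKS.items() if not loop.get(name)]


def has_hard_stop(loop: dict) -> bool:
    """True if an iteration cap or a token/time/cost ceiling is declared."""
    trigger = loop.get("trigger", {})
    budget = loop.get("budget", {})
    return bool(
        trigger.get("max_iterations")
        or budget.get("max_tokens")
        or budget.get("max_seconds")
        or budget.get("max_cost_usd")
    )


# A loop that "looks fine": it is on a schedule and it calls an agent.
nightly = {
    "trigger": {"type": "schedule", "cron": "0 3 * * *"},
    "act": {"command": "agent-cli run {prompt}"},
}
print(missing_blocks(nightly))
print(has_hard_stop(nightly))
```

The point of the sketch is that every check is a plain lookup on the parsed config, which is why this kind of linting can stay deterministic and needs no LLM call.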
pip install git+https://github.com/yingchen-coding/loopforge
loopforge init ci-green # scaffold a complete loop (passes lint out of the box)
loopforge lint ci-green --score # grade any loop A–F on the six blocks
loopforge run ci-green/loop.toml --dry-run # show exactly what it would do
Point it at someone else's loop, or a whole directory of them:
loopforge lint path/to/loops/ # finds every loop.toml / *.loop.toml and grades each
📖 Hands-on tour: docs/walkthrough.md walks through scaffolding a loop, breaking it to see each block flagged, wiring a real verify step, running it safely, and gating it in CI.
A loop that looks fine — it's on a schedule and it calls an agent — but isn't:
$ loopforge lint nightly-cleanup/
nightly-cleanup
✖ critical L002 No hard stop — no iteration cap and no token/time/cost ceiling. A goal alone
(`until`) does not count: if the goal is never met, the loop runs forever.
↳ fix: Add trigger.max_iterations, or a [budget] with max_seconds / max_cost_usd.
✖ major L004 No verification step — the agent grades its own homework.
✖ major L003 No memory ledger — the model forgets between iterations.
✖ major L009 No handback — nothing ever returns control to a human.
✖ 7 findings — completeness: F (0/100)
And the runner won't let you run that:
$ loopforge run nightly-cleanup/loop.toml
error: refusing to run nightly-cleanup: L002 — the loop has no brake and could run away.
One loop.toml holds the six block tables plus the two brakes, [budget] and [handback] (`loopforge init` writes a complete one for you):
name = "ci-green"
goal = "Keep main green: when CI fails, find the cause, fix it, verify, and record the fix."
[trigger] # 1. who wakes it
type = "schedule"
cron = "*/30 * * * *"
max_iterations = 15 # ...and what makes it stop
[isolation] # 2. parallel-safe workspace
mode = "worktree"
[skills] # 3. durable project knowledge
files = ["skills/project.md"]
[act] # 4. the agent invocation (harness-agnostic)
command = "agent-cli run {prompt}"
prompt_file = "prompts/act.md"
[verify] # 5. an INDEPENDENT check — never the same command as act
command = "pytest -q"
[memory] # 6. the ledger the repo keeps
file = "memory/ledger.md"
trace_file = "memory/trace.jsonl" # full commands, outputs, verification, owner, and transitions
[budget] # cost brake
max_tokens = 200000
max_cost_usd = 5.0
[handback] # human brake
owner = "platform-oncall" # a concrete person/team owns acceptance and responsibility
on = ["budget-exceeded", "verify-failed-twice", "goal-reached", "needs-human"]
notify = "echo"
`{prompt}` is assembled from your skills + the memory ledger + the prompt file, so every iteration
reloads what the project is and what's already been done. The act command is any agent CLI —
loopforge is harness-agnostic.
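The assembly step described above can be sketched as follows. This is a minimal illustration, assuming simple concatenation in the order skills, ledger, prompt file; the function name `build_prompt`, the `## filename` section headers, and the choice to skip a ledger that does not exist yet are all assumptions, not loopforge's documented behavior.

```python
from pathlib import Path


def build_prompt(skill_files, ledger_file, prompt_file):
    """Concatenate skills, the memory ledger, and the act prompt into one string.

    Hypothetical sketch: files that do not exist yet (for example the ledger
    before the first iteration) are skipped rather than treated as errors.
    """
    sections = []
    for path in [*skill_files, ledger_file, prompt_file]:
        p = Path(path)
        if p.exists():
            body = p.read_text(encoding="utf-8").strip()
            sections.append(f"## {p.name}\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Paths mirror the example loop.toml above.
    print(build_prompt(["skills/project.md"], "memory/ledger.md", "prompts/act.md"))
```

Rebuilding the prompt from files on every iteration is what lets the repo, rather than the model, carry the state between runs.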
$ loopforge list-rules
| Code | Sev | What it catches |
|---|---|---|
| L001 | major | trigger missing or manual: a loop you start by hand isn't a loop |
| L002 | critical | no hard stop: no iteration cap or token/time/cost ceiling (a goal alone can loop forever) |
| L011 | minor/major | trigger declared but not wired: schedule with no cron, event with no source, typo'd type |
| L003 | major | no memory ledger: the model forgets every iteration |
| L004 | major | no verification: the agent grades its own homework |
| L005 | major | self-review: verify runs the same command/model as act |
| L006 | minor/major | no isolation (major if it runs in parallel) |
| L007 | minor | no skills/knowledge: re-onboards a new hire every iteration |
| L008 | major | iterations capped but no per-run token/cost ceiling |
| L009 | major | no handback: nothing returns control to a human |
| L010 | major | no act command: the loop does nothing |
| L012 | minor | no goal: can't tell progress from motion |
| L013 | major | referenced skills/memory/prompt file doesn't exist on disk |
| L014 | major | no named owner: the loop can finish, but nobody owns acceptance |
Only the runaway is critical, on purpose: a loop that can't stop is the one failure that turns
"unattended" into "expensive." Everything else degrades quality; that one burns money.
Loops are not free, and loopforge won't pretend otherwise — the comments under every Loop Engineering post are people who got a surprise bill. Two honest limits:
- Token cost is real. A loop re-reads context, retries, and re-verifies every iteration; several agents in parallel multiply it. loopforge enforces the brakes it can measure — iteration count and wall-clock — and surfaces your token/cost ceilings in the plan, but it can't count tokens it never sees. Don't loop a one-off task or one with no stable feedback signal.
- The loop moves work; it can't hold responsibility. "Done" from an agent isn't done, and
"tests pass" isn't "the logic is right." That's why
`verify` must be independent and `handback` must exist. The point of a loop is to pull you out of the repetitive parts, while judgment, acceptance, and the brake stay in your hands.
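The two brakes a runner can measure on its own, iteration count and wall-clock time, can be sketched in a few lines. This is an illustrative sketch, not loopforge's runner: `run_bounded` and its return strings are hypothetical names, commands are passed as argument lists, and the refusal to start without any cap mirrors the L002 behavior described earlier.

```python
import subprocess
import time


def run_bounded(act_cmd, verify_cmd, max_iterations=0, max_seconds=0):
    """Run act then verify until verify passes or a measurable brake trips.

    Hypothetical sketch: returns "verified", "iteration-cap", or "time-cap".
    The time cap is checked between iterations; it does not interrupt a
    command that is already running.
    """
    if not max_iterations and not max_seconds:
        raise ValueError("refusing to run: no hard stop declared")

    started = time.monotonic()
    iteration = 0
    while True:
        if max_iterations and iteration >= max_iterations:
            return "iteration-cap"
        if max_seconds and time.monotonic() - started >= max_seconds:
            return "time-cap"
        iteration += 1
        subprocess.run(act_cmd, check=False)  # the agent does the work
        verified = subprocess.run(verify_cmd, check=False).returncode == 0
        if verified:  # an independent command decides, not the agent
            return "verified"
```

Note what the sketch cannot do: it never sees token usage, so a token or cost ceiling has to be enforced by the agent CLI itself or by whatever bills the calls.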
A ready-made GitHub Action ships in this repo (action.yml) — lint your loop definitions on every
PR so a loop can't regress into a runaway unnoticed:
name: loops
on: [push, pull_request]
jobs:
  loopforge:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: yingchen-coding/loopforge@v0.5.3
        with:
          path: loops/ # dir of loop.toml files
          fail-at: major
          score: "true"
Or as a plain step:
      - run: pip install git+https://github.com/yingchen-coding/loopforge
      - run: loopforge lint loops/ --score
The strongest verify isn't "the command exited 0"; it's "the prediction matched what actually
happened." loopforge eval scores a CSV of predictions once their real outcomes are known:
loopforge eval predictions.csv
# predictions: 4 · resolved: 3 · pending: 1
# accuracy: 67% (2/3 correct)
# calibration (Brier, lower=better): 0.228
# P&L: +3.20 · staked: 12.00 · ROI: +26.7%
It's domain-agnostic: soccer bets, stock calls, anything with a predicted value and a later
`actual` (plus optional `prob` for calibration and `stake`/`odds`/`result` for P&L).
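For readers who want to see what the three numbers mean, here is a minimal sketch of the metrics under stated assumptions. It is not the code behind `loopforge eval`: `score_predictions` is a hypothetical name, `prob` is assumed to be the stated probability that the prediction is correct, `odds` is assumed to be decimal odds, and a win is derived from `predicted == actual` rather than from a `result` column. The sample rows are made up for illustration.

```python
def score_predictions(rows):
    """Score resolved predictions: accuracy, Brier score, and P&L / ROI.

    Hypothetical sketch. A row counts as pending while `actual` is empty.
    """
    resolved = [r for r in rows if r.get("actual") not in (None, "")]
    pending = len(rows) - len(resolved)
    if not resolved:
        return {"resolved": 0, "pending": pending}

    hits = [r["predicted"] == r["actual"] for r in resolved]
    accuracy = sum(hits) / len(resolved)

    # Brier score: mean squared gap between stated probability and outcome (1 or 0).
    probs = [(r["prob"], hit) for r, hit in zip(resolved, hits) if r.get("prob") is not None]
    brier = sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in probs) / len(probs) if probs else None

    # P&L with decimal odds: a win returns stake * (odds - 1), a loss costs the stake.
    staked = pnl = 0.0
    for r, hit in zip(resolved, hits):
        if r.get("stake") is None or r.get("odds") is None:
            continue
        staked += r["stake"]
        pnl += r["stake"] * (r["odds"] - 1.0) if hit else -r["stake"]
    roi = pnl / staked if staked else None

    return {"resolved": len(resolved), "pending": pending, "accuracy": accuracy,
            "brier": brier, "pnl": pnl, "staked": staked, "roi": roi}


rows = [
    {"predicted": "home", "actual": "home", "prob": 0.7, "stake": 4.0, "odds": 2.0},
    {"predicted": "away", "actual": "home", "prob": 0.6, "stake": 4.0, "odds": 2.5},
    {"predicted": "draw", "actual": "draw", "prob": 0.5, "stake": 4.0, "odds": 3.0},
    {"predicted": "home", "actual": None, "prob": 0.8, "stake": 4.0, "odds": 1.5},
]
result = score_predictions(rows)
print(result)
```

Accuracy alone rewards lucky guesses; the Brier score penalizes confident misses, which is why a calibration number belongs next to the hit rate.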
Auto-validate with the latest data. eval is the scoring half; close the loop by pairing it:
- act = a resolver that fetches the latest real outcomes and fills in `actual`/`result` (a results API, `yfinance` for stock calls, your chart for medical predictions; whatever is the domain's ground truth).
- verify = `loopforge eval predictions.csv --min-accuracy 0.5`: the loop fails its own gate when its predictions stop beating the bar.
- schedule it, and every prediction you make validates itself against reality on a cadence.
A [trigger].cron is just metadata until something fires it. loopforge schedule installs it into
your crontab so the loop runs on its own:
loopforge schedule install my-loop/loop.toml --dry-run # preview the resulting crontab
loopforge schedule install my-loop/loop.toml # add it (Mondays 9am, etc.)
loopforge schedule list # show loopforge-managed entries
loopforge schedule remove my-loop # take it off the schedule
Entries are tagged `# loopforge:<name>` and managed idempotently: re-installing replaces, it never
duplicates. A no-clobber guard refuses to write if your current crontab can't be read, so your
other cron jobs are never wiped. Only type = "schedule" loops (with a cron) can be installed.
(macOS: cron needs Full Disk Access to run jobs that touch protected folders like ~/Documents.)
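The idempotent, no-clobber behavior described above can be sketched as a pure function over the crontab text. This is an assumption-laden illustration, not loopforge's scheduler: `install_entry` is a hypothetical name, the exact layout of the installed line is invented, and only the `# loopforge:<name>` tag comes from the description above.

```python
def install_entry(crontab_text, name, cron, command):
    """Return new crontab text with exactly one entry tagged for `name`.

    Hypothetical sketch. Passing None means the current crontab could not be
    read; refusing here is what keeps unrelated jobs from being wiped.
    """
    if crontab_text is None:
        raise RuntimeError("refusing to write: current crontab could not be read")

    tag = f"# loopforge:{name}"
    # Drop any earlier entry for this loop, keep every other line untouched.
    kept = [line for line in crontab_text.splitlines() if not line.rstrip().endswith(tag)]
    kept.append(f"{cron} {command} {tag}")
    return "\n".join(kept) + "\n"


existing = "0 * * * * backup.sh\n"
once = install_entry(existing, "my-loop", "0 9 * * 1", "loopforge run my-loop/loop.toml")
twice = install_entry(once, "my-loop", "0 9 * * 1", "loopforge run my-loop/loop.toml")
print(twice, end="")
```

Because the tag identifies the entry, installing twice yields the same text as installing once, and removal can use the same filter without the final append.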
loopforge is the loop engine; the verify step is any command, so a real reviewer can be the
independent check. For agent definitions, pair it with agentguard
(a deterministic security linter for agents) — loopforge runs the loop, agentguard reviews the work:
[act]
command = "agent-cli run {prompt}" # the agent does the work
[verify]
command = "agentguard agents/ --fail-at critical" # an independent reviewer gates it
Point one loop at a whole fleet of definition sets (agentguard takes multiple paths), and you have a loop that keeps every agent you own clean: the loop engine driving the reviewer, on a schedule.
Loop Engineering is useful because the failure modes are predictable: runaway budget, duplicated work, missing memory, self-review, and no human owner. Predictable failure modes should be lintable, the same way a type checker catches the bugs you'd otherwise find in production. That's all loopforge is: the six blocks, made checkable, with a scaffolder and a safe runner attached.
pip install git+https://github.com/yingchen-coding/loopforge
# or for development:
git clone https://github.com/yingchen-coding/loopforge && cd loopforge && pip install -e ".[dev]"
Python ≥ 3.11, zero runtime dependencies.
MIT © Ying Chen