Agent Kaizen is a practical reference implementation of my Kaizen System for AI coding-agent work in VS Code projects. Repo: https://github.com/LevyBytes/agent-kaizen.
AI agents are great at basic coding and endless work, but they often drift, offer half-remembered decisions, or hallucinate when questioned. My Agent Kaizen project offers a managed loop:
SAVMI = Scope -> Adapt -> Verify -> Manage -> Improve
With my framework and harness, one person working with AI agents can get a tremendous amount of useful work done: more quality and a lot less slop.
This repo is an active foundation: it is usable now, but always evolving, and intentionally built as a reference harness that can be adapted into other projects. This work is independent and is not affiliated with or endorsed by OpenAI, Anthropic, Microsoft, GitHub, VS Code, Turso, or any other vendor. That said, if my work helped you and you're feeling generous, I do accept donations for tacos and tea.
- New to the idea: read this intro, then Kaizen_System.md.
- Installing it: download and run the one-file installer for your platform from the repo's setup/ folder (see Setup below).
- Using this repo with an agent: have your agent read setup/SETUP.md, then Daily Workflow.
- Extending the harness: read support_scripts/README.md.
- Adapting the system elsewhere: use Adopting Agent Kaizen In A Project as the starting point.
A memorable mnemonic because every good and bad idea has one:
SAVMI = Scope -> Adapt -> Verify -> Manage -> Improve
| Layer | Job | Typical outputs |
|---|---|---|
| Scope | Understand intent and evidence | Iterative Spec, assumptions, acceptance criteria |
| Adapt | Change the system through bounds | Execution contracts, patches, scripts |
| Verify | Decide if the work can proceed | Go/no-go result, proof, findings |
| Manage | Preserve and govern work data | DB records, hashes, reports, policy context |
| Improve | Decide what to improve next | Retrospective, next-cycle priorities |
The master concept document is Kaizen_System.md. This README explains the repo that implements it.
| Surface | Role |
|---|---|
| Kaizen_System.md | Portable method for humans and agents |
| kaizen.py | Deterministic write path for managed records |
| AI/db/ | Local data plane: DB, exports, manifests, backups |
| evals/ | Command stubs plus portable eval and learning surfaces |
| AGENTS.md, CLAUDE.md | Compact host instructions that point to the manuals |
| setup/ | Install/bootstrap scripts and the agent manual SETUP.md |
| .agents/skills, .claude/skills | Junction surfaces to external skill packages |
| kaizen_components/ | The shared engine package behind kaizen.py |
| support_scripts/ | Auxiliary helper scripts; scratch belongs under AI/ |
The important split is simple:
Kaizen System = the method.
Kaizen harness = this repo's local implementation.
Kaizen DB = the durable record store for managed work data.
Markdown = public docs, command stubs, generated views, or exported reports.
- Shareable system documents for agentic coding workflows.
- A local data plane backed by a direct-file Turso/libSQL-compatible database. SQL go Brrrrrrrr
- A single CLI entrypoint, kaizen.py, for structured writes and reports.
- Command families for tasks, plans, ledgers, proofs, evals, source locks, artifacts, IRL Review, anti-patterns, learning records, evidence ingestion, activity traces and eval scores, and the improvement lab.
- Project and skill evals/ surfaces for command stubs and portable eval fixtures.
- VS Code project-shape guidance for Codex, Claude Code, and similar coding agents.
- Deterministic scripts that move repetitive mechanics out of the model context window.
Download the single installer for your platform and run it. On a bare machine it installs the prerequisites (git + Python) for you, clones this repo, builds the shared venv, generates the VS Code workspace and launcher, scaffolds an empty sibling skills store, and initializes the local DB. Everything lives under a parent folder you choose, called DEVROOT:
DEVROOT/
|-- agent-kaizen/ the cloned repo
|-- SKILLS/ sibling skills store (empty by default)
`-- Python/venvs/kaizen/ shared Python venv
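One way to confirm that the installer produced the layout above is to check for the three directories. This is a minimal sketch and not part of the harness: `EXPECTED_DIRS` and `missing_devroot_dirs` are hypothetical names, and the directory list is copied from the tree shown above.

```python
from pathlib import Path

# Sub-paths expected under DEVROOT, copied from the layout above.
# The names EXPECTED_DIRS and missing_devroot_dirs are illustrative only.
EXPECTED_DIRS = (
    Path("agent-kaizen"),
    Path("SKILLS"),
    Path("Python") / "venvs" / "kaizen",
)


def missing_devroot_dirs(devroot: Path) -> list[str]:
    """Return the expected directories that do not exist under devroot."""
    return [d.as_posix() for d in EXPECTED_DIRS if not (devroot / d).is_dir()]
```

Calling `missing_devroot_dirs(Path("path/to/DEVROOT"))` returns an empty list when all three directories are present.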
Windows — download Install-Agent-Kaizen.cmd and double-click it (it installs git + Python via winget, user scope — no admin):
curl.exe -L -o Install-Agent-Kaizen.cmd https://raw.githubusercontent.com/LevyBytes/agent-kaizen/main/setup/Install-Agent-Kaizen.cmd
.\Install-Agent-Kaizen.cmd
A downloaded .cmd carries the "mark of the web", so SmartScreen may warn the first time; choose More info → Run anyway (or right-click the file → Properties → Unblock).
Linux / macOS — download install-agent-kaizen.sh and run it (it installs git + Python 3 via your system package manager — apt/dnf/yum/pacman/zypper, or Homebrew on macOS; sudo where needed):
curl -L -o install-agent-kaizen.sh https://raw.githubusercontent.com/LevyBytes/agent-kaizen/main/setup/install-agent-kaizen.sh
bash install-agent-kaizen.sh
Re-running is safe. If no package manager is available (older Windows without winget; an unsupported Linux distro), the installer prints the official Git and Python links; install those (tick "Add to PATH") and re-run. Already have the repo cloned? On Linux/macOS run bash setup/setup.sh [DEVROOT]; on Windows run setup\Install-Agent-Kaizen.cmd, which detects an existing clone and skips re-cloning.
Skills ship empty; add a store of your own with setup/link-skills.ps1 or setup/link-skills.sh. The local policy DB also starts empty by design — add your own rules with kaizen.py X1 and load them with X5; nothing is seeded for you.
This repo is developed on Windows and PowerShell, but the core Python commands are ordinary Python and can be adapted to other shells.
Prerequisites:
- Python 3.12 with venv support.
- A VS Code checkout of this repository.
The installer uses a shared venv at $DEVROOT/Python/venvs/kaizen; the manual steps below use a repo-local .venv fallback, which works the same way.
Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r requirements-kaizen.txt
.\.venv\Scripts\python.exe kaizen.py K1 --json
.\.venv\Scripts\python.exe kaizen.py X5 --json
.\.venv\Scripts\python.exe kaizen.py --help
Linux / macOS:
python3 -m venv .venv
./.venv/bin/python -m pip install -r requirements-kaizen.txt
./.venv/bin/python kaizen.py K1 --json
./.venv/bin/python kaizen.py X5 --json
./.venv/bin/python kaizen.py --help
K1 checks or initializes the DB. X5 loads private session policy context. --help shows the
current command surface.
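The two command sets above differ only in where the venv keeps its interpreter. A small helper can hide that difference when scripting the setup; this is a sketch under that assumption, and `venv_python` is a hypothetical name, not something the repo ships.

```python
import os
from pathlib import Path


def venv_python(venv_dir: Path, os_name: str = os.name) -> Path:
    """Return the interpreter path inside a venv for the given platform.

    Python's venv module places the interpreter under Scripts/ on Windows
    and under bin/ elsewhere, which matches the commands shown above.
    """
    if os_name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


# Example: the argv for the DB check, built for the current platform.
db_check_argv = [str(venv_python(Path(".venv"))), "kaizen.py", "K1", "--json"]
```

Passing the argv list to `subprocess.run` avoids writing one script per shell.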
Default local DB shape:
AI/db/
|-- kaizen.db
|-- exports/
|-- manifests/
`-- backups/
For public repositories, keep AI/db/ contents private/local unless a report or export has been
deliberately sanitized.
Markdown in this repo is formatted with Prettier settings
proseWrap: preserve and printWidth: 100 (the config is kept local, not shipped). Prettier is
optional but recommended: it is not a required gate (no CI enforcement, and you do not need it
to use the harness), but if it is available it keeps docs consistently formatted.
npx prettier --check path/to/file.md # report formatting drift
npx prettier --write path/to/file.md # apply formatting
The harness ships with a standard-library unittest suite under AI/tests/. Each test
runs the CLI against a throwaway database (an isolated KAIZEN_REPO_ROOT temp directory), so it
never reads or writes your real AI/db/. Run it with the project venv's Python:
Windows (PowerShell):
.\.venv\Scripts\python.exe -m unittest discover -s AI/tests
Linux / macOS:
./.venv/bin/python -m unittest discover -s AI/tests
See AI/tests/README.md for what each module covers.
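The isolation idea can be sketched as follows: build an environment whose KAIZEN_REPO_ROOT points at a temporary directory, then hand that environment to the CLI subprocess. This is an illustration of the technique, not the code in AI/tests/; `isolated_env` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def isolated_env(temp_root: Path, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and point KAIZEN_REPO_ROOT at a throwaway directory.

    The caller's environment is never mutated, so the real AI/db/ stays
    untouched by anything launched with the returned mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KAIZEN_REPO_ROOT"] = str(temp_root)
    return env
```

A test would typically create the directory with `tempfile.TemporaryDirectory()` and pass the result as `env=` to `subprocess.run([...,"kaizen.py", "K1", "--json"], env=...)`.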
For substantial work have your agent:
- Load private policy context: `python kaizen.py X5 --json`
- Check or initialize the DB: `python kaizen.py K1 --json`
- Scope the task with evidence, assumptions, boundaries, and acceptance criteria. The agent should keep asking the user questions until the scope is fully defined and unambiguous.
- Adapt through bounded changes and deterministic scripts where practical.
- Verify with ground truth first, then structured review where judgment is needed.
- Manage records, artifacts, hashes, proofs, source locks, and reports through the CLI.
- Improve by promoting useful lessons into GOTCHA, LEARNING, LEARNED, evals, docs, or scripts.
Before a major task or after a compacted conversation, reload policy context with X5.
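The first two steps of the workflow are mechanical, so they can be wrapped in a small script. The sketch below only uses the X5 and K1 commands named above; the function names are hypothetical, and it assumes kaizen.py exits non-zero on failure, which this README does not state.

```python
import subprocess
import sys

# Session-start commands in the order listed in the workflow above.
SESSION_START = (("X5", "--json"), ("K1", "--json"))


def session_start_commands(python_exe: str = sys.executable, cli: str = "kaizen.py") -> list:
    """Build one argv list per session-start command."""
    return [[python_exe, cli, *args] for args in SESSION_START]


def run_session_start(python_exe: str = sys.executable, cli: str = "kaizen.py") -> None:
    """Run each command in order and stop at the first failure.

    Assumption: a failing command returns a non-zero exit code, so
    check=True raises subprocess.CalledProcessError.
    """
    for argv in session_start_commands(python_exe, cli):
        subprocess.run(argv, check=True)
```

Re-running `run_session_start()` after a compacted conversation covers the X5 reload described above.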
This is a small representative flow. Copy returned IDs into later commands where placeholder tokens appear.
python kaizen.py K1 --json
python kaizen.py X5 --json
python kaizen.py W1 --title "README polish" --summary "Rewrite the README as a stronger public entry point." --body "Use SAVMI framing, setup steps, command index, and public safety guidance." --json
python kaizen.py Q2 --task-id TASK_ID_FROM_W1 --conclusion VERIFIED_ACCEPTABLE --summary "README checks passed." --body "Formatter, stale-term scan, command-index check, and link check completed." --json
python kaizen.py G1 --title "README command drift" --summary "Command tables can drift from the CLI alias map." --body "Regenerate or verify the table against kaizen_components/args.py before publishing." --json
python kaizen.py L2 --id GOTCHA_ID_FROM_G1 --json
python kaizen.py R2 --limit 20 --json
For JSON-heavy payloads, prefer --payload-json-file, --summary-file, or --body-file when shell
quoting becomes awkward.
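As an illustration of the file-based flags, the sketch below writes a body containing quotes and newlines to a file and builds a G1 command that points at it. It assumes G1 accepts --body-file in place of --body, which the text above implies but does not spell out per command; `gotcha_add_argv` and the body.txt file name are hypothetical.

```python
from pathlib import Path


def gotcha_add_argv(title: str, summary: str, body: str, work_dir: Path) -> list:
    """Write the body to a file and return an argv that references it.

    Assumption: G1 accepts --body-file as an alternative to --body.
    """
    body_path = work_dir / "body.txt"  # illustrative file name
    body_path.write_text(body, encoding="utf-8")
    return [
        "python", "kaizen.py", "G1",
        "--title", title,
        "--summary", summary,
        "--body-file", str(body_path),
        "--json",
    ]
```

Because the body travels through a file and the command is an argv list, no shell quoting is involved at any point.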
Short codes and named aliases are equivalent. Short codes are compact for agents; aliases are easier for silly humans. Run python kaizen.py --help for current arguments and examples.
| Code | Alias | Purpose |
|---|---|---|
| K1 | db-check | Check or initialize the DB |
| K2 | schema-status | Show schema status |
| K3 | db-backup | Back up DB files |
| K6 | db-manifest | Export a DB manifest |
| W1 | task-start | Create a task record |
| W2 | task-update | Add a ledger/status update |
| W3 | plan-create | Create a plan record |
| W4 | plan-revise | Revise a plan record |
| W5 | subagent-packet-create | Create a subagent packet |
| W6 | subagent-packet-ingest | Ingest a subagent packet |
| W7 | diagnostic-packet-create | Create a diagnostic packet |
| W8 | diagnostic-result-ingest | Ingest a diagnostic result |
| G1 | gotcha-add | Add a GOTCHA record |
| G2 | gotcha-list | List GOTCHA records |
| G3 | gotcha-query | Query GOTCHA records |
| G4 | gotcha-inspect | Inspect a GOTCHA record |
| L1 | learning-add | Add a LEARNING record |
| L2 | promote-gotcha-learning | Promote GOTCHA to LEARNING |
| L3 | promote-learning-learned | Promote LEARNING to LEARNED |
| L4 | learning-list | List LEARNING records |
| L5 | learning-query | Query LEARNING records |
| L6 | learning-inspect | Inspect a LEARNING record |
| L7 | learned-list | List LEARNED records |
| L8 | learned-query | Query LEARNED records |
| L9 | learned-inspect | Inspect a LEARNED record |
| Q1 | proof-add | Record proof metadata |
| Q2 | verification-add | Add a verification result |
| Q3 | eval-case-add | Add an eval case |
| Q4 | eval-run-add | Record an eval run |
| Q5 | anti-pattern-add | Add an anti-pattern record |
| Q6 | anti-pattern-query | Query anti-pattern records |
| Q7 | quality-inspect | Inspect proof, eval, or quality record |
| Q8 | output-validate | Validate a payload against its schema |
| M1 | migration-scan | Scan learning surfaces |
| M2 | migration-dry-run | Preview migration actions |
| M3 | migration-apply | Apply migration actions |
| M4 | migration-verify | Verify migrated surfaces |
| M5 | migration-report | Report migration state |
| R1 | task-report | Generate a task report |
| R2 | ledger-report | Generate a ledger report |
| R3 | learning-report | Generate a learning report |
| R4 | proof-report | Generate a proof report |
| R5 | eval-report | Generate an eval report |
| R6 | source-report | Generate a source report |
| R7 | anti-pattern-report | Generate an anti-pattern report |
| R8 | weekly-report | Generate a weekly report |
| R9 | monthly-report | Generate a monthly report |
| R10 | yearly-report | Generate a yearly report |
| R11 | topic-report | Generate a topic report |
| S1 | source-add | Add a source lock |
| S2 | source-query | Query source locks |
| S3 | source-inspect | Inspect a source lock |
| S4 | source-export | Export source locks |
| I1 | irl-create | Create an IRL Review record |
| I2 | irl-prediction-add | Add an IRL Review prediction |
| I3 | irl-correction-add | Add a user correction |
| I4 | irl-outcome-add | Add an observed outcome |
| I5 | irl-report | Generate an IRL Review report |
| A1 | artifact-add | Add an artifact reference |
| A2 | artifact-hash | Hash a file |
| A3 | artifact-inspect | Inspect an artifact |
| A4 | artifact-list | List or query artifacts |
| A5 | artifact-verify | Verify an artifact hash |
| X1 | policy-add | Add private policy context |
| X2 | policy-list | List private policy records |
| X3 | policy-query | Query private policy records |
| X4 | policy-inspect | Inspect a private policy record |
| X5 | policy-session-context | Load session policy context |
| E1 | evidence-ingest-file | Ingest a file into the evidence plane |
| E3 | evidence-chunk | Chunk an evidence document |
| E4 | evidence-query | Search evidence chunks |
| E5 | evidence-inspect | Inspect a document, block, or chunk |
| T1 | trace-add | Record a trace event |
| T2 | score-add | Record an eval score |
| T3 | trace-report | Generate a trace report |
| O1 | lab-assemble | Assemble an improvement-lab case set |
| O2 | lab-propose | Record an improvement proposal |
| O3 | lab-report | Rank and report improvement proposals |
| Y1 | comfy-run | Run + record a ComfyUI workflow |
| Y2 | comfy-inspect | Inspect one generative run |
| Y3 | comfy-list | List recent generative runs |
| Y4 | comfy-replay | Re-submit a prior run's workflow |
| Y5 | comfy-doctor | Probe the configured ComfyUI endpoint |
| B1 | model-doctor | Probe configured model backends |
| B2 | model-run | Advisory text via the LLM backend |
| B3 | reembed | Backfill evidence-chunk embeddings |
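Since short codes and aliases are equivalent, either form can be normalized to one canonical name before logging or comparing commands. The sketch below uses a handful of rows from the table as sample data; it is not how the harness resolves names (the real alias map lives in the repo's own code), and `ALIASES` and `canonical_code` are hypothetical.

```python
# A few code/alias pairs copied from the table above, for illustration only.
ALIASES = {
    "K1": "db-check",
    "X5": "policy-session-context",
    "W1": "task-start",
    "Q2": "verification-add",
    "G1": "gotcha-add",
}

# Reverse lookup: alias -> short code.
CODES = {alias: code for code, alias in ALIASES.items()}


def canonical_code(name: str) -> str:
    """Return the short code for either a short code or a named alias."""
    if name in ALIASES:
        return name
    if name in CODES:
        return CODES[name]
    raise KeyError(f"unknown command name: {name}")
```

With this, `canonical_code("db-check")` and `canonical_code("K1")` both resolve to the same short code.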
The current harness uses Turso Database through Python direct local file access with pyturso. The local DB path is AI/db/kaizen.db.
The implementation uses local DB files, MVCC mode, bounded retry behavior, app-generated IDs, and SHA-256 hashes for entries and artifacts where practical. The concept is backend-agnostic: another project can use a different database or remote service as long as records stay structured, queryable, and written through deterministic paths.
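Two of the mechanics named above, SHA-256 hashing and bounded retry, can be sketched with the standard library alone. This shows the general technique rather than the harness's own code: the function names are hypothetical, and `OSError` is only a placeholder for whatever error the real database layer raises under contention.

```python
import hashlib
import time
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def with_bounded_retry(op, attempts: int = 3, delay_s: float = 0.05, retry_on=(OSError,)):
    """Call op() up to `attempts` times, then re-raise the last error.

    retry_on is a placeholder: substitute the exception type the actual
    database driver raises for a busy or conflicting write.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == attempts:
                raise  # retries exhausted: surface the last error
            time.sleep(delay_s * attempt)  # simple linear backoff
```

Capping the attempts keeps a stuck write from looping forever, which is the point of "bounded" retry behavior.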
See support_scripts/README.md for script-level details.
You can use this repo in two ways.
Work inside this repository, keep the local DB private, and use kaizen.py to manage
tasks, proofs, evals, learning records, reports, and policy context.
For another VS Code project, start with this minimal shape:
repo/
|-- AGENTS.md
|-- CLAUDE.md
|-- Kaizen_System.md
|-- kaizen.py
|-- kaizen_components/
|-- requirements-kaizen.txt
|-- setup/
| `-- SETUP.md
|-- AI/
| |-- db/
| |-- work/
| `-- generation/
`-- evals/
|-- GOTCHA.md
|-- LEARNING.md
`-- LEARNED.md
Optional surfaces such as prompts, custom agents, MCP config, recipes, schemas, and reports can be added when a project needs them. Keep the first version small; add structure when it removes real friction.
The Kaizen engine (kaizen_components/) is identical in every project, so the best way to use it across several repos is to link it, not copy it. Linked projects all run the one master engine, so a fix or a new command made while working in any project lands in this repo and improves every project over time — instead of N drifting copies you have to reconcile by hand.
Link the engine, keep your launcher local. Replace the using project's kaizen_components/ with a junction/symlink to this repo's copy; leave kaizen.py (a tiny launcher) and any project-specific helpers as ordinary local files:
# Windows (directory junction; no admin needed)
cmd /c mklink /J "<project>\kaizen_components" "<DEVROOT>\agent-kaizen\kaizen_components"
# Linux / macOS
ln -s "<DEVROOT>/agent-kaizen/kaizen_components" "<project>/kaizen_components"
Keep each project's data plane separate (this is the important part). paths.py anchors the whole data plane (kaizen.db, work/, exports/) on REPO_ROOT, which it resolves from the engine's own location. Because a link resolves back to this repo, a naive link would make every project write to this repo's kaizen.db. Set KAIZEN_REPO_ROOT to the using project's root so the linked engine keeps its data local. The cleanest place is the project's own (local, un-linked) kaizen.py, which pins it before importing the engine:
import os
from pathlib import Path
os.environ.setdefault("KAIZEN_REPO_ROOT", str(Path(__file__).resolve().parents[0]))
from kaizen_components.args import main # import AFTER pinning the data-plane root
setdefault means an explicitly pre-set KAIZEN_REPO_ROOT still wins, so the default is per-project data isolation, and deliberately sharing one data plane across projects stays available as an opt-in edge case. Net result: one shared engine that improves for everyone, with separate kaizen.db records per project unless you choose otherwise. (Note: junctions/symlinks don't version cleanly; recreate the link from a setup step rather than committing it.)
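The setdefault behavior is easy to check in isolation. The sketch below wraps the same call in a hypothetical helper, `pin_repo_root`, that accepts any mapping, so both cases can be exercised without touching the real environment.

```python
import os
from typing import MutableMapping, Optional


def pin_repo_root(default_root: str, environ: Optional[MutableMapping[str, str]] = None) -> str:
    """Set KAIZEN_REPO_ROOT only if it is not already set; return the effective value."""
    env = os.environ if environ is None else environ
    return env.setdefault("KAIZEN_REPO_ROOT", default_root)


# Case 1: nothing pre-set, so the project's own root becomes the data-plane root.
fresh = {}
assert pin_repo_root("/projects/app", fresh) == "/projects/app"

# Case 2: an explicit value is already present, so it wins over the default.
shared = {"KAIZEN_REPO_ROOT": "/shared/plane"}
assert pin_repo_root("/projects/app", shared) == "/shared/plane"
```

The paths here are made-up examples; the launcher above derives the default from its own file location.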
Skills are maintained outside this repo and surfaced through .agents/skills and .claude/skills junctions.
Edit the canonical skill store, not a duplicate mirror. Every skill should have an evals/ surface for command stubs and behavioral eval fixtures.
Agentgateway is not required for the local single-user harness. It becomes useful later when a project needs centralized identity, RBAC, remote MCP/tool federation, model routing, budgets, rate limits, failover, or auditable traces across multiple agents, users, services, or machines.
The Kaizen DB includes compatible event storage so gateway integration can be added later without changing the core record model.
The optional, capability-activated backend layer has landed — nothing bloats the dependency-light core; each stays off until you install or point at it:
- Ollama (✅ shipped, B*/model-*): an optional local-or-remote OpenAI-compatible backend. Embeddings light up E3 (chunk embeddings) and E4 --semantic (Turso-native vector search); model-run adds advisory text. Opt-in via KAIZEN_EMBED_MODEL / KAIZEN_LLM_MODEL; advisory only, never the final acceptance authority unless a deterministic verifier backs the call. See setup/OLLAMA.md.
- ComfyUI (✅ shipped, Y*/comfy-*): an optional local generative-workflow backend (image/diffusion and node-graph pipelines). The agent authors the workflow JSON; durable runs are recorded as generative_runs + artifacts + traces, replayable by graph hash + seed. Start it per setup/COMFYUI.md.
- PyTorch / sentence-transformers (✅ shipped, opt-in extra): an in-process sentence-transformers embedding backend (KAIZEN_EMBED_BACKEND=sentence-transformers, no server) plus an embedder-backed semantic chunker for E3. Install requirements-pytorch.txt and it lights up E3/E4 vector search; skip it and the deterministic recursive chunker + lexical search still cover the base case. See setup/PYTORCH.md. (Local transformers text generation is deferred; use Ollama for local advisory text.)
Each lands as its own increment behind an explicit execution contract, and the data plane already reserves room for the routing and budget events they will emit.
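The opt-in switches listed above are plain environment variables, so a script can report which ones are set before relying on an optional backend. This is a sketch of that check only: the variable names come from the list above, while `optional_backend_flags` and its result keys are hypothetical, and the harness's own activation logic may consider more than these values.

```python
from typing import Mapping


def optional_backend_flags(environ: Mapping[str, str]) -> dict:
    """Report which opt-in environment variables are set to a non-empty value."""
    return {
        "embed_model_set": bool(environ.get("KAIZEN_EMBED_MODEL")),
        "llm_model_set": bool(environ.get("KAIZEN_LLM_MODEL")),
        "sentence_transformers_backend": (
            environ.get("KAIZEN_EMBED_BACKEND") == "sentence-transformers"
        ),
    }
```

Passing `os.environ` gives the view for the current shell; an empty mapping reports everything as off, matching the dependency-light default.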
Treat generated DB data, reports, local policy context, artifacts, and exports as private by default. Before publishing a public repo, inspect:
- tracked files;
- ignored files that may later be force-added;
- generated reports and DB exports;
- artifact references and screenshots;
- personal paths, machine names, tokens, credentials, and secret-like strings.
Public tracked docs should explain the portable system and local harness, not private machine policy or user-specific operational constraints.
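Part of the pre-publish inspection can be automated with a rough text scan. The sketch below flags lines that look like personal paths or credential assignments; the patterns are illustrative assumptions, deliberately simple, and will miss many real secrets, so treat a clean result as a starting point rather than proof.

```python
import re

# Illustrative patterns only; tune them for your own machines and secret formats.
PATTERNS = {
    "windows_user_path": re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+"),
    "unix_home_path": re.compile(r"/(?:home|Users)/[^/\s]+"),
    "credential_assignment": re.compile(
        r"\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def find_private_strings(text: str) -> list:
    """Return (line_number, pattern_name) pairs for lines that match a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Running it over tracked files, generated reports, and DB exports gives a short list of lines to review by hand.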
This repo is AGPL-3.0 licensed. See LICENSE.
