The missing layer between repo instructions and CI. agent-redline teaches coding agents when to slow down, then verifies the same structural-risk policy in the PR.
Repo instructions (AGENTS.md, CLAUDE.md) are passive — agents drift. CI checks (ArchUnit, dependency rules) fire after the fact. agent-redline makes architectural risk binding for the agent before it edits, then checks it deterministically at PR time.
- Let agents move fast on low-risk code
- Catch dependency-boundary violations before they become architecture drift
- Flag PRs that touch APIs, persistence, security, or structural contracts
- Route human review to the changes where "tests pass" isn't enough
It does not review every line of generated code. It does not enforce style. It does not replace tests.
It identifies the small set of changes — modeling, contracts, boundaries, persistence, security — where local correctness is not enough, and makes those changes deterministically visible to humans.
agent-redline is a skill for AI coding agents. Drop it into your harness (Claude Code, Codex, Cursor, etc.); it activates automatically when a repo contains an agent-policy.yaml. The skill teaches the agent to classify each change before editing, slow down on the structurally-consequential ones, and refuse to work around boundary rules. Deterministic CI checks (a boundary-rule backend like ArchUnit, OpenAPI diff, path classification) catch what the agent missed.
| Term | Meaning |
|---|---|
| Blue zone | agent autonomy is fine; tests and normal review are sufficient |
| Red zone | agent must slow down; human attention required (a checkpoint) |
| Gray zone | unclassified path; cautious by default; a tuning signal |
| Watch (tag) | additive: surfaces a touched path in the PR comment; no gate |
| Boundary rule | a deterministic dependency rule the agent may not work around |
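To make the zone semantics concrete, here is a minimal Python sketch of path classification. The `POLICY` dict, its keys, and its glob patterns are hypothetical stand-ins, not the real `agent-policy.yaml` schema (docs/POLICY_SCHEMA.md documents that). The sketch also assumes red patterns take precedence over blue ones; the project's actual precedence rules may differ.

```python
from fnmatch import fnmatchcase

# HYPOTHETICAL policy: keys and patterns are illustrative only and do not
# follow the real agent-policy.yaml schema.
POLICY = {
    "red": ["src/main/java/*/domain/*", "src/main/java/*/api/*", "db/migrations/*"],
    "blue": ["src/test/*", "src/main/java/*/adapters/*"],
    "watch": {"persistence": ["*/repository/*", "db/migrations/*"]},
}


def classify(path: str, policy: dict = POLICY) -> tuple[str, list[str]]:
    """Return (zone, watch_tags) for one changed path."""

    def matches(patterns: list[str]) -> bool:
        return any(fnmatchcase(path, p) for p in patterns)

    # ASSUMPTION: red wins over blue, so a broad blue glob cannot hide a
    # contract or persistence change.
    if matches(policy["red"]):
        zone = "red"
    elif matches(policy["blue"]):
        zone = "blue"
    else:
        zone = "gray"  # unclassified: cautious by default

    # Watch tags are additive: they are reported but never change the zone.
    tags = [tag for tag, pats in policy["watch"].items() if matches(pats)]
    return zone, tags
```

Note that `fnmatchcase`'s `*` also matches `/`, so a pattern like `src/test/*` covers nested paths.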
A checkpoint is satisfied by a CODEOWNER approval or a label (architecture-reviewed, api-reviewed, etc.). The reporter is deterministic; humans review only the small set of changes that actually need it.
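A minimal sketch of the checkpoint rule, assuming the reporter already knows the PR's labels, the users who approved it, and the CODEOWNERS for the red path. The `Checkpoint` class and `is_satisfied` function are illustrative names, not the project's API.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    # HYPOTHETICAL shape; field names mirror the PR comment, not a documented API.
    name: str   # e.g. "architecture-review"
    label: str  # e.g. "architecture-reviewed"
    owners: set[str] = field(default_factory=set)  # CODEOWNERS for the red path


def is_satisfied(cp: Checkpoint, pr_labels: set[str], approvers: set[str]) -> bool:
    """A checkpoint passes on its label OR on an approval from one of its code owners."""
    return cp.label in pr_labels or bool(cp.owners & approvers)
```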
agent-redline is a harness component, not a complete harness. It composes with existing architecture tools (ArchUnit, dependency-cruiser, Import Linter), instruction files (AGENTS.md, CLAUDE.md), and AI review tools — it doesn't replace them. See docs/FAQ.md for detailed comparisons.
The novel piece isn't the rules — it's that the agent treats the rules as binding before it edits, not as suggestions to route around. In paired simulation runs on shortcut-tempting tasks, the with-skill agent refused both the canonical boundary bypass and a tempting weakening of the architecture test. The without-skill agent took both. Deterministic CI then catches whatever the skill misses.
LLMs increase the rate of code production sharply. Human review capacity does not scale with it. Most code agents produce is low-consequence: tests, isolated adapters, mappers, internal utilities. A minority is structurally consequential: a controller that defines a public contract, a domain class that defines an invariant, a repository import that breaks a port boundary, a migration that reshapes persistence.
Tests check behavior. They do not check structure. A feature can work, pass tests, ship in a clean small PR — and still leave the architecture worse, because the next agent will copy the shortcut. agent-redline closes that gap.
Bootstrap mode. In a fresh repo:
Use agent-redline to set up governance for this repo.
The skill inspects the repo's layout, build system, conventions, and existing CI, then:
- Generates `agent-policy.yaml`: the repo's red/blue/gray zones and boundary rules
- Generates or composes with the existing agent-instruction file (`AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, etc.)
- Scaffolds a boundary-rule backend matching the chosen language extension
- Drops a local pre-push script that mirrors what CI will check
- Proposes (does not commit) a CI workflow, branch protection updates, and CODEOWNERS additions for human review
Operating mode. Activates whenever an agent works in a repo that has agent-policy.yaml. The agent:
- Reads the policy before editing
- Classifies the intended change as blue / red / gray / boundary-violating
- Works autonomously when the change is blue
- Slows down and writes a checkpoint note when the change is red
- Refuses to work around boundary rules; fixes the structure or escalates instead
- Runs the local check before opening a PR
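The per-change classification can be sketched as a severity ordering over the files' zones. This ordering is an assumption about how the pieces combine, chosen to agree with the demo's mixed fixture (a change with red, blue, and gray files reports RED). The action strings paraphrase the list above and are not actual skill output.

```python
# ASSUMED severity order, lowest to highest.
SEVERITY = ["blue", "gray", "red", "boundary-violating"]

# Paraphrase of the operating-mode behavior, for illustration only.
ACTIONS = {
    "blue": "proceed autonomously",
    "gray": "proceed cautiously and flag the unclassified path",
    "red": "slow down and write a checkpoint note",
    "boundary-violating": "stop: fix the structure or escalate, never work around the rule",
}


def classify_change(file_zones: list[str], violates_boundary: bool) -> str:
    """Collapse per-file zones into one class for the whole intended change."""
    if violates_boundary:
        return "boundary-violating"
    # Highest-severity zone wins; an empty change is treated as blue.
    return max(file_zones, key=SEVERITY.index, default="blue")
```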
After CI runs, the reporter posts a single sticky comment summarizing the verdict. Real example from the demo's mixed fixture:
## agent-redline: RED
**Red-zone files changed.**
| Zone | Files |
|---|---|
| Red | `src/main/java/com/example/orders/domain/Order.java` |
| Blue | `src/test/java/com/example/orders/OrderServiceTest.java` |
| Gray | `src/main/java/com/example/orders/util/DateNormalizer.java` |
**Required checkpoints:**
- [ ] `architecture-review` — red-zone change: src/main/java/com/example/orders/domain/Order.java. Satisfy by: CODEOWNER approval or label `architecture-reviewed`
**Boundary check:** passed
**API check:** no changes
**PR size:** 3 files / 0 lines (ok)

A boundary violation has the same shape, but the Boundary check line lists the violated rule and the failing class, and CI exits non-zero so the PR cannot merge.
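A rough sketch of how a reporter might render that comment from already-classified results. The function name and parameters are hypothetical; the output approximates the layout of the example above and is not the project's reporter.

```python
def render_comment(verdict: str, headline: str,
                   files_by_zone: dict[str, list[str]],
                   checkpoints: list[tuple[str, str, str]],
                   boundary: str, api: str, pr_size: str) -> str:
    """Render a sticky-comment body shaped like the example (approximate)."""
    out = [f"## agent-redline: {verdict}", "", f"**{headline}**", "",
           "| Zone | Files |", "|---|---|"]
    # One row per non-empty zone, in a fixed order.
    for zone in ("Red", "Blue", "Gray"):
        paths = files_by_zone.get(zone, [])
        if paths:
            out.append(f"| {zone} | " + ", ".join(f"`{p}`" for p in paths) + " |")
    # Each checkpoint is (name, reason, satisfying label).
    if checkpoints:
        out += ["", "**Required checkpoints:**"]
        for name, reason, label in checkpoints:
            out.append(f"- [ ] `{name}` - {reason}. "
                       f"Satisfy by: CODEOWNER approval or label `{label}`")
    out += ["", f"**Boundary check:** {boundary}",
            f"**API check:** {api}", f"**PR size:** {pr_size}"]
    return "\n".join(out)
```

Making the comment sticky (updating one existing comment on each run instead of posting a new one) is a separate step not shown here.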
See the live demo PRs for the three canonical states. Each sync rotates the PR numbers, so look at the latest open PR for each branch.
JVM/Spring — agent-redline-demo:
- `demo/blue-only-pr`: BLUE, green CI, no checkpoint
- `demo/red-with-checkpoint-pr`: RED, green CI, `architecture-reviewed` label applied → checkpoint satisfied
- `demo/boundary-violation-pr`: BOUNDARY_VIOLATION, red CI, cannot merge
Python/FastAPI — agent-redline-python-demo:
PR-driven flow (sticky comment + label-satisfied checkpoints):
- `demo/blue-only-pr`: BLUE
- `demo/red-with-checkpoint-pr`: RED with checkpoint satisfied
- `demo/boundary-violation-pr`: BOUNDARY_VIOLATION
Push-driven flow (verdict on the run page via `$GITHUB_STEP_SUMMARY`; the agent-redline workflow fails on `EXIT != 0`, so GitHub's default email-on-failure fires; the workflow ships as its own file and does not gate other CI):
- `push-demo-blue-only`: BLUE, workflow green, no email
- `push-demo-red-zone-change`: RED, workflow red, email fires
- `push-demo-boundary-violation`: BOUNDARY_VIOLATION, workflow red, email fires
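For the push-driven flow, GitHub Actions exposes a per-step file path in the `GITHUB_STEP_SUMMARY` environment variable, and Markdown appended to that file appears on the run page. A minimal sketch, assuming the verdict Markdown and exit code have already been computed; `publish_push_verdict` is a hypothetical name.

```python
import os


def publish_push_verdict(summary_md: str, exit_code: int) -> int:
    """Append the verdict to the run page and return the workflow exit code."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:  # set by GitHub Actions; absent when run locally
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(summary_md + "\n")
    else:
        print(summary_md)  # local fallback: just show the verdict
    return exit_code
```

A runner script would end with `sys.exit(publish_push_verdict(summary, code))`; the non-zero exit is what turns the workflow red and triggers GitHub's default failure email.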
| Stack | Extension | Boundary backend | Demo |
|---|---|---|---|
| JVM (Java, Kotlin): generic + Spring addendum | `jvm-archunit` | ArchUnit (JUnit XML) | agent-redline-demo |
| Python services and libraries (incl. Django) | `python` | import-linter (`json-violations`) | agent-redline-python-demo |
The framework's stack-neutral pieces — zone classification, checkpoints, PR-size checks, the agent-side discipline — work on any repo. The boundary-rule backend is the ecosystem-specific piece, and is what each language extension brings.
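To show what a boundary-rule backend checks, here is a toy forbidden-import checker built on Python's `ast` module. It is loosely modeled on import-linter's forbidden contracts but is not import-linter; the `FORBIDDEN` mapping and the module names are hypothetical, and relative imports are ignored for brevity.

```python
import ast

# HYPOTHETICAL rule: the domain layer must not import the persistence layer.
FORBIDDEN = {"orders.domain": ["orders.persistence"]}


def boundary_violations(module: str, source: str,
                        forbidden: dict[str, list[str]] = FORBIDDEN) -> list[str]:
    """Return imported module names in `source` that break a rule for `module`."""
    banned: list[str] = []
    for layer, targets in forbidden.items():
        if module == layer or module.startswith(layer + "."):
            banned.extend(targets)

    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from X import Y" only
        else:
            continue
        for name in names:
            if any(name == b or name.startswith(b + ".") for b in banned):
                hits.append(name)
    return hits
```

A real backend does the same job with a full import graph, transitive checks, and configurable contracts, which is why the extension delegates to ArchUnit or import-linter instead.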
A language extension is a small folder of mostly markdown:
extensions/<your-stack>/
├── README.md # what stack, when to pick it
├── profile.md # default zones, boundary contracts, gotchas
├── scaffold.md # how bootstrap installs the backend and wires CI
├── operating.md # (optional) stack-specific operating-mode notes
├── adapter.yaml # tells the reporter where the backend writes its
│ # output and what format
└── scripts/ # (optional) adapter script when the backend has no
# machine-readable output (the Python extension uses one)
The reporter dispatches on `adapter.yaml`'s `outputFormat`: `junit-xml`, `json-violations`, or `none`. Any backend that emits JUnit XML, emits output matching the `json-violations` schema, or has a small adapter script that converts its output plugs in without core changes.
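The dispatch can be sketched as one parser per `outputFormat`. The JUnit XML branch treats each `<testcase>` with a `<failure>` or `<error>` child as one violated rule, which is how JUnit-style reports mark failing tests. The `json-violations` layout used here (`{"violations": [{"rule": ..., "detail": ...}]}`) is an assumption for illustration only; the real schema is defined by the project.

```python
import json
import xml.etree.ElementTree as ET


def read_violations(output_format: str, raw: str) -> list[str]:
    """Normalize backend output into a list of violation strings."""
    if output_format == "junit-xml":
        root = ET.fromstring(raw)
        # Each failing <testcase> (e.g. one ArchUnit rule) is one violation.
        return [f"{case.get('classname', '')}.{case.get('name', '')}"
                for case in root.iter("testcase")
                if case.find("failure") is not None or case.find("error") is not None]
    if output_format == "json-violations":
        # ASSUMED schema, not the project's documented one.
        data = json.loads(raw)
        return [f"{v['rule']}: {v['detail']}" for v in data.get("violations", [])]
    if output_format == "none":
        return []  # no backend output to parse
    raise ValueError(f"unknown outputFormat: {output_format!r}")
```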
Recommended backends for stacks not yet shipped: dependency-cruiser for Node, go-arch-lint for Go, cargo-deny + Clippy for Rust, Semgrep as a multi-language fallback. See docs/EXTENSIONS.md for the practical guide and docs/SPEC.md §15.3 for the broader roadmap.
Drop the packaged skill at dist/agent-redline/ into your harness's skills directory. agent-redline follows the Agent Skills standard.
Quick start (Claude Code, personal scope):
git clone https://github.com/rore/agent-redline.git
cp -r agent-redline/dist/agent-redline ~/.claude/skills/

Other tools and project-scope installs: see INSTALL.md.
| If you want to… | Read |
|---|---|
| Understand the why | docs/PHILOSOPHY.md |
| Install and try it | INSTALL.md |
| See the bootstrap conversation in detail | docs/BOOTSTRAP.md |
| See operating-mode behavior | docs/OPERATING.md |
| Read the policy schema | docs/POLICY_SCHEMA.md |
| Build a language extension | docs/EXTENSIONS.md |
| Wire it into CI | docs/CI_INTEGRATION.md |
| Read the full spec | docs/SPEC.md |
| See common questions | docs/FAQ.md |
v0.2. Early. Interfaces stable but not guaranteed across versions.
Two flow modes for CI integration: PR-driven (the sticky comment is the surface; CI fails on exit 2) and push-driven (the run page's `$GITHUB_STEP_SUMMARY` is the surface; the agent-redline workflow fails on `EXIT != 0`, so GitHub's default email-on-failure fires for both RED warnings and BOUNDARY_VIOLATION hard fails). In either mode the agent-redline workflow ships as its own `.github/workflows/` file, so its failure does not affect other workflows in the repo. Bootstrap picks the flow mode based on the repo's actual flow.
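A sketch of how the verdict maps to workflow failure in each flow mode. The README states that PR-driven CI fails on exit 2 and push-driven fails on any non-zero exit; the exit code for RED (1 here), the function name, and the mode names are assumptions.

```python
# ASSUMED exit codes: only "exit 2" and "EXIT != 0" come from the README.
EXIT_CODES = {"BLUE": 0, "RED": 1, "BOUNDARY_VIOLATION": 2}


def workflow_fails(verdict: str, flow_mode: str) -> bool:
    """Decide whether the agent-redline workflow should go red."""
    code = EXIT_CODES[verdict]
    if flow_mode == "pr":
        return code == 2  # PR-driven: only a boundary violation fails CI
    if flow_mode == "push":
        return code != 0  # push-driven: RED and violations both fail, so email fires
    raise ValueError(f"unknown flow mode: {flow_mode!r}")
```

In PR-driven mode a RED verdict stays visible through the sticky comment and its checkpoints instead of a failed run; push-driven mode has no PR to comment on, so the failed run (and its email) is the signal.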
Decisions and their rationale: docs/DECISIONS.md. Roadmap: docs/SPEC.md §15.3.