Agent Workbench

Agent Workbench is a local-first IDE/runtime for coding agents. It exposes repo-scoped code intelligence, documentation routing, bounded edit support, diagnostics, validation planning, workspace safety, and capability/freshness metadata through MCP, so agents can rely on mature software-engineering evidence instead of broad file reads, ad hoc shell scans, and unsupported inference.

Agent Workbench does not replace coding agents; it gives them an IDE-grade evidence layer.

What It Solves

Coding agents are strongest when they can spend context on design decisions and edits instead of rediscovering repository structure. Agent Workbench provides bounded, repo-scoped evidence for common questions:

Agent problem	Mature tool class	Workbench role
Where is this defined?	Parser/index/symbol graph	Symbol search and context routing
What uses this?	Reference engine	References with confidence and provenance
What might break?	Impact graph/test mapping	Bounded impact and validation planning
Is this file valid?	Parser/linter/type checker	Diagnostics and planned checks
What should I test?	Test discovery/dependency graph	Verification plan
Can I safely edit this?	Workspace safety/edit preview	Preview/apply with drift checks
Is this generated/vendor/secret?	Scope/catalog policy	Refusal, caveats, and redaction
Where are the docs?	Markdown index/outline/FTS	Docs routing and section reads

Agents should not spend context and time rediscovering what mature coding support tools can already answer deterministically or semi-deterministically.

What It Exposes

The public runtime surface is MCP-first:

repo:///status, repo:///scope, and repo:///overview for first-read repo state, scope, freshness, and capability coverage.
Documentation resources and tools for bounded docs overview, map, search, outline, and section reads.
context_for_task for bounded task routing before broad file reads.
symbol_search, find_references, and impact for targeted code evidence.
diagnostics_for_files and verification_plan for read-only diagnostics and planned validation.
preview_workspace_edit and apply_workspace_edit for bounded writes with preview tokens, path containment, and drift checks.
Integration health/profile resources for configured, discovered, callable, unavailable, blocked, hidden, and unknown agent surfaces.
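The bounded-write items above name three guards: preview tokens, path containment, and drift checks. The following is a minimal sketch of how such guards can fit together, not the Workbench implementation. Every name in it (`EditSession`, `normalizeContained`, `fingerprint`) is hypothetical, files are modeled as an in-memory map, and the FNV-1a hash is only a stand-in for whatever content fingerprint a real tool would use.

```typescript
// Hypothetical sketch: none of these names come from Agent Workbench.
type Files = Map<string, string>;

interface Preview {
  token: string;
  path: string;
  newContent: string;
  baseHash: string; // fingerprint of the file at preview time
}

// FNV-1a 32-bit hash, used only as a stand-in content fingerprint.
function fingerprint(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Normalize a POSIX-style relative path; return null if it escapes the root.
function normalizeContained(relPath: string): string | null {
  if (relPath.startsWith("/")) return null; // absolute paths are refused
  const out: string[] = [];
  for (const part of relPath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (out.length === 0) return null; // would climb above the repo root
      out.pop();
    } else {
      out.push(part);
    }
  }
  return out.length > 0 ? out.join("/") : null;
}

class EditSession {
  private files: Files;
  private previews = new Map<string, Preview>();
  private counter = 0;

  constructor(files: Files) {
    this.files = files;
  }

  // Record the intended edit and the file's current fingerprint.
  preview(relPath: string, newContent: string): Preview {
    const path = normalizeContained(relPath);
    if (path === null) throw new Error(`path escapes repo root: ${relPath}`);
    const current = this.files.get(path) ?? "";
    this.counter += 1;
    const p: Preview = {
      token: `preview-${this.counter}`,
      path,
      newContent,
      baseHash: fingerprint(current),
    };
    this.previews.set(p.token, p);
    return p;
  }

  // Apply only if the token is known and the file has not changed since preview.
  apply(token: string): "applied" | "drift" | "unknown_token" {
    const p = this.previews.get(token);
    if (!p) return "unknown_token";
    this.previews.delete(token); // tokens are single-use in this sketch
    const current = this.files.get(p.path) ?? "";
    if (fingerprint(current) !== p.baseHash) return "drift";
    this.files.set(p.path, p.newContent);
    return "applied";
  }
}
```

In this sketch a preview whose target file changes before `apply` is rejected with `"drift"`, so the caller has to preview again against the current content rather than overwrite someone else's change.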

Evidence You Can Rely On

Workbench responses carry metadata so agents can calibrate claims:

Capability levels are semantic, partial_semantic, resource_backed, or unsupported.
Freshness is fresh, stale, cold, refreshing, or unknown.
Evidence kinds include parser, docs, FTS, config, direct reads, heuristics, text fallback, and executed commands.
Verification status distinguishes done, planned, needed, blocked, and not_applicable.

Routing evidence helps an agent decide where to look. Parser-backed evidence supports stronger claims about declarations and syntax. Semantic evidence supports stronger claims only when fixture-proven for that language and operation. Direct source reads remain necessary when confidence is partial, degraded, stale, or heuristic. Planned validation is not completed validation; executed tests/checks or equivalent evidence are required before claiming proof.
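The calibration rules above can be sketched as two small predicates. This is an illustration under stated assumptions, not Workbench's response schema: the capability, freshness, and verification values are taken from the lists above, while the `EvidenceKind` identifier spellings, the `EvidenceMeta` shape, and both function names are hypothetical. The sketch is deliberately conservative: it treats anything short of fresh, semantic, non-heuristic evidence as requiring a direct source read, and it does not model the "equivalent evidence" case for validation.

```typescript
// Hypothetical sketch: the metadata shape and function names are assumptions.
type Capability = "semantic" | "partial_semantic" | "resource_backed" | "unsupported";
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";
type Verification = "done" | "planned" | "needed" | "blocked" | "not_applicable";
// Identifier spellings for evidence kinds are assumed for illustration.
type EvidenceKind =
  | "parser" | "docs" | "fts" | "config"
  | "direct_read" | "heuristic" | "text_fallback" | "executed_command";

interface EvidenceMeta {
  capability: Capability;
  freshness: Freshness;
  kinds: EvidenceKind[];
  verification: Verification;
}

// True when the agent should read source directly before making a claim.
function needsDirectRead(meta: EvidenceMeta): boolean {
  if (meta.freshness !== "fresh") return true;
  if (meta.capability !== "semantic") return true;
  return meta.kinds.some((k) => k === "heuristic" || k === "text_fallback");
}

// True only when validation actually ran; a plan alone never qualifies.
function mayClaimValidated(meta: EvidenceMeta): boolean {
  return (
    meta.verification === "done" &&
    meta.kinds.some((k) => k === "executed_command")
  );
}
```

For example, a response with `capability: "semantic"`, `freshness: "fresh"`, `kinds: ["parser"]`, and `verification: "planned"` needs no extra source read in this sketch, but still may not be reported as validated.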

Not A Lifecycle Engine

Agent Workbench does not decide whether work is approved, complete, promoted, released, or closed. It provides repository evidence, coding support, validation planning, diagnostics, and workspace-safety contracts. Lifecycle tools, issue trackers, maintainers, or project governance remain responsible for intent, acceptance, and closure.

Workbench may consume active task or spec context when a lifecycle system provides it. It may rank files/docs using active spec links and expose evidence useful to lifecycle tasks. It must not require ai-spec-lifecycle or any specific lifecycle tool, decide whether a spec is complete, promote durable docs automatically, or close specs.
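One way to read this boundary is that lifecycle context is an optional ranking input and never a required dependency or a completion signal. A minimal sketch of that idea follows; `Candidate`, `LifecycleContext`, `rankCandidates`, and the boost weight are all hypothetical and do not describe the actual bridge contract.

```typescript
// Hypothetical sketch: shapes, names, and the boost weight are assumptions.
interface Candidate {
  path: string;
  score: number; // base relevance from repo evidence
}

interface LifecycleContext {
  specId: string;
  linkedPaths: string[]; // files or docs the active spec links to
}

const SPEC_LINK_BOOST = 1; // arbitrary illustrative weight

// Rank candidates; ctx is optional, so ranking works with no lifecycle tool.
// The result carries only ordering and scores, never a completion decision.
function rankCandidates(candidates: Candidate[], ctx?: LifecycleContext): Candidate[] {
  const linked = new Set<string>(ctx ? ctx.linkedPaths : []);
  return candidates
    .map((c) => ({
      path: c.path,
      score: c.score + (linked.has(c.path) ? SPEC_LINK_BOOST : 0),
    }))
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path));
}
```

Calling `rankCandidates(candidates)` without a context returns the evidence-only ordering, so the same code path serves repositories that use no lifecycle system at all.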

See Lifecycle bridge contract for the generic boundary.

Proven Use

Agent Workbench has been dogfooded on multiple repositories where coding agents used it to support feature development. Dogfood evidence should be recorded in project docs, proof matrices, or review notes rather than treated as an implicit guarantee.

Current evidence starts in:

Maintainers should add new dogfood entries to durable reference docs or proof matrices with dates, repositories, validated surfaces, limitations, and follow-up work.

Coding-Agent Workflows

Ad Hoc Direct Patch

repo status -> context_for_task -> source read -> preview edit
-> diagnostics -> validation plan -> report evidence

Check freshness before editing. Treat resource_backed, heuristic, or text_fallback evidence as routing, not proof. Report validation as planned unless checks actually ran.
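The reporting rule at the end of this workflow can be sketched as a merge of the validation plan with the checks that actually ran. This is an illustration only: `PatchRun`, `PatchReport`, and `buildPatchReport` are hypothetical names, and the report shape is an assumption rather than Workbench output.

```typescript
// Hypothetical sketch: field and function names are illustrative.
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";

interface ExecutedCheck {
  name: string;
  passed: boolean;
}

interface PatchRun {
  freshness: Freshness;
  plannedChecks: string[];
  executedChecks: ExecutedCheck[];
}

interface CheckReport {
  name: string;
  status: "done" | "planned";
  passed?: boolean; // present only when the check actually ran
}

interface PatchReport {
  caveats: string[];
  checks: CheckReport[];
}

function buildPatchReport(run: PatchRun): PatchReport {
  const caveats: string[] = [];
  if (run.freshness !== "fresh") {
    caveats.push(`index freshness is ${run.freshness}; re-read source before relying on routing evidence`);
  }

  // Index executed checks by name so each planned check can be looked up.
  const executed = new Map<string, boolean>();
  for (const c of run.executedChecks) executed.set(c.name, c.passed);

  const checks = run.plannedChecks.map((name): CheckReport => {
    const passed = executed.get(name);
    return passed === undefined
      ? { name, status: "planned" }
      : { name, status: "done", passed };
  });
  return { caveats, checks };
}
```

A run that planned `pnpm typecheck` and `pnpm test` but executed only the first would report one check as done and the other as planned, which keeps the final report from overstating what was verified.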

Spec Or Lifecycle Task

lifecycle readiness packet -> lifecycle bridge context
-> bounded implementation -> diagnostics -> validation plan
-> lifecycle evidence update by the owning lifecycle system

Workbench consumes task context and returns repo evidence. The lifecycle system or maintainer remains responsible for acceptance, promotion, and closure.

Review-Only Task

changed files -> impact evidence -> diagnostics
-> validation adequacy -> residual risk report

Do not mutate files. Use impact and diagnostics as evidence, then call out stale indexes, partial semantic coverage, missing checks, and residual risk.
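As a rough illustration of the last step, a residual risk report can be derived from the evidence without touching any file. The sketch below assumes hypothetical input shapes (`ImpactEvidence`, `Diagnostic`, `ReviewInput`) and a hypothetical `residualRisks` function; the capability and freshness values come from the metadata described earlier, and the message wording is invented for the example.

```typescript
// Hypothetical sketch: input shapes and message wording are assumptions.
type Capability = "semantic" | "partial_semantic" | "resource_backed" | "unsupported";
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";

interface ImpactEvidence {
  path: string;
  capability: Capability;
  freshness: Freshness;
}

interface Diagnostic {
  path: string;
  message: string;
}

interface ReviewInput {
  impact: ImpactEvidence[];
  diagnostics: Diagnostic[];
  plannedChecks: string[];
  executedChecks: string[];
}

// Read-only: derives a list of risks from evidence and mutates nothing.
function residualRisks(input: ReviewInput): string[] {
  const risks: string[] = [];
  for (const e of input.impact) {
    if (e.freshness !== "fresh") risks.push(`index not fresh (${e.freshness}): ${e.path}`);
    if (e.capability !== "semantic") risks.push(`coverage is ${e.capability}: ${e.path}`);
  }
  for (const d of input.diagnostics) {
    risks.push(`open diagnostic: ${d.path}: ${d.message}`);
  }
  const ran = new Set<string>(input.executedChecks);
  for (const check of input.plannedChecks) {
    if (!ran.has(check)) risks.push(`check not run: ${check}`);
  }
  return risks;
}
```

An empty result in this sketch means only that none of the modeled risk sources fired; it is not evidence that the change is safe.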

Development

Use pnpm for local development:

pnpm install
pnpm rebuild:native
pnpm typecheck
pnpm test
pnpm dev -- <repo-root>

Native tree-sitter bindings may require pnpm rebuild:native under newer Node versions. Do not add parser fallbacks to mask install/build issues.

Documentation Map

Start with Documentation map for the canonical owner of each design, contract, proof, integration, and safety topic.

Agent-visible behavior changes are tracked in Agent-readable changelog.
