Agent Workbench is a local-first IDE/runtime for coding agents. It exposes repo-scoped code intelligence, documentation routing, bounded edit support, diagnostics, validation planning, workspace safety, and capability/freshness metadata through MCP, so agents can rely on mature software-engineering evidence instead of broad file reads, ad hoc shell scans, and unsupported inference.
Agent Workbench does not replace coding agents. It gives coding agents an IDE-grade evidence layer.
Coding agents are strongest when they can spend context on design decisions and edits instead of rediscovering repository structure. Agent Workbench provides bounded, repo-scoped evidence for common questions:
| Agent problem | Mature tool class | Workbench role |
|---|---|---|
| Where is this defined? | Parser/index/symbol graph | Symbol search and context routing |
| What uses this? | Reference engine | References with confidence and provenance |
| What might break? | Impact graph/test mapping | Bounded impact and validation planning |
| Is this file valid? | Parser/linter/type checker | Diagnostics and planned checks |
| What should I test? | Test discovery/dependency graph | Verification plan |
| Can I safely edit this? | Workspace safety/edit preview | Preview/apply with drift checks |
| Is this generated/vendor/secret? | Scope/catalog policy | Refusal, caveats, and redaction |
| Where are the docs? | Markdown index/outline/FTS | Docs routing and section reads |
Agents should not spend context and time rediscovering what mature coding support tools can already answer deterministically or semi-deterministically.
The public runtime surface is MCP-first:
- `repo:///status`, `repo:///scope`, and `repo:///overview` for first-read repo state, scope, freshness, and capability coverage.
- Documentation resources and tools for bounded docs overview, map, search, outline, and section reads.
- `context_for_task` for bounded task routing before broad file reads.
- `symbol_search`, `find_references`, and `impact` for targeted code evidence.
- `diagnostics_for_files` and `verification_plan` for read-only diagnostics and planned validation.
- `preview_workspace_edit` and `apply_workspace_edit` for bounded writes with preview tokens, path containment, and drift checks.
- Integration health/profile resources for configured, discovered, callable, unavailable, blocked, hidden, and unknown agent surfaces.
Workbench responses carry metadata so agents can calibrate claims:
- Capability levels are `semantic`, `partial_semantic`, `resource_backed`, or `unsupported`.
- Freshness is `fresh`, `stale`, `cold`, `refreshing`, or `unknown`.
- Evidence kinds include parser, docs, FTS, config, direct reads, heuristics, text fallback, and executed commands.
- Verification status distinguishes `done`, `planned`, `needed`, `blocked`, and `not_applicable`.
Routing evidence helps an agent decide where to look. Parser-backed evidence supports stronger claims about declarations and syntax. Semantic evidence supports stronger claims only when fixture-proven for that language and operation. Direct source reads remain necessary when confidence is partial, degraded, stale, or heuristic. Planned validation is not completed validation; executed tests/checks or equivalent evidence are required before claiming proof.
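The calibration rules above can be sketched as a small decision function. This is a hypothetical illustration rather than Workbench's API: the capability and freshness values come from the metadata lists, but the evidence-kind spellings, the `EvidenceSummary` shape, the `fixtureProven` flag, and the three claim levels are assumptions. Direct reads and executed commands are left out because they are themselves the confirmation step.

```typescript
// Capability and freshness values come from Workbench's metadata lists.
// The evidence-kind spellings and field names below are assumptions.
type Capability = "semantic" | "partial_semantic" | "resource_backed" | "unsupported";
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";
type IndexEvidenceKind = "parser" | "docs" | "fts" | "config" | "heuristic" | "text_fallback";

interface EvidenceSummary {
  capability: Capability;
  freshness: Freshness;
  kind: IndexEvidenceKind;
  // Hypothetical flag: true only when fixtures prove this language/operation pair.
  fixtureProven: boolean;
}

// "routing_only" means: use the result to pick files, then read the source.
type ClaimStrength = "semantic" | "syntactic" | "routing_only";

function calibrate(e: EvidenceSummary): ClaimStrength {
  // Anything not known to be fresh needs a direct source read before claims.
  if (e.freshness !== "fresh") return "routing_only";
  // Resource-backed or unsupported coverage only says where to look.
  if (e.capability === "resource_backed" || e.capability === "unsupported") return "routing_only";
  // Heuristic and text-fallback hits are routing hints, not proof.
  if (e.kind === "heuristic" || e.kind === "text_fallback") return "routing_only";
  // Semantic claims require fixture-proven semantic coverage.
  if (e.capability === "semantic" && e.fixtureProven) return "semantic";
  // Parser-backed evidence supports declaration and syntax claims.
  if (e.kind === "parser") return "syntactic";
  return "routing_only";
}

const example: EvidenceSummary = {
  capability: "partial_semantic",
  freshness: "fresh",
  kind: "parser",
  fixtureProven: false,
};
console.log(calibrate(example)); // prints: syntactic
```

The checks are ordered from most to least disqualifying, so a stale or heuristic result can never be upgraded by a stronger capability label.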
Agent Workbench does not decide whether work is approved, complete, promoted, released, or closed. It provides repository evidence, coding support, validation planning, diagnostics, and workspace-safety contracts. Lifecycle tools, issue trackers, maintainers, or project governance remain responsible for intent, acceptance, and closure.
Workbench may consume active task or spec context when a lifecycle system
provides it. It may rank files/docs using active spec links and expose evidence
useful to lifecycle tasks. It must not require ai-spec-lifecycle or any
specific lifecycle tool, decide whether a spec is complete, promote durable
docs automatically, or close specs.
See Lifecycle bridge contract for the generic boundary.
Agent Workbench has been dogfooded on multiple repositories where coding agents used it to support feature development. Dogfood evidence should be recorded in project docs, proof matrices, or review notes rather than treated as an implicit guarantee.
Current evidence starts in:
- Dogfood evidence ledger
- MVP proof matrix
- Spec closure log
- Cross-repo smoke feedback
- Agent Workbench smoke feedback
Maintainers should add new dogfood entries to durable reference docs or proof matrices with dates, repositories, validated surfaces, limitations, and follow-up work.
```text
repo status -> context_for_task -> source read -> preview edit
-> diagnostics -> validation plan -> report evidence
```
Check freshness before editing. Treat `resource_backed`, `heuristic`, or
`text_fallback` evidence as routing, not proof. Report validation as planned
unless checks actually ran.
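A minimal sketch of the two guard rails in this workflow, assuming hypothetical `canPreviewEdit` and `reportValidation` helpers that are not part of Workbench's documented surface: edits wait for fresh repo state, and a check counts as `done` only when it actually executed.

```typescript
// Hypothetical helpers; these names are not part of Workbench's documented API.
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";
type VerificationStatus = "done" | "planned" | "needed" | "blocked" | "not_applicable";

interface CheckReport {
  check: string;
  status: VerificationStatus;
  passed?: boolean; // set only when the check actually ran
}

// Gate before preview_workspace_edit: anything other than "fresh" means
// the agent should refresh or re-read sources first.
function canPreviewEdit(freshness: Freshness): boolean {
  return freshness === "fresh";
}

// A planned check becomes "done" only if it has an executed result;
// otherwise it stays "planned" in the final report.
function reportValidation(planned: string[], executed: Map<string, boolean>): CheckReport[] {
  return planned.map((check): CheckReport => {
    const passed = executed.get(check);
    return passed === undefined ? { check, status: "planned" } : { check, status: "done", passed };
  });
}

const reports = reportValidation(
  ["pnpm typecheck", "pnpm test"],
  new Map([["pnpm typecheck", true]]),
);
console.log(reports);
```

Keeping `passed` undefined for unexecuted checks makes it impossible to report a planned check as a passing one by accident.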
```text
lifecycle readiness packet -> lifecycle bridge context
-> bounded implementation -> diagnostics -> validation plan
-> lifecycle evidence update by the owning lifecycle system
```
Workbench consumes task context and returns repo evidence. The lifecycle system or maintainer remains responsible for acceptance, promotion, and closure.
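The ownership split can be sketched as two separate functions. All names here are hypothetical: `buildEvidencePacket` stands for the Workbench side, which only assembles evidence, while `lifecycleReview` stands for whatever lifecycle system or maintainer owns the decision. The acceptance policy inside it is an invented example.

```typescript
// Hypothetical shapes illustrating the ownership boundary.
interface TaskContext {
  taskId: string;
  specLinks: string[]; // optional hints supplied by a lifecycle system
}

interface EvidencePacket {
  taskId: string;
  changedFiles: string[];
  diagnostics: string[];
  plannedChecks: string[];
  executedChecks: string[];
  // Intentionally no "approved", "complete", or "closed" field.
}

// Workbench side: consume task context, return evidence only.
function buildEvidencePacket(
  ctx: TaskContext,
  evidence: Omit<EvidencePacket, "taskId">,
): EvidencePacket {
  return { taskId: ctx.taskId, ...evidence };
}

// Lifecycle side: owned by the lifecycle system or a maintainer.
// This acceptance policy is an invented example.
type LifecycleDecision = "accept" | "needs_work";

function lifecycleReview(packet: EvidencePacket): LifecycleDecision {
  const allChecksRan = packet.plannedChecks.every((c) => packet.executedChecks.includes(c));
  return allChecksRan && packet.diagnostics.length === 0 ? "accept" : "needs_work";
}

const packet = buildEvidencePacket(
  { taskId: "T-1", specLinks: [] },
  { changedFiles: ["src/a.ts"], diagnostics: [], plannedChecks: ["pnpm test"], executedChecks: [] },
);
console.log(lifecycleReview(packet)); // prints: needs_work
```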
```text
changed files -> impact evidence -> diagnostics
-> validation adequacy -> residual risk report
```
Do not mutate files. Use impact and diagnostics as evidence, then call out stale indexes, partial semantic coverage, missing checks, and residual risk.
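A read-only sketch of the residual-risk step, assuming a hypothetical `ReviewEvidence` shape assembled from impact, diagnostics, and validation-plan results. It only turns evidence gaps into explicit caveats and never touches files.

```typescript
// Hypothetical review input; field names are assumptions.
type Freshness = "fresh" | "stale" | "cold" | "refreshing" | "unknown";
type Capability = "semantic" | "partial_semantic" | "resource_backed" | "unsupported";

interface ReviewEvidence {
  indexFreshness: Freshness;
  impactCapability: Capability;
  diagnosticErrors: number;
  plannedChecks: string[];
  executedChecks: string[];
}

// Read-only: turns evidence gaps into explicit caveats for the review report.
function residualRisks(e: ReviewEvidence): string[] {
  const risks: string[] = [];
  if (e.indexFreshness !== "fresh") {
    risks.push(`index is ${e.indexFreshness}; impact results may be incomplete`);
  }
  if (e.impactCapability !== "semantic") {
    risks.push(`impact coverage is ${e.impactCapability}; confirm callers with source reads`);
  }
  if (e.diagnosticErrors > 0) {
    risks.push(`${e.diagnosticErrors} diagnostic error(s) unresolved`);
  }
  for (const check of e.plannedChecks) {
    if (!e.executedChecks.includes(check)) risks.push(`check not run: ${check}`);
  }
  return risks;
}

console.log(
  residualRisks({
    indexFreshness: "stale",
    impactCapability: "partial_semantic",
    diagnosticErrors: 0,
    plannedChecks: ["pnpm typecheck", "pnpm test"],
    executedChecks: ["pnpm typecheck"],
  }),
);
```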
Use pnpm for local development:
```sh
pnpm install
pnpm rebuild:native
pnpm typecheck
pnpm test
pnpm dev -- <repo-root>
```

Native tree-sitter bindings may require `pnpm rebuild:native` under newer Node
versions. Do not add parser fallbacks to mask install/build issues.
Start with Documentation map for the canonical owner of each design, contract, proof, integration, and safety topic.
Agent-visible behavior changes are tracked in Agent-readable changelog.