Autonomous code factory. Issues go in, PRs come out.
- Picks the next task file from `tasks/`
- Routes to the repo (local, GitHub, or creates a new one)
- Claude codes it, runs tests, commits, opens a PR
- Moves the task file to `tasks/done/`
Shipyard expects your repos to live in the parent directory:

    projects/
      shipyard/      # this repo
      my-app/        # a repo Shipyard can work on
      another-app/   # another repo
If a repo isn't local, Shipyard searches your GitHub account and clones it automatically.
Or set `SHIPYARD_PROJECTS` to point elsewhere:

    export SHIPYARD_PROJECTS="$HOME/code"

Then run the factory:

    ./factory.sh # run one task (Claude by default)
    ./factory.sh --dry-run # preview what it would pick
    ./factory.sh --parallel 3 # run 3 tasks in parallel
    ./factory.sh --issues owner/repo # pull GitHub issues into tasks/
    ./factory.sh --verify owner/repo # re-verify all open PRs
    ./factory.sh --verify owner/repo 42 # re-verify a specific PR
    SHIPYARD_AGENT=dotbot ./factory.sh # use dotbot instead of Claude
    SHIPYARD_AGENT=dotbot SHIPYARD_PROVIDER=anthropic ./factory.sh # dotbot + Anthropic

Run it in its own terminal, not inside another tool. Monitor progress in a second terminal:

    tail -f logs/*.log

Cancel anytime with Ctrl+C. Shipyard cleans up the branch and returns to the default branch.
Each task is a markdown file in `tasks/`. The filename becomes the task name. The file body is the full prompt sent to the agent (Claude by default), so write as much or as little as you need.
For an existing repo, add `repo:` to the frontmatter:

    ---
    repo: my-app
    ---
    Add a dark mode toggle to the settings page. Should respect system
    preference by default. Use the existing ThemeProvider context.

Screenshot verification: if `agent-browser` is installed and the project has a `dev`, `start`, or `preview` script, Shipyard starts the dev server after shipping, reads the git diff to work out which pages were affected, and uses Claude + agent-browser to take targeted screenshots of the changes. Screenshots are committed to the branch and posted as a comment on the PR.
For a new repo, omit `repo:` and Shipyard creates one, named from the filename:

    Build a weather dashboard that shows 5-day forecast.
    Use OpenWeatherMap API. Include a search bar for city lookup.

Tasks run in alphabetical order by filename. Prefix with numbers to control priority:

    tasks/
      01-fix-auth.md       ← runs first
      02-add-dashboard.md
      03-refactor-api.md
Completed tasks move to tasks/done/.
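The task-file rules above fit in a few lines of shell. This is a hypothetical sketch, not Shipyard's code. `next_task` returns the first `*.md` file in glob (alphabetical) order. `task_repo` reads `repo:` from the frontmatter and falls back to the bare filename when the key is absent. Both function names and the exact fallback rule are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the task-file rules; not Shipyard's code.

# next_task [dir]: print the first task file in alphabetical order.
# Returns 1 if the queue is empty. tasks/done/ is a directory, so *.md skips it.
next_task() {
  local dir="${1:-tasks}" f
  for f in "$dir"/*.md; do
    # With no matches, bash leaves the pattern unexpanded, so check it exists.
    [ -e "$f" ] || return 1
    printf '%s\n' "$f"
    return 0
  done
  return 1
}

# task_repo <file>: print the repo: value from the frontmatter, or, when
# there is no repo: key, the filename without .md (assumed new-repo name).
task_repo() {
  local file="$1" repo
  repo=$(awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }  # frontmatter must start on line 1
    in_fm && $0 == "---"   { exit }             # end of frontmatter
    in_fm && /^repo:/      { sub(/^repo:[ \t]*/, ""); print; exit }
  ' "$file")
  if [ -n "$repo" ]; then
    printf '%s\n' "$repo"
  else
    basename "$file" .md
  fi
}
```

For a task file containing the dark-mode frontmatter shown earlier, `task_repo` prints `my-app`. Bash sorts glob results by the current locale's collation, so zero-pad the prefixes: `10-x.md` sorts before `2-x.md`.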
Pull issues from any repo into your task queue:

    ./factory.sh --issues owner/repo

This fetches open issues labeled `shipyard` and creates task files from them. After the factory completes a task, it posts the PR link as a comment on the issue and closes it.
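Here is a minimal sketch of turning one issue into a task file. Everything in it is an assumption for illustration: the `issue-<number>-<slug>.md` naming, the frontmatter layout, and the `slugify` and `issue_to_task` helpers. Putting the issue number in the filename is one simple way to find the issue again when it's time to close it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write one GitHub issue out as a task file.
# The issue-<number>-<slug>.md naming and frontmatter layout are assumptions.

# slugify <text>: lowercase, collapse non-alphanumerics to single dashes.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# issue_to_task <owner/repo> <number> <title> <body> [tasks_dir]
# Prints the path of the task file it wrote.
issue_to_task() {
  local owner_repo="$1" number="$2" title="$3" body="$4" dir="${5:-tasks}"
  local file
  file="$dir/issue-$number-$(slugify "$title").md"
  {
    # repo: takes the bare repo name so routing can find it locally or on GitHub.
    printf '%s\n' '---' "repo: ${owner_repo##*/}" '---'
    printf '%s\n\n%s\n' "$title" "$body"
  } > "$file"
  printf '%s\n' "$file"
}
```

The number, title, and body could come from `gh issue list --repo owner/repo --label shipyard --state open --json number,title,body`. This sketch doesn't show how Shipyard itself reads that output or closes the issue.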
Shipyard is designed to run unattended. Point cron at it and your issues get solved while you sleep.
Every hour, process one task from the queue:

    0 * * * * /path/to/shipyard/factory.sh >> /path/to/shipyard/shipyard.log 2>&1

Every hour, pull new GitHub issues, then process them:

    0 * * * * /path/to/shipyard/factory.sh --issues owner/repo >> /path/to/shipyard/shipyard.log 2>&1

Nightly batch, running 5 tasks in parallel at 2am:

    0 2 * * * /path/to/shipyard/factory.sh --parallel 5 >> /path/to/shipyard/shipyard.log 2>&1

Label a GitHub issue `shipyard`, go to bed, wake up to a PR with screenshots. That's the workflow.
Based on patterns from Ramp Inspect and Stripe Minions:
- Task queue: `tasks/` folder, one markdown file per task
- Task routing: finds repo locally, clones from GitHub, or creates new
- Branch isolation: agents work on feature branches, never the default branch
- Autonomous coding: agent runs non-interactively (Claude or dotbot)
- Test verification: run tests, fail fast if broken
- PR creation: open a PR via `gh` CLI for every task
- CI gate: auto-generates GitHub Actions workflow, watches CI, fixes failures
- Task completion: move task file to `tasks/done/`
- Visual verification: targeted screenshots of changes via agent-browser
- Streaming output: real-time Claude session output via stream-json
- Parallel execution: run multiple tasks concurrently with `--parallel N`
- Logging: timestamped logs per run for debugging
- Scheduling: cron or trigger to run without you
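A minimal sketch of what parallel execution could amount to, assuming tasks are independent. It takes the first N task files in the same alphabetical order as a single run, starts each as a background job, then `wait`s on them. `run_parallel` and `run_task` are hypothetical names. Presumably the per-task feature branch (worktree) is what keeps concurrent agents from stepping on each other.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of --parallel N; not factory.sh code.

# Placeholder for the real per-task pipeline (branch, agent, gates, PR).
run_task() { printf 'running %s\n' "$1"; }

# run_parallel <n> [dir]: start the first n task files (alphabetical order)
# as background jobs, wait for all of them, and return the number that failed.
run_parallel() {
  local n="$1" dir="${2:-tasks}" f pid count=0 failures=0
  local -a pids=()
  for f in "$dir"/*.md; do
    [ -e "$f" ] || break                # empty queue: pattern left unexpanded
    if [ "$count" -ge "$n" ]; then break; fi
    run_task "$f" &
    pids+=("$!")
    count=$((count + 1))
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  return "$failures"
}
```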
A single file controls the factory: `factory.md` at the repo root. Think of it as a Dockerfile for code factories. It holds all the standards an autonomous agent needs to ship code in a repo, in one file you can clone and run anywhere. See the factory.md spec for the full format.
factory.md has 8 reserved sections. Each section is a bullet list of rules.
| # | Section | Covers |
|---|---|---|
| 1 | `## style` | Formatting, naming, function size, imports, changelog hygiene |
| 2 | `## build` | Runtime, package manager, CI workflow, version bumping |
| 3 | `## testing` | Test framework, pass/fail gates, new-code test requirements |
| 4 | `## documentation` | Doc comments, README, AGENTS.md updates |
| 5 | `## environment` | Dev tools, branching rules, worktrees |
| 6 | `## quality` | File size, function size, TODO/FIXME, complexity |
| 7 | `## observability` | Logging, error reporting, tracing |
| 8 | `## security` | Hardcoded credentials, dangerous patterns, dependencies |
Every bullet is one rule. The framework reads each bullet and either:
- Runs it as a gate if it recognizes the rule (e.g. "no secrets in diff", "tests pass")
- Forwards it to the agent as an additional rule to honor if it doesn't recognize it
Prefix a bullet with `!` to mark it strict: the framework must verify it deterministically or the pipeline fails. Use strict for security, correctness, and release-critical rules you refuse to trust a model on:

    ## security
    - ! No hardcoded credentials
    - ! No eval
    - Dependency audit clean

Edit any section to match your preferences. `factory.md` is framework-agnostic, so the same file can drive any autonomous agent pipeline, not just Shipyard.
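Here is a minimal sketch of reading the rules and their strict markers. It assumes rules are exactly `- rule` or `- ! rule` lines under `## section` headings. `parse_rules` and its tab-separated output format are hypothetical, and nested bullets or multi-line rules are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list every factory.md rule with its section and
# whether it is strict ("- ! rule") or advisory ("- rule").
# Output: section<TAB>strict|advisory<TAB>rule text (format is an assumption).
parse_rules() {
  awk '
    /^## / { section = substr($0, 4); next }   # remember the current section name
    section != "" && /^- / {
      rule = substr($0, 3)                      # drop the leading dash and space
      mode = "advisory"
      if (rule ~ /^!/) {                        # a leading bang marks the rule strict
        mode = "strict"
        sub(/^![ \t]*/, "", rule)
      }
      printf "%s\t%s\t%s\n", section, mode, rule
    }
  ' "$1"
}
```

Each output line can then be routed. A strict rule must map to a deterministic check, while an advisory one may fall back to the agent.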
Shipyard's pipeline is an implementation detail of `factory.sh`:
- Pick the next task from `tasks/`
- Route it to a repo (local, GitHub, or new)
- Prepare a feature branch (worktree)
- Scaffold a CI workflow if missing
- Run the agent with every `factory.md` rule injected into the prompt
- Dispatch every rule bullet through `check_gate`; recognized gates run as checks, unrecognized gates are forwarded to the agent
- Fix gate failures by re-engaging the agent (max 2 attempts)
- Confirm the PR, watch CI, fix failures (max 2 attempts)
- Screenshot affected pages via agent-browser
- Move the task file to `tasks/done/`, close the issue, return to the default branch
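The dispatch and fix-loop steps can be sketched as below. Everything here is hypothetical and stands in for whatever `factory.sh` actually does: the two recognized patterns, the return codes (0 pass, 1 fail, 2 unrecognized), the `DIFF` and `TEST_CMD` variables, and the `agent_fix` hook. The shape follows the pipeline above. Recognized rules run as checks, unrecognized ones are forwarded, and failures get at most two fix attempts.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of gate dispatch and the fix loop; not factory.sh code.
# Assumed inputs: DIFF holds the task's diff text (e.g. from git diff),
# TEST_CMD is the project's test command.

# check_gate <rule>: 0 = passed, 1 = failed, 2 = not recognized.
check_gate() {
  local rule
  rule=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$rule" in
    *"no eval"*)
      # Fail if any added (+) line in the diff contains "eval(".
      ! printf '%s\n' "$DIFF" | grep -q '^+.*eval(' ;;
    *"tests pass"*)
      ${TEST_CMD:-npm test} >/dev/null 2>&1 ;;
    *)
      return 2 ;;
  esac
}

# Placeholder: re-prompt the agent with the failing rules.
agent_fix() { printf 'asking agent to fix: %s\n' "$@" >&2; }

# run_gates <rule>...: check every rule, re-engaging the agent on failures
# with at most 2 fix attempts. Unrecognized rules are printed once as forwarded.
run_gates() {
  local attempt rule status
  local -a failed
  for attempt in 0 1 2; do
    failed=()
    for rule in "$@"; do
      status=0
      check_gate "$rule" || status=$?
      if [ "$status" -eq 1 ]; then
        failed+=("$rule")
      elif [ "$status" -eq 2 ] && [ "$attempt" -eq 0 ]; then
        printf 'forward: %s\n' "$rule"
      fi
    done
    if [ "${#failed[@]}" -eq 0 ]; then return 0; fi
    if [ "$attempt" -eq 2 ]; then break; fi
    agent_fix "${failed[@]}"
  done
  printf 'gate failed: %s\n' "${failed[@]}" >&2
  return 1
}
```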
Requirements:

- Claude Code or dotbot
- `gh` CLI (authenticated)
- `agent-browser` (optional, for screenshot verification)
Shipyard does the same thing as GitHub Copilot Coding Agent and Claude for GitHub — task in, PR out, automated. The difference is it's a shell script you own.
What Shipyard adds:

- Task queue with priority: file-based, numbered for order, not one-off prompts
- Configurable standards and workflow: edit `factory.md` (a portable, framework-agnostic spec) to control exactly what the agent does
- Screenshot verification: starts the dev server, reads the diff, screenshots the actual pages that changed
- Runs locally: no data leaves your machine except API calls
- Swappable agent: Claude Code or dotbot (any provider: xAI, Anthropic, OpenAI, Ollama)
- GitHub issues integration: pull labeled issues into the queue, close them on completion
- No vendor lock-in: swap Claude for another model, change the pipeline, fork it
What the hosted options have that Shipyard doesn't:

- Hosted infrastructure (no local machine needed)
- Web UI
- No setup
Shipyard is for developers who want to own their code factory. It's the same idea as self-hosting vs SaaS: you trade convenience for control.
To contribute:

    git clone https://github.com/stevederico/shipyard
    cd shipyard

Edit `factory.sh` or `factory.md`, then open a PR.
MIT — see LICENSE.