A research agent harness. Many AI agents working the way a research group does: arranged as panes, scaled into isolated swarms, and checked against one another until something holds.
Manifesto · Why a harness · The research arc · The machinery · Install · Plugins
anu is the harness behind michelangelo.sh, built at MIT's Quantum Photonics & AI Group (Prof. Dirk Englund). It is an agent-first terminal IDE: tmux is the window manager, Neovim is the editor, and AI agents are first-class panes. They sit beside your code as peers with their own space to think, scaled across your machines and set against one another. Built with Claude, and built for Claude to drive.
What does the future of scientific research look like?
Software spent the last few years building harnesses for its agents: environments where many of them work in parallel, under isolation, checking each other's work. Science mostly stopped at the chatbot. anu is the harness for what comes next.
The model isn't the binding constraint. The harness is.
A capable model alone is a chatbot. What turns it into a research group is the structure around it: how work fans out, stays isolated, and gets checked. Three principles drive the design.
- Verification is asymmetric. Confirming a result costs far less than producing it, so the harness is built to generate broadly and verify stringently: many attempts, ruthless checking.
- Parallelism is the substrate. One agent approximates a chatbot. Many isolated agents that reproduce and refute each other approximate a research group. Agents are peers with their own context, talking through explicit channels (mailboxes, broadcasts, captured panes) rather than ambient soup.
- Acting and thinking are distinct. Reasoning toward a claim and reproducing the evidence are separate operations. Run both, then admit only what survives the comparison.
That is the loop the harness keeps turning:

            ┌──────────────────── refine / refute ◀───────────────┐
            ▼                                                     │
    goal ─▶ agents ─┬─▶ reason ───▶ claim ─────┐                  │
            fan out │                          ├─▶ compare ─▶ knowledge
                    └─▶ reproduce ─▶ evidence ─┘

Only what survives the comparison becomes knowledge; everything else loops back to be refined or refuted. Research holds many control loops at once, so method is a position on a slider, from deterministic optimization at one end to genuinely open-ended reasoning at the other. The harness spans the whole range.
The goal is narrow and concrete: shorten the path from a good idea to a result that holds, scaling a group's judgment across far more attempts than hands alone allow. Scientists still do the science.
The harness ships a marketplace of plugins (Claude Code / Pi skills, developed in place). The flagship ones form one pipeline. Each stage is a workflow that produces a fixed-template artefact in the atlas, and a decision trail records the reasoning the whole way:

    find ───────────▶ understand ───────▶ do ─────────────▶ show
    /prior-art        /study              /investigate      /present
    /lineage          (Claim·Method·      (hypotheses,      (served, visual:
    /similar-code      Result·Gap)         run in a swarm)   Manim · marimo)
                            │                   ▲
                            └── a Gap becomes ──┘
                                a hypothesis
    trail ── records goal · hypothesis · alternatives · outcome (git trailers)


| Stage | Command | What it does |
|---|---|---|
| find | `/prior-art` `/lineage` `/similar-code` | Map the idea-space: prior work with an honest novelty read, what a paper builds on and what cites it, codebases already doing it. Retrieval is grounded (OpenAlex + GitHub), so every result is a real, fetchable source, never recalled. |
| understand | `/study` | Read a new paper into a dossier: Claim (what it asserts), Method (how it was shown), Result (what was reported), Gap (what's open). The Gap feeds /investigate. |
| do | `/investigate` | Run the question, don't read about it. Frame it into 2-5 falsifiable hypotheses, fan out one contained agent per hypothesis (the swarm), judge the outcomes adversarially against their evidence, record everything to the trail. |
| show | `/present` | Turn a result into a served, visual presentation: pick the medium per finding (Manim animation · marimo app · static figure · served notebook), render on the right compute, reachable over Tailscale. |
| trail | `trail` | The decision graph: hypotheses, choices, roads not taken, and outcomes recorded as git-commit trailers. Reconstructs from git log alone (no LLM), and renders as a B&W graph + tempo timeline with open hypotheses flagged as awaiting a verdict. |

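The trail relies on a plain git feature: trailers, the `Key: value` lines that close a commit message. Below is a minimal sketch of that idea using only stock git in a throwaway repository. The trailer keys `Hypothesis` and `Outcome`, the helper names `commit` and `trail_lines`, and the commit messages are assumptions made for illustration; anu's actual keys and rendering may differ.

```shell
#!/usr/bin/env bash
# Throwaway repository, so the sketch touches nothing real.
repo=$(mktemp -d)
git -C "$repo" init -q

# Commit with an inline identity so no global git config is required.
commit() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty "$@"
}

# The last paragraph of a commit message is its trailer block.
# "Hypothesis" and "Outcome" are hypothetical keys chosen for this sketch.
commit -m "try apodized taper" -m "Hypothesis: apodized taper lowers insertion loss
Outcome: open"
commit -m "sweep taper length" -m "Hypothesis: longer taper lowers insertion loss
Outcome: refuted"

# Rebuild the record from history alone: one tab-separated
# "hypothesis<TAB>outcome" line per commit, oldest first.
trail_lines() {
  git -C "$repo" log --reverse \
    --format='%(trailers:key=Hypothesis,valueonly,separator=%x2C)%x09%(trailers:key=Outcome,valueonly,separator=%x2C)'
}

trail_lines
```

Because the records live in commit messages, the history itself is the database: nothing beyond `git log` is needed to list every hypothesis and see which ones are still open.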

    /study arxiv.org/abs/2601.06712   # understand: a paper → dossier in the atlas
    /investigate "does the apodized taper beat the baseline on insertion loss?"
                                      # do: 3 hypotheses, one contained agent each, judged
    trail                             # see the decision graph reconstruct from git
    /present                          # show: render the surviving result, serve it

/delve conducts understand→show in one shot; /map and /explore render a whole repo into a one-page dossier. The full set lives in plugins/: science-writing (verify-citations, peer-review, rebuttal, arxiv-prep), tikz, manim, marimo, and writing-styles.
Underneath the research arc is a general agent-first IDE. In anu, the unit of work is the agent. Everything sits in one of four layers, each composing on the one below:

    SEE      review · pd · map     understand what exists and what changed
    SCALE    swarm · box · mesh    many agents, isolated, across devices
    ARRANGE  tdl · tsl · taa · al  agents as panes in space
    GROUND   t · tw · gwa          sessions, windows, worktrees

You ground a workspace, arrange agents inside it, scale them out when one isn't enough, and see what they did:

    t myproject .      # ground: session rooted at the repo
    swarm wt 4 cxc     # scale: 4 contained agents, one per worktree
    swarm collect      # see: gather their outputs
    review             # see: AI summaries per branch
    swarmx merge       # merge the good ones

Layout commands run inside tmux. The daily driver is tdl (dev layout): editor left, agent right, terminal bottom.

    ┌──────────────────────┬─────────────┐
    │                      │  AI (30%)   │   tdl cx       nvim + claude + terminal
    │     nvim (70%)       │  e.g. cx    │   tsl 4 cx     4 agents, tiled
    │                      │             │   twdl cx      one agent per worktree
    ├──────────────────────┴─────────────┤   taa cx       add an agent, re-tile
    │          terminal (15%)            │   tscale 6 cx  scale to exactly 6
    └────────────────────────────────────┘

All layout & pane commands

| Command | What it does |
|---|---|
| `tdl <ai>` | Dev layout: editor + agent + terminal |
| `tsl <n> <cmd>` | N tiled panes, same command |
| `tslm <n1> <c1> [n2] <c2>…` | Mixed tiled panes (e.g. `tslm 4 cx 4 cdx`) |
| `twdl <ai>` | One agent per git worktree, editor browses all |
| `twsl <cmd>` | Full-width vertical stack, one agent per worktree |
| `tdlm <ai>` | One tdl per subdirectory, for monorepos |
| `tpl` | Two editors side by side + terminal |
| `tml <cmds…>` | Main pane left, stacked monitor panes right |
| `twl <branch> <ai>` | New tab with tdl in a specific worktree |
| `twlm <ai>` | One tab per worktree, each a full tdl |
| `taa [cmd]` | Add an agent pane (default: claude), re-tile |
| `tra [pane]` | Remove an agent pane (fzf picker) |
| `tscale <n> [cmd]` | Scale to exactly N agent panes |
| `tap` | List tracked agent panes |
| `al [cmd]` / `alw [cmd]` | Launch agent in split pane / new window |
| `tile [layout]` | Auto-tiling: tiled/main-v/main-h/cols/rows/cycle/off |

A swarm orchestrates many agents across a topology that defines how they communicate. Because a swarm just types the agent command into panes, every topology composes with the others, and with containment.

| Command | Topology | What it does |
|---|---|---|
| `swarm start <n> <cmd>` | flat | N equal workers, you orchestrate |
| `swarm mixed <n1> <c1>…` | flat | Different commands per count |
| `swarm star <n> <cmd>` | star | 1 conductor + N-1 workers |
| `swarm pipe <n> <cmd>` | pipeline | Sequential stages, each reads predecessor |
| `swarm gate …` | gated | Pipeline with human approval gates |
| `swarm pair <cmd>` | pair | Coder + reviewer feedback loop |
| `swarm wt <n> <cmd>` | worktree | One agent per auto-created worktree |
| `swarm tournament <n> <cmd>` | tournament | Competitive rounds: evaluate, prune, advance |
| `swarm mesh <n> <cmd>` | distributed | Across Tailscale devices |

Swarm communication & management
Communication: swarm send <agent> "<msg>", swarm broadcast "<msg>", swarm read <agent> (mailbox), swarm capture <agent> (pane output), swarm inspect <agent>, swarm collect (aggregate outputs). Messages are logged: agentlog <agent> [--tail 20] [--since 10m].
State: swarm status, swarm ls, swarm dashboard (live popup), swarm hub (fzf browser), swarm pick (agent picker w/ preview), swarm checkpoint/swarm restore, swarm merge [--all], swarm conflicts, swarm topology.
Enhanced (swarmx): swarmx plan <topo> <n> <cmd> (dry-run), swarmx status (git diff stats per agent), swarmx merge (interactive fzf merge).
Agents are addressed agent-1 through agent-N.
cxx runs an agent with every permission check off, as you, on your host. That is fine when you're watching one. Walk away from a swarm and each agent can touch ~/.ssh, your other repos, your dotfiles. box flips that: each agent runs in a disposable Linux VM (apple/container) with only its worktree mounted at its real path. Edits and commits land in the real tree; execution (installs, tests, anything) stays inside the VM. Kill the box and nothing leaked. Sandboxing is what makes full autonomy safe.

    taa cxc                   # one contained agent pane
    swarm wt 4 cxc            # 4 worktrees, 4 VMs, full isolation per agent
    swarm mixed 1 cxx 3 cxc   # conductor on host, workers contained

| Command | What it does |
|---|---|
| `box` / `box <cmd>` | Shell / run any command in a box (`box npm test`) |
| `box build` / `box doctor` | Build the image / check runtime, image, login |
| `cxc` | Claude full-auto, contained |

No host credentials enter a box (no SSH keys, no gh auth): contained agents commit locally with your git identity, and the host pushes. Keep conductors on the host; a boxed agent can't run swarm or tmux. Caps: 2 CPUs / 4G (ANU_BOX_CPUS, ANU_BOX_MEMORY). Set up once: brew install container && container system start, then box build, then cxc and /login.
When a swarm finishes, you don't read 200 diffs. review sends them to a model and shows you the summaries it writes, cached per commit SHA and instant on re-review.

| Command | What it does |
|---|---|
| `review` | Summaries across every divergent branch |
| `review <agent> [file]` | Deep dive: one agent's work, per-file annotations |
| `review conflicts` / `reviewd` | Cross-agent conflicts / persistent auto-refreshing pane |
| `pd` / `pds` | Project dashboard popup / sidebar: branches, worktrees, swarms |
| `map` / `map open` | Repo dossier → `~/.anu/atlas/<repo>` / re-open last |

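The per-SHA caching that makes a re-review instant can be sketched in a few lines. This is an illustration, not anu's implementation: `summarize` is a stand-in for the model call, and the cache directory, file naming, and the example SHA `abc123` are all assumptions.

```shell
#!/usr/bin/env bash
# Cache directory for this sketch only; anu's real layout may differ.
cache_dir=$(mktemp -d)

# Stand-in for the expensive model call. It logs every invocation so the
# effect of the cache can be observed.
summarize() {
  echo "$1" >> "$cache_dir/calls.log"
  echo "summary of commit $1"
}

# Return the summary for a commit SHA, computing it only on a cache miss.
cached_summary() {
  local sha=$1
  local file="$cache_dir/$sha.txt"
  if [ ! -f "$file" ]; then
    summarize "$sha" > "$file"
  fi
  cat "$file"
}

cached_summary abc123   # miss: runs summarize and stores the result
cached_summary abc123   # hit: served from the stored file
```

A commit SHA names immutable content, so a summary keyed by it cannot go stale; new commits get new keys, and old entries stay valid.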
mesh reaches other machines over Tailscale, so a swarm can spread across devices. nc / ncn connect a host conductor to a chosen execution node and adapt to what it is: a Slurm + Apptainer cluster, a contained box on a Mac or localhost, or a live shell on remote Linux. ncn additionally stands up an endpoint (a kernel, a model server) and forwards it to your localhost.
mesh · nc / ncn reference
Mesh: mesh (device picker), mesh ssh|vnc <device>, mesh run <device> <cmd>, mesh deploy <ai> [device], mesh spawn <n> <dev> [+ …] <cmd>, mesh ping|info|probe|detail, mesh host add/rm, mesh sshconfig, mesh refresh. meshsync on/off/status syncs swarm state across devices.
Node connection: nc <node> connects (pane split + live control channel); ncn <node> does everything nc does and also forwards an endpoint to localhost. With no argument: fzf picker over localhost · the cluster · mesh devices. Helpers: ncn tunnel <node> [lport] [rport], nc|ncn status, nc|ncn down.

| Profile | Node | Channel | Isolation |
|---|---|---|---|
| `cluster` | HPC login host (Slurm) | ssh login | Apptainer + Slurm |
| `apple` | Mac on the mesh | ssh (Tailscale) | box |
| `ssh` | Linux on the mesh | ssh (Tailscale) | optional |
| `box` | localhost / self | `box bash` | box |

The spine is uniform: conductor → channel → containerized execution → endpoint. The conductor stays cxx on the host because it needs your ssh creds, and it reaches the node through the right pane and the tunnel. Per-profile logic ships as the ncn plugin. Defaults read from $ANU_CLUSTER / $ANU_CLUSTER_NAME / $ANU_CLUSTER_IMAGE / $ANU_CLUSTER_PART.

    git clone https://github.com/aadarwal/anu.git ~/anu
    ~/anu/bin/anu init

Or via Homebrew:

    brew install aadarwal/tap/anu
    anu init

anu init installs dependencies, then walks you through each config file: merge with your existing setup, replace (with backup), or skip. On a fresh machine, run anu init --replace-all. Then open a new terminal window:

    t          # start tmux
    tdl cx     # dev layout: nvim + claude + terminal

The anu CLI · config resolution · what's installed

| Command | What it does |
|---|---|
| `anu init [--replace-all\|--merge-all\|--skip-existing]` | Interactive (or non-interactive) setup |
| `anu status` / `anu doctor` | Show config link state / check installation health |
| `anu upgrade` | Pull latest + sync deps + update links |
| `anu unlink` | Remove all anu configs, restore backups |
| `anu relink <old> <new>` | Re-attach agent history after moving a project (amv = move + relink) |

When anu init finds an existing config it offers three strategies: merge (shell files: appends a source line, your config stays intact), replace (backs up to *.anu-bak.<timestamp>, then symlinks ours; anu unlink restores it), or skip. Legacy ./install.sh, ./upgrade.sh, ./uninstall.sh delegate to the CLI.
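The replace strategy (back up, then symlink) follows a common dotfile pattern. Here is a minimal sketch of that pattern in a scratch directory. The function name `replace_with_backup`, the file names, and the timestamp format are assumptions for illustration and are not taken from the anu installer.

```shell
#!/usr/bin/env bash
# Replace a config file with a symlink, keeping a timestamped backup.
# Hypothetical helper: the name and timestamp format are this sketch's own.
replace_with_backup() {
  local target=$1 ours=$2
  # Move any existing file (or stale symlink) out of the way first.
  if [ -e "$target" ] || [ -L "$target" ]; then
    mv "$target" "$target.anu-bak.$(date +%Y%m%d%H%M%S)"
  fi
  ln -s "$ours" "$target"
}

# Demo in a scratch directory, far from any real config.
demo=$(mktemp -d)
echo "user config" > "$demo/bashrc"
echo "anu config"  > "$demo/anu-bashrc"

replace_with_backup "$demo/bashrc" "$demo/anu-bashrc"
ls -l "$demo"
```

Restoring is the reverse: remove the symlink and move the backup to its original name, which is why a manifest of what was linked is enough to undo the install.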
Installed (Homebrew): tmux, bash 5, neovim, starship, eza, fzf, zoxide, bat, ripgrep, fd, mise, gh, jq, tree. Font (separately): brew install --cask font-jetbrains-mono-nerd-font. Config files are symlinked to standard locations (~/.config/tmux/, ~/.config/nvim/, ~/.bashrc, and more); anu status lists them.
| Layer | Tool |
|---|---|
| Terminal | Ghostty |
| Multiplexer | tmux |
| Editor | Neovim + LazyVim |
| Prompt | Starship |
| Shell | Bash 5 + eza + fzf + zoxide + bat + mise |
| AI | Claude Code, opencode, Codex, or anything that runs in a terminal |
- macOS. Uses Ghostty, pbcopy, and osascript.
- Bash 5+ (the installer offers to set Homebrew bash as default).
- Contained agents (box, cxc) need apple/container (brew install container). They are optional; everything else works without them. macOS 26+ for full container networking.
- Ctrl+Option+Shift+Arrows (resize panes) may collide with Mission Control; disable it in System Settings → Keyboard → Keyboard Shortcuts.
Agents & aliases

| Alias | Runs |
|---|---|
| `cx` | Claude Code (permissions skip) |
| `cxx` | Claude Code (full auto, on host) |
| `cxc` | Claude Code (full auto, contained in a VM) |
| `c` | OpenCode |
| `cdx` / `cdxx` | Codex / Codex full-auto |
| `pi` | Pi coding harness (minimal, no permission popups) |
| `n` · `g` · `d` · `r` | nvim · git · docker · rails |
| `ls`/`lsa` · `lt`/`lta` | eza w/ icons · eza tree (a = hidden) |
| `cd` | zoxide (smart jump) · `..` `...` `....` up N dirs |

Sessions · windows · worktrees
Sessions: t [name] [dir] (attach/create; t . = repo-named), tn <name> (new), tj [query] (project jump via zoxide), tp (picker), tk/tl (kill/list), tss/tsr (save/restore state). Drop a .tmux-workspace file in a project and t sources it on session create.
Windows: tw <name> [cmd], twp (picker), to [cmd] (scratchpad popup), ta [text] (AI popup), tws/twg (stash/grab), tsb/tsa (broadcast text to panes in window/session).
Worktrees: gwa <branch> [base] (create), gwr (remove), gwl (list), gws (cd into), twf [path] (refocus AI panes to a worktree). In nvim, <leader>gw opens a worktree picker that switches cwd and focuses its agent pane.
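Worktrees are a stock git feature, so the one-agent-per-worktree arrangement can be tried by hand. The sketch below creates a scratch repository and adds one worktree and branch per agent with `git worktree add`. It is not how gwa or swarm wt are implemented; the directory layout is an assumption, and only the agent-N naming comes from this README.

```shell
#!/usr/bin/env bash
# Scratch repository with a single commit to branch from.
root=$(mktemp -d)
git -C "$root" init -q main
git -C "$root/main" -c user.name=demo -c user.email=demo@example.invalid \
  commit -q --allow-empty -m "initial"

# One worktree and one branch per agent. Sibling directories next to the
# main checkout are this sketch's choice, not a documented anu layout.
for i in 1 2 3; do
  git -C "$root/main" worktree add -q -b "agent-$i" "$root/agent-$i"
done

# Lists the main checkout plus the three agent worktrees.
git -C "$root/main" worktree list
```

Each worktree has its own working directory and branch but shares one object store, so agents can edit in parallel without clobbering each other's files and their branches can be merged later from the main checkout.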
Git
Quick: gcm <msg> / gcam <msg> (commit / stage-all + commit), gcad (amend), gwip/gunwip (WIP), gp (push + PR URL), gpf (force-with-lease), gsync (rebase on default), gpr [-d] (push + PR), gclean (delete merged).
Interactive (fzf): gb (branches), gl (log), ga (staging), gd (diff), gst (stash).
Config & utilities
Config: cfgmap (browse all configs), cfgedit <name> (quick-edit by short name, e.g. cfgedit tmux).
Utilities: rgi (interactive ripgrep), fdi (interactive fd), fp (process picker), fenv (env browser), mkd <name> (mkdir + cd), json [file] [query], serve [port], extract <file>.
Keybindings
No prefix: Ctrl+Option+Arrows (navigate panes), +Shift (resize), Option+1-9 (window), Option+←/→ (prev/next window), Option+↑/↓ (prev/next session), F12 (nested-tmux pass-through).
Prefix Ctrl+B: h/v split · x kill pane · c/k/r window new/kill/rename · C/K/R session · q reload · s/w/j session/window/project picker · ` scratchpad · a AI popup · A agent launcher · +/- add/remove agent · f review · d dashboard · g/G/b git log/worktree/branch · m mesh · S swarm hub · U grab stashed window.
State & data
anu exposes the repo at ~/.local/share/anu/ and stores runtime state there: swarms/ (metadata, mailboxes), reviews/ (cached summaries per SHA), mesh/ (device cache), box/ (contained-agent state; box/claude holds credentials, gitignored). The atlas at ~/.anu/atlas/<repo> holds dossiers and investigations; the decision trail lives at ~/.anu/trail/<repo>. The installer link manifest lives at ~/.local/state/anu/manifest so anu unlink restores configs cleanly. All runtime state is gitignored; never commit it.
Previously included (omacmux era)
Commands that no longer exist, mapped to current equivalents: vibe → swarm start <n> cxc · scan*/scanreport → swarm + swarm collect/review · tell → swarm send · who → swarm status · focus → swarm pick · check → swarmx status · recap → review · recipe* → compose a topology directly · acm → removed (commit deliberately) · ship → gp/gpr · voice → removed. NATO agent names (alpha, bravo, and so on) are gone; agents are agent-1 through agent-N.
michelangelo.sh · Quantum Photonics & AI, MIT · MIT License