anu

A research agent harness. Many AI agents working the way a research group does: arranged as panes, scaled into isolated swarms, and checked against one another until something holds.

Manifesto · Why a harness · The research arc · The machinery · Install · Plugins

anu is the harness behind michelangelo.sh, built at MIT's Quantum Photonics & AI Group (Prof. Dirk Englund). It is an agent-first terminal IDE: tmux is the window manager, Neovim is the editor, and AI agents are first-class panes. They sit beside your code as peers with their own space to think, scaled across your machines and set against one another. Built with Claude, and built for Claude to drive.

What does the future of scientific research look like?

Software spent the last few years building harnesses for its agents: environments where many of them work in parallel, under isolation, checking each other's work. Science mostly stopped at the chatbot. anu is the harness for what comes next.

Why a harness

The model isn't the binding constraint. The harness is.

A capable model alone is a chatbot. What turns it into a research group is the structure around it: how work fans out, stays isolated, and gets checked. Three principles drive the design.

Verification is asymmetric. Confirming a result costs far less than producing it, so the harness is built to generate broadly and verify stringently: many attempts, ruthless checking.
Parallelism is the substrate. One agent approximates a chatbot. Many isolated agents that reproduce and refute each other approximate a research group. Agents are peers with their own context, talking through explicit channels (mailboxes, broadcasts, captured panes) rather than ambient soup.
Acting and thinking are distinct. Reasoning toward a claim and reproducing the evidence are separate operations. Run both, then admit only what survives the comparison.

That is the loop the harness keeps turning:

  ┌──────────────────────── refine / refute ◀────────────────┐
  ▼                                                           │
 goal ─▶ agents ─┬─▶ reason ───▶ claim ─────┐                 │
         fan out │                          ├─▶ compare ─▶ knowledge
                 └─▶ reproduce ─▶ evidence ─┘

Only what survives the comparison becomes knowledge; everything else loops back to be refined or refuted. Research holds many control loops at once, so method is a position on a slider, from deterministic optimization at one end to genuinely open-ended reasoning at the other. The harness spans the whole range.

The goal is narrow and concrete: shorten the path from a good idea to a result that holds, scaling a group's judgment across far more attempts than hands alone allow. Scientists still do the science.

The research arc

The harness ships a marketplace of plugins (Claude Code / Pi skills, developed in place). The flagship plugins form one pipeline. Each stage is a workflow that writes a fixed-template artefact to the atlas, and a decision trail records the reasoning the whole way:

   find ───────────▶ understand ───────▶ do ──────────▶ show
   /prior-art        /study             /investigate    /present
   /lineage          (Claim·Method·     (hypotheses,    (served, visual:
   /similar-code      Result·Gap)        run in a swarm)  Manim · marimo)
                          │                   ▲
                          └── a Gap becomes ──┘
                              a hypothesis
        trail ── records goal · hypothesis · alternatives · outcome (git trailers)

Stage	Command	What it does
find	`/prior-art` `/lineage` `/similar-code`	Map the idea-space: prior work with an honest novelty read, what a paper builds on and what cites it, codebases already doing it. Retrieval is grounded (OpenAlex + GitHub), so every result is a real, fetchable source, never recalled.
understand	`/study`	Read a new paper into a dossier: Claim (what it asserts), Method (how it was shown), Result (what was reported), Gap (what's open). The Gap feeds `/investigate`.
do	`/investigate`	Run the question, don't read about it. Frame it into 2-5 falsifiable hypotheses, fan out one contained agent per hypothesis (the swarm), judge the outcomes adversarially against their evidence, record everything to the trail.
show	`/present`	Turn a result into a served, visual presentation: pick the medium per finding (Manim animation · marimo app · static figure · served notebook), render on the right compute, reachable over Tailscale.
trail	`trail`	The decision graph: hypotheses, choices, roads not taken, and outcomes recorded as git-commit trailers. Reconstructs from `git log` alone (no LLM), and renders as a B&W graph + tempo timeline with open hypotheses flagged as awaiting a verdict.
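
As a rough illustration of how a decision trail can live in commit trailers and be rebuilt from `git log` alone, the sketch below writes one commit with two trailers and reads them back. The trailer keys `Hypothesis` and `Outcome`, the `trail` function, and the throwaway repo are assumptions for the example; the keys anu actually writes may differ. Only standard git features are used (`%(trailers:...)` pretty-format options).

```bash
# Hypothetical trailer keys (Hypothesis, Outcome); anu's real keys may differ.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty \
  -m "try apodized taper" \
  -m "Hypothesis: apodized taper lowers insertion loss
Outcome: open"

# Rebuild the trail from history alone: one line per commit, no model involved.
trail() {   # trail <repo>
  git -C "$1" log --reverse \
    --format='%h | %s | %(trailers:key=Hypothesis,valueonly,separator=%x2C) | %(trailers:key=Outcome,valueonly,separator=%x2C)'
}

trail "$repo"
```

Because the record is part of each commit message, it travels with the branch and survives clones, which is what lets the graph be reconstructed without any side database.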

/study  arxiv.org/abs/2601.06712        # understand: a paper → dossier in the atlas
/investigate "does the apodized taper beat the baseline on insertion loss?"
                                         # do: 3 hypotheses, one contained agent each, judged
trail                                    # see the decision graph reconstruct from git
/present                                 # show: render the surviving result, serve it

/delve conducts understand→show in one shot; /map and /explore render a whole repo into a one-page dossier. The full set lives in plugins/: science-writing (verify-citations, peer-review, rebuttal, arxiv-prep), tikz, manim, marimo, and writing-styles.

The machinery

Underneath the research arc is a general agent-first IDE. In anu, the unit of work is the agent. Everything sits in one of four layers, each composing on the one below:

  SEE      review · pd · map        understand what exists and what changed
  SCALE    swarm · box · mesh       many agents, isolated, across devices
  ARRANGE  tdl · tsl · taa · al     agents as panes in space
  GROUND   t · tw · gwa             sessions, windows, worktrees

You ground a workspace, arrange agents inside it, scale them out when one isn't enough, and see what they did:

t myproject .        # ground:  session rooted at the repo
swarm wt 4 cxc       # scale:   4 contained agents, one per worktree
swarm collect        # see:     gather their outputs
review               # see:     AI summaries per branch
swarmx merge         #          merge the good ones

Arrange: agents as panes

Layout commands run inside tmux. The daily driver is tdl (dev layout): editor left, agent right, terminal bottom.

┌──────────────────────┬─────────────┐
│                      │   AI (30%)  │   tdl cx       nvim + claude + terminal
│     nvim (70%)       │   e.g. cx   │   tsl 4 cx     4 agents, tiled
│                      │             │   twdl cx      one agent per worktree
├──────────────────────┴─────────────┤   taa cx       add an agent, re-tile
│           terminal (15%)           │   tscale 6 cx  scale to exactly 6
└────────────────────────────────────┘

All layout & pane commands

Command	What it does
`tdl <ai>`	Dev layout: editor + agent + terminal
`tsl <n> <cmd>`	N tiled panes, same command
`tslm <n1> <c1> [n2] <c2>…`	Mixed tiled panes (e.g. `tslm 4 cx 4 cdx`)
`twdl <ai>`	One agent per git worktree, editor browses all
`twsl <cmd>`	Full-width vertical stack, one agent per worktree
`tdlm <ai>`	One `tdl` per subdirectory, for monorepos
`tpl`	Two editors side by side + terminal
`tml <cmds…>`	Main pane left, stacked monitor panes right
`twl <branch> <ai>`	New tab with `tdl` in a specific worktree
`twlm <ai>`	One tab per worktree, each a full `tdl`
`taa [cmd]`	Add an agent pane (default: claude), re-tile
`tra [pane]`	Remove an agent pane (fzf picker)
`tscale <n> [cmd]`	Scale to exactly N agent panes
`tap`	List tracked agent panes
`al [cmd]` / `alw [cmd]`	Launch agent in split pane / new window
`tile [layout]`	Auto-tiling: tiled/main-v/main-h/cols/rows/cycle/off

Scale: swarms

A swarm orchestrates many agents across a topology that defines how they communicate. Because a swarm just types the agent command into panes, every topology composes with the others, and with containment.

Command	Topology	What it does
`swarm start <n> <cmd>`	flat	N equal workers, you orchestrate
`swarm mixed <n1> <c1>…`	flat	Different commands per count
`swarm star <n> <cmd>`	star	1 conductor + N-1 workers
`swarm pipe <n> <cmd>`	pipeline	Sequential stages, each reads predecessor
`swarm gate …`	gated	Pipeline with human approval gates
`swarm pair <cmd>`	pair	Coder + reviewer feedback loop
`swarm wt <n> <cmd>`	worktree	One agent per auto-created worktree
`swarm tournament <n> <cmd>`	tournament	Competitive rounds: evaluate, prune, advance
`swarm mesh <n> <cmd>`	distributed	Across Tailscale devices
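
To make "a swarm just types the agent command into panes" concrete, here is a dry-run sketch that prints the tmux commands a flat topology could be assembled from. It is an illustration only: `plan_flat_swarm` is a hypothetical name, pane titles are an assumed way to carry the agent-1 … agent-N addresses, and anu's own implementation may differ. The tmux subcommands themselves (`split-window`, `select-pane -T`, `send-keys`, `select-layout`) are standard.

```bash
# Hypothetical dry-run: print the tmux commands a flat swarm of N panes
# could be built from. Nothing is executed; pipe the output to sh to run it.
plan_flat_swarm() {   # plan_flat_swarm <n> <cmd...>
  n=$1; shift
  cmd=$*
  i=1
  while [ "$i" -le "$n" ]; do
    # The first agent reuses the current pane; each later one gets a new split,
    # which tmux makes the active pane.
    [ "$i" -gt 1 ] && echo "tmux split-window"
    echo "tmux select-pane -T agent-$i"
    echo "tmux send-keys '$cmd' Enter"
    i=$((i + 1))
  done
  echo "tmux select-layout tiled"
}

plan_flat_swarm 2 cxc
```

Since the agent is started by typing its command into a pane, swapping `cxc` for any other terminal command changes nothing structurally, which is why topologies compose with containment.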

Swarm communication & management

Communication: swarm send <agent> "<msg>", swarm broadcast "<msg>", swarm read <agent> (mailbox), swarm capture <agent> (pane output), swarm inspect <agent>, swarm collect (aggregate outputs). Messages are logged: agentlog <agent> [--tail 20] [--since 10m].

State: swarm status, swarm ls, swarm dashboard (live popup), swarm hub (fzf browser), swarm pick (agent picker w/ preview), swarm checkpoint/swarm restore, swarm merge [--all], swarm conflicts, swarm topology.

Enhanced (swarmx): swarmx plan <topo> <n> <cmd> (dry-run), swarmx status (git diff stats per agent), swarmx merge (interactive fzf merge).

Agents are addressed agent-1 through agent-N.
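
The mailbox idea can be pictured as one append-only file per agent. The sketch below is a minimal model of that, not anu's storage format: the `.mbox` file naming, the temporary directory, and the function names `send`, `broadcast`, and `read_mail` are all assumptions made for the example.

```bash
# Hypothetical mailbox layout: one append-only file per agent.
MAILROOT=$(mktemp -d)

send() {            # send <agent> <msg>
  printf '%s\n' "$2" >> "$MAILROOT/$1.mbox"
}

broadcast() {       # broadcast <n> <msg>: deliver to agent-1 .. agent-n
  i=1
  while [ "$i" -le "$1" ]; do
    send "agent-$i" "$2"
    i=$((i + 1))
  done
}

read_mail() {       # read_mail <agent>: print that agent's mailbox, if any
  cat "$MAILROOT/$1.mbox" 2>/dev/null
}

send agent-2 "reproduce H1 with a fresh seed"
broadcast 3 "stop after the current run"
read_mail agent-2
```

An explicit per-agent channel like this keeps each message attributable and replayable, in contrast to agents sharing one undifferentiated context.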

Containment: autonomy made safe

cxx runs an agent with every permission check off, as you, on your host. That is fine when you're watching one. Walk away from a swarm and each agent can touch ~/.ssh, your other repos, your dotfiles. box flips that: each agent runs in a disposable Linux VM (apple/container) with only its worktree mounted at its real path. Edits and commits land in the real tree; execution (installs, tests, anything) stays inside the VM. Kill the box and nothing leaked. Sandboxing is what makes full autonomy safe.

taa cxc                    # one contained agent pane
swarm wt 4 cxc             # 4 worktrees, 4 VMs, full isolation per agent
swarm mixed 1 cxx 3 cxc    # conductor on host, workers contained

Command	What it does
`box` / `box <cmd>`	Shell / run any command in a box (`box npm test`)
`box build` / `box doctor`	Build the image / check runtime, image, login
`cxc`	Claude full-auto, contained

No host credentials enter a box (no SSH keys, no gh auth): contained agents commit locally with your git identity, the host pushes. Keep conductors on the host; a boxed agent can't run swarm or tmux. Caps: 2 CPUs / 4G (ANU_BOX_CPUS, ANU_BOX_MEMORY). Setup once: brew install container && container system start, then box build, then cxc and /login.
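
A small sketch of how the two cap variables might resolve, assuming they are read from the environment with the documented defaults as fallbacks. `box_caps` is a hypothetical helper, and the `8G` value simply mirrors the `4G` format quoted above; check `box doctor` for what your install actually accepts.

```bash
# Hypothetical: resolve the caps from the environment,
# falling back to the documented defaults (2 CPUs / 4G).
box_caps() {
  echo "cpus=${ANU_BOX_CPUS:-2} memory=${ANU_BOX_MEMORY:-4G}"
}

box_caps                                              # defaults
( ANU_BOX_CPUS=4; ANU_BOX_MEMORY=8G; box_caps )       # raised for one subshell
```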

See: what the agents did

When a swarm finishes, you don't read 200 diffs. review sends them to a model and shows you the summaries it writes, cached per commit SHA and instant on re-review.

Command	What it does
`review`	Summaries across every divergent branch
`review <agent> [file]`	Deep dive: one agent's work, per-file annotations
`review conflicts` / `reviewd`	Cross-agent conflicts / persistent auto-refreshing pane
`pd` / `pds`	Project dashboard popup / sidebar: branches, worktrees, swarms
`map` / `map open`	Repo dossier → `~/.anu/atlas/<repo>` / re-open last
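
The per-SHA caching that makes re-review instant can be sketched as a lookup keyed by commit hash. This is an assumed shape, not anu's code: `summarize` is a placeholder for the model call, and the cache directory and `.txt` naming are invented for the example.

```bash
# Hypothetical sketch of per-SHA caching; summarize() stands in for the model call.
CACHE=$(mktemp -d)

summarize() {       # placeholder: a real version would send the diff to a model
  echo "summary of $1"
}

cached_review() {   # cached_review <commit-sha>
  f="$CACHE/$1.txt"
  if [ ! -f "$f" ]; then
    summarize "$1" > "$f"     # slow path, taken once per SHA
  fi
  cat "$f"                    # fast path on every later review
}

cached_review 3f2a1bc
cached_review 3f2a1bc         # served from the cache
```

A commit SHA identifies its content exactly, so a cached summary never goes stale; a new commit simply gets a new key.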

Mesh & node connection

mesh reaches other machines over Tailscale, so a swarm can spread across devices. nc / ncn connect a host conductor to a chosen execution node and adapt to what it is: a Slurm + Apptainer cluster, a contained box on a Mac or localhost, or a live shell on remote Linux. ncn additionally stands up an endpoint (a kernel, a model server) and forwards it to your localhost.

mesh · nc / ncn reference

Mesh: mesh (device picker), mesh ssh|vnc <device>, mesh run <device> <cmd>, mesh deploy <ai> [device], mesh spawn <n> <dev> [+ …] <cmd>, mesh ping|info|probe|detail, mesh host add/rm, mesh sshconfig, mesh refresh. meshsync on/off/status syncs swarm state across devices.

Node connection: nc <node> connects (pane split + live control channel); ncn <node> does everything nc does and also forwards an endpoint to localhost. With no argument, an fzf picker lists localhost · the cluster · mesh devices. Helpers: ncn tunnel <node> [lport] [rport], nc|ncn status, nc|ncn down.
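
For intuition about the forwarded endpoint, the sketch below prints the kind of `ssh` local port forward a tunnel helper could run. It is an assumption about the mechanism, not a description of `ncn tunnel` itself: `tunnel_cmd`, the `gpu-node` host name, and the 8888 default port are hypothetical. The `ssh` flags are standard (`-N` runs no remote command, `-L` forwards a local port).

```bash
# Hypothetical: print the ssh command a tunnel helper might run.
tunnel_cmd() {      # tunnel_cmd <node> [lport] [rport]
  node=$1
  lport=${2:-8888}          # assumed default local port
  rport=${3:-$lport}        # remote port defaults to the local one
  echo "ssh -N -L $lport:localhost:$rport $node"
}

tunnel_cmd gpu-node
tunnel_cmd gpu-node 9000 8888
```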

Profile	Node	Channel	Isolation
`cluster`	HPC login host (Slurm)	ssh login	Apptainer + Slurm
`apple`	Mac on the mesh	ssh (Tailscale)	`box`
`ssh`	Linux on the mesh	ssh (Tailscale)	optional
`box`	localhost / self	`box bash`	`box`

The spine is uniform: conductor → channel → containerized execution → endpoint. The conductor stays cxx on the host because it needs your ssh creds, and it reaches the node through the right pane and the tunnel. Per-profile logic ships as the ncn plugin. Defaults read from $ANU_CLUSTER / $ANU_CLUSTER_NAME / $ANU_CLUSTER_IMAGE / $ANU_CLUSTER_PART.

Install

git clone https://github.com/aadarwal/anu.git ~/anu
~/anu/bin/anu init

Or via Homebrew:

brew install aadarwal/tap/anu
anu init

anu init installs dependencies, then walks you through each config file: merge with your existing setup, replace (with backup), or skip. On a fresh machine, run anu init --replace-all. Then open a new terminal window:

t              # start tmux
tdl cx         # dev layout: nvim + claude + terminal

The anu CLI · config resolution · what's installed

Command	What it does
`anu init [--replace-all\|--merge-all\|--skip-existing]`	Interactive (or non-interactive) setup
`anu status` / `anu doctor`	Show config link state / check installation health
`anu upgrade`	Pull latest + sync deps + update links
`anu unlink`	Remove all anu configs, restore backups
`anu relink <old> <new>`	Re-attach agent history after moving a project (`amv` = move + relink)

When anu init finds an existing config it offers three strategies: merge (shell files: appends a source line, your config stays intact), replace (backs up to *.anu-bak.<timestamp>, then symlinks ours; anu unlink restores it), or skip. Legacy ./install.sh, ./upgrade.sh, ./uninstall.sh delegate to the CLI.
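
The replace strategy amounts to "move the existing file aside, then symlink". A minimal sketch under stated assumptions: `replace_with_link` is a hypothetical function, the timestamp format is a guess, and the demo file names are placeholders rather than paths anu manages.

```bash
# Hypothetical sketch of the "replace" strategy; the timestamp format is assumed.
replace_with_link() {   # replace_with_link <ours> <target>
  ours=$1; target=$2
  if [ -e "$target" ] || [ -L "$target" ]; then
    # Keep the user's file so an unlink step can restore it later.
    mv "$target" "$target.anu-bak.$(date +%Y%m%d%H%M%S)"
  fi
  ln -s "$ours" "$target"
}

demo=$(mktemp -d)
echo "ours"   > "$demo/tmux.conf.anu"
echo "theirs" > "$demo/tmux.conf"
replace_with_link "$demo/tmux.conf.anu" "$demo/tmux.conf"
ls "$demo"
```

Restoring is the reverse: remove the symlink and move the newest backup back into place.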

Installed (Homebrew): tmux, bash 5, neovim, starship, eza, fzf, zoxide, bat, ripgrep, fd, mise, gh, jq, tree. Font (separately): brew install --cask font-jetbrains-mono-nerd-font. Config files are symlinked to standard locations (~/.config/tmux/, ~/.config/nvim/, ~/.bashrc, and more); anu status lists them.

The stack

Layer	Tool
Terminal	Ghostty
Multiplexer	tmux
Editor	Neovim + LazyVim
Prompt	Starship
Shell	Bash 5 + eza + fzf + zoxide + bat + mise
AI	Claude Code, opencode, Codex, or anything that runs in a terminal

Requirements

macOS. Uses Ghostty, pbcopy, and osascript. Bash 5+ (the installer offers to set Homebrew bash as default).
Contained agents (box, cxc) need apple/container (brew install container). They are optional; everything else works without them. macOS 26+ for full container networking.
Ctrl+Option+Shift+Arrows (resize panes) may collide with Mission Control; disable it in System Settings → Keyboard → Keyboard Shortcuts.

Reference

Agents & aliases

Alias	Runs
`cx`	Claude Code (permissions skip)
`cxx`	Claude Code (full auto, on host)
`cxc`	Claude Code (full auto, contained in a VM)
`c`	OpenCode
`cdx` / `cdxx`	Codex / Codex full-auto
`pi`	Pi coding harness (minimal, no permission popups)
`n` · `g` · `d` · `r`	nvim · git · docker · rails
`ls`/`lsa` · `lt`/`lta`	eza w/ icons · eza tree (`a` = hidden)
`cd`	zoxide (smart jump) · `..` `...` `....` up N dirs

Sessions · windows · worktrees

Sessions: t [name] [dir] (attach/create; t . = repo-named), tn <name> (new), tj [query] (project jump via zoxide), tp (picker), tk/tl (kill/list), tss/tsr (save/restore state). Drop a .tmux-workspace file in a project and t sources it on session create.

Windows: tw <name> [cmd], twp (picker), to [cmd] (scratchpad popup), ta [text] (AI popup), tws/twg (stash/grab), tsb/tsa (broadcast text to panes in window/session).

Worktrees: gwa <branch> [base] (create), gwr (remove), gwl (list), gws (cd into), twf [path] (refocus AI panes to a worktree). In nvim, <leader>gw opens a worktree picker that switches cwd and focuses its agent pane.
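
Under the hood, a worktree helper of this kind can be built on `git worktree add`. The sketch below is a stand-in for illustration, not the source of `gwa`: `make_worktree` is a hypothetical name and the sibling-directory naming (`<repo>-<branch>`) is an assumption.

```bash
# Hypothetical stand-in for gwa; the sibling-directory naming is an assumption.
make_worktree() {   # make_worktree <repo> <branch> [base]
  repo=$1; branch=$2; base=${3:-HEAD}
  wt="$(dirname "$repo")/$(basename "$repo")-$branch"
  # -b creates the branch at <base> and checks it out in the new directory.
  git -C "$repo" worktree add -q -b "$branch" "$wt" "$base" || return 1
  echo "$wt"
}

# Demo in a throwaway repo with one empty commit.
root=$(mktemp -d)
git init -q "$root/proj"
git -C "$root/proj" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"
make_worktree "$root/proj" exp-1
git -C "$root/proj" worktree list
```

Each worktree has its own working directory and checked-out branch while sharing one object store, so agents can edit in parallel without touching each other's files.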

Git

Quick: gcm <msg> / gcam <msg> (commit / stage-all + commit), gcad (amend), gwip/gunwip (WIP), gp (push + PR URL), gpf (force-with-lease), gsync (rebase on default), gpr [-d] (push + PR), gclean (delete merged).

Interactive (fzf): gb (branches), gl (log), ga (staging), gd (diff), gst (stash).

Config & utilities

Config: cfgmap (browse all configs), cfgedit <name> (quick-edit by short name, e.g. cfgedit tmux).

Utilities: rgi (interactive ripgrep), fdi (interactive fd), fp (process picker), fenv (env browser), mkd <name> (mkdir + cd), json [file] [query], serve [port], extract <file>.

Keybindings

No prefix: Ctrl+Option+Arrows (navigate panes), +Shift (resize), Option+1-9 (window), Option+←/→ (prev/next window), Option+↑/↓ (prev/next session), F12 (nested-tmux pass-through).

Prefix Ctrl+B: h/v split · x kill pane · c/k/r window new/kill/rename · C/K/R session · q reload · s/w/j session/window/project picker · ` scratchpad · a AI popup · A agent launcher · +/- add/remove agent · f review · d dashboard · g/G/b git log/worktree/branch · m mesh · S swarm hub · U grab stashed window.

State & data

anu exposes the repo at ~/.local/share/anu/ and stores runtime state there: swarms/ (metadata, mailboxes), reviews/ (cached summaries per SHA), mesh/ (device cache), box/ (contained-agent state; box/claude holds credentials, gitignored). The atlas at ~/.anu/atlas/<repo> holds dossiers and investigations; the decision trail lives at ~/.anu/trail/<repo>. The installer link manifest lives at ~/.local/state/anu/manifest so anu unlink restores configs cleanly. All runtime state is gitignored; never commit it.

Previously included (omacmux era)

Commands that no longer exist, mapped to current equivalents: vibe → swarm start <n> cxc · scan*/scanreport → swarm + swarm collect/review · tell → swarm send · who → swarm status · focus → swarm pick · check → swarmx status · recap → review · recipe* → compose a topology directly · acm → removed (commit deliberately) · ship → gp/gpr · voice → removed. NATO agent names (alpha, bravo, and so on) are gone; agents are agent-1 through agent-N.

michelangelo.sh · Quantum Photonics & AI, MIT · MIT License

