anu

A research agent harness. Many AI agents working the way a research group does: arranged as panes, scaled into isolated swarms, and checked against one another until something holds.

Manifesto · Why a harness · The research arc · The machinery · Install · Plugins


anu is the harness behind michelangelo.sh, built at MIT's Quantum Photonics & AI Group (Prof. Dirk Englund). It is an agent-first terminal IDE: tmux is the window manager, Neovim is the editor, and AI agents are first-class panes. They sit beside your code as peers with their own space to think, scaled across your machines and set against one another. Built with Claude, and built for Claude to drive.

What does the future of science research look like?

Software spent the last few years building harnesses for its agents: environments where many of them work in parallel, under isolation, checking each other's work. Science mostly stopped at the chatbot. anu is the harness for what comes next.


Why a harness

The model isn't the binding constraint. The harness is.

A capable model alone is a chatbot. What turns it into a research group is the structure around it: how work fans out, stays isolated, and gets checked. Three principles drive the design.

  • Verification is asymmetric. Confirming a result costs far less than producing it, so the harness is built to generate broadly and verify stringently: many attempts, ruthless checking.
  • Parallelism is the substrate. One agent approximates a chatbot. Many isolated agents that reproduce and refute each other approximate a research group. Agents are peers with their own context, talking through explicit channels (mailboxes, broadcasts, captured panes) rather than ambient soup.
  • Acting and thinking are distinct. Reasoning toward a claim and reproducing the evidence are separate operations. Run both, then admit only what survives the comparison.

That is the loop the harness keeps turning:

  ┌──────────────────────── refine / refute ◀────────────────┐
  ▼                                                           │
 goal ─▶ agents ─┬─▶ reason ───▶ claim ─────┐                 │
         fan out │                          ├─▶ compare ─▶ knowledge
                 └─▶ reproduce ─▶ evidence ─┘

Only what survives the comparison becomes knowledge; everything else loops back to be refined or refuted. Research holds many control loops at once, so method is a position on a slider, from deterministic optimization at one end to genuinely open-ended reasoning at the other. The harness spans the whole range.

The goal is narrow and concrete: shorten the path from a good idea to a result that holds, scaling a group's judgment across far more attempts than hands alone allow. Scientists still do the science.


The research arc

The harness ships a marketplace of plugins (Claude Code / Pi skills, developed in place). The flagship ones form one pipeline. Each stage is a workflow that writes a fixed-template artefact to the atlas, and a decision trail records the reasoning the whole way:

   find ───────────▶ understand ───────▶ do ──────────▶ show
   /prior-art        /study             /investigate    /present
   /lineage          (Claim·Method·     (hypotheses,    (served, visual:
   /similar-code      Result·Gap)        run in a swarm)  Manim · marimo)
                          │                   ▲
                          └── a Gap becomes ──┘
                              a hypothesis
        trail ── records goal · hypothesis · alternatives · outcome (git trailers)

| Stage | Command | What it does |
|---|---|---|
| find | /prior-art /lineage /similar-code | Map the idea-space: prior work with an honest novelty read, what a paper builds on and what cites it, codebases already doing it. Retrieval is grounded (OpenAlex + GitHub), so every result is a real, fetchable source, never recalled. |
| understand | /study | Read a new paper into a dossier: Claim (what it asserts), Method (how it was shown), Result (what was reported), Gap (what's open). The Gap feeds /investigate. |
| do | /investigate | Run the question, don't read about it. Frame it into 2-5 falsifiable hypotheses, fan out one contained agent per hypothesis (the swarm), judge the outcomes adversarially against their evidence, record everything to the trail. |
| show | /present | Turn a result into a served, visual presentation: pick the medium per finding (Manim animation · marimo app · static figure · served notebook), render on the right compute, reachable over Tailscale. |
| trail | trail | The decision graph: hypotheses, choices, roads not taken, and outcomes recorded as git-commit trailers. Reconstructs from git log alone (no LLM), and renders as a B&W graph + tempo timeline with open hypotheses flagged as awaiting a verdict. |

/study  arxiv.org/abs/2601.06712        # understand: a paper → dossier in the atlas
/investigate "does the apodized taper beat the baseline on insertion loss?"
                                         # do: 3 hypotheses, one contained agent each, judged
trail                                    # see the decision graph reconstruct from git
/present                                 # show: render the surviving result, serve it

/delve conducts understand→show in one shot; /map and /explore render a whole repo into a one-page dossier. The full set lives in plugins/: science-writing (verify-citations, peer-review, rebuttal, arxiv-prep), tikz, manim, marimo, and writing-styles.


The machinery

Underneath the research arc is a general agent-first IDE. In anu, the unit of work is the agent. Everything sits in one of four layers, each composing on the one below:

  SEE      review · pd · map        understand what exists and what changed
  SCALE    swarm · box · mesh       many agents, isolated, across devices
  ARRANGE  tdl · tsl · taa · al     agents as panes in space
  GROUND   t · tw · gwa             sessions, windows, worktrees

You ground a workspace, arrange agents inside it, scale them out when one isn't enough, and see what they did:

t myproject .        # ground:  session rooted at the repo
swarm wt 4 cxc       # scale:   4 contained agents, one per worktree
swarm collect        # see:     gather their outputs
review               # see:     AI summaries per branch
swarmx merge         #          merge the good ones

Arrange: agents as panes

Layout commands run inside tmux. The daily driver is tdl (dev layout): editor left, agent right, terminal bottom.

┌──────────────────────┬─────────────┐
│                      │   AI (30%)  │   tdl cx       nvim + claude + terminal
│     nvim (70%)       │   e.g. cx   │   tsl 4 cx     4 agents, tiled
│                      │             │   twdl cx      one agent per worktree
├──────────────────────┴─────────────┤   taa cx       add an agent, re-tile
│           terminal (15%)           │   tscale 6 cx  scale to exactly 6
└────────────────────────────────────┘

All layout & pane commands

| Command | What it does |
|---|---|
| tdl <ai> | Dev layout: editor + agent + terminal |
| tsl <n> <cmd> | N tiled panes, same command |
| tslm <n1> <c1> [n2] <c2>… | Mixed tiled panes (e.g. tslm 4 cx 4 cdx) |
| twdl <ai> | One agent per git worktree, editor browses all |
| twsl <cmd> | Full-width vertical stack, one agent per worktree |
| tdlm <ai> | One tdl per subdirectory, for monorepos |
| tpl | Two editors side by side + terminal |
| tml <cmds…> | Main pane left, stacked monitor panes right |
| twl <branch> <ai> | New tab with tdl in a specific worktree |
| twlm <ai> | One tab per worktree, each a full tdl |
| taa [cmd] | Add an agent pane (default: claude), re-tile |
| tra [pane] | Remove an agent pane (fzf picker) |
| tscale <n> [cmd] | Scale to exactly N agent panes |
| tap | List tracked agent panes |
| al [cmd] / alw [cmd] | Launch agent in split pane / new window |
| tile [layout] | Auto-tiling: tiled/main-v/main-h/cols/rows/cycle/off |

Scale: swarms

A swarm orchestrates many agents across a topology that defines how they communicate. Because a swarm just types the agent command into panes, every topology composes with the others, and with containment.
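
The phrase "types the agent command into panes" can be pictured with plain tmux. The sketch below is not anu's implementation; `type_into_panes` is a hypothetical name, and it assumes it is run from inside a tmux session.

```shell
#!/usr/bin/env bash
# Sketch only: not anu's code. Run from inside a tmux session.

# Open N new panes in the current window and type the same command into each.
type_into_panes() {
  local n=$1 cmd=$2 i=0 pane
  while [ "$i" -lt "$n" ]; do
    # -d keeps focus where it is; -P -F prints the new pane's id (e.g. %12).
    pane=$(tmux split-window -d -P -F '#{pane_id}') || return 1
    # Re-tile after every split so the next split still has room.
    tmux select-layout tiled
    # Send the command text followed by the Enter key to that pane.
    tmux send-keys -t "$pane" "$cmd" Enter
    i=$((i + 1))
  done
}
```

For example, `type_into_panes 4 claude` would open four panes and start `claude` in each. Since the agent is just a command typed into a pane, anything that runs in a terminal can take its place.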

| Command | Topology | What it does |
|---|---|---|
| swarm start <n> <cmd> | flat | N equal workers, you orchestrate |
| swarm mixed <n1> <c1>… | flat | Different commands per count |
| swarm star <n> <cmd> | star | 1 conductor + N-1 workers |
| swarm pipe <n> <cmd> | pipeline | Sequential stages, each reads predecessor |
| swarm gate … | gated | Pipeline with human approval gates |
| swarm pair <cmd> | pair | Coder + reviewer feedback loop |
| swarm wt <n> <cmd> | worktree | One agent per auto-created worktree |
| swarm tournament <n> <cmd> | tournament | Competitive rounds: evaluate, prune, advance |
| swarm mesh <n> <cmd> | distributed | Across Tailscale devices |

Swarm communication & management

Communication: swarm send <agent> "<msg>", swarm broadcast "<msg>", swarm read <agent> (mailbox), swarm capture <agent> (pane output), swarm inspect <agent>, swarm collect (aggregate outputs). Messages are logged: agentlog <agent> [--tail 20] [--since 10m].

State: swarm status, swarm ls, swarm dashboard (live popup), swarm hub (fzf browser), swarm pick (agent picker w/ preview), swarm checkpoint/swarm restore, swarm merge [--all], swarm conflicts, swarm topology.

Enhanced (swarmx): swarmx plan <topo> <n> <cmd> (dry-run), swarmx status (git diff stats per agent), swarmx merge (interactive fzf merge).

Agents are addressed agent-1 through agent-N.
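
The worktree topology gives each agent its own checkout. A minimal sketch of that setup with stock `git worktree` follows; the sibling-directory layout, the branch names, and the function name are assumptions chosen to mirror the agent-1 through agent-N addressing, not what `swarm wt` actually creates.

```shell
#!/usr/bin/env bash
# Sketch only: directory layout and branch names are assumptions.

# Create N sibling worktrees, agent-1..agent-N, each on its own new branch.
make_agent_worktrees() {
  local n=$1 base=${2:-HEAD} root i=1
  root=$(git rev-parse --show-toplevel) || return 1
  while [ "$i" -le "$n" ]; do
    # -b creates the branch; the checkout lands next to the main repo.
    git worktree add --quiet -b "agent-$i" "${root}-agent-$i" "$base" || return 1
    i=$((i + 1))
  done
  git worktree list
}
```

Each worktree shares the repository's object store but has its own working directory and branch, so agents cannot overwrite one another's files, and their branches can be compared or merged afterwards.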

Containment: autonomy made safe

cxx runs an agent with every permission check off, as you, on your host. That is fine when you're watching one. Walk away from a swarm and each agent can touch ~/.ssh, your other repos, your dotfiles. box flips that: each agent runs in a disposable Linux VM (apple/container) with only its worktree mounted at its real path. Edits and commits land in the real tree; execution (installs, tests, anything) stays inside the VM. Kill the box and nothing leaked. Sandboxing is what makes full autonomy safe.

taa cxc                    # one contained agent pane
swarm wt 4 cxc             # 4 worktrees, 4 VMs, full isolation per agent
swarm mixed 1 cxx 3 cxc    # conductor on host, workers contained

| Command | What it does |
|---|---|
| box / box <cmd> | Shell / run any command in a box (box npm test) |
| box build / box doctor | Build the image / check runtime, image, login |
| cxc | Claude full-auto, contained |

No host credentials enter a box (no SSH keys, no gh auth): contained agents commit locally with your git identity, the host pushes. Keep conductors on the host; a boxed agent can't run swarm or tmux. Caps: 2 CPUs / 4G (ANU_BOX_CPUS, ANU_BOX_MEMORY). Setup once: brew install container && container system start, then box build, then cxc and /login.

See: what the agents did

When a swarm finishes, you don't read 200 diffs. review sends them to a model and shows you the summaries it writes, cached per commit SHA and instant on re-review.

| Command | What it does |
|---|---|
| review | Summaries across every divergent branch |
| review <agent> [file] | Deep dive: one agent's work, per-file annotations |
| review conflicts / reviewd | Cross-agent conflicts / persistent auto-refreshing pane |
| pd / pds | Project dashboard popup / sidebar: branches, worktrees, swarms |
| map / map open | Repo dossier → ~/.anu/atlas/<repo> / re-open last |
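
Caching summaries per commit SHA works because a SHA names immutable content: the same commit always yields the same diff. A minimal sketch of that pattern follows. The cache directory, the function name, and the summarizer (any command that reads a diff on stdin) are hypothetical; this is not how `review` is implemented.

```shell
#!/usr/bin/env bash
# Sketch only: cache location and summarizer command are hypothetical.
REVIEW_CACHE=${REVIEW_CACHE:-"$HOME/.cache/review-sketch"}

# Print a summary for a commit. The remaining arguments are a command that
# reads a diff on stdin and writes a summary; it runs only on a cache miss.
cached_summary() {
  local rev=$1 sha file
  shift
  # Resolve branch names and HEAD to a full SHA so the cache key is stable.
  sha=$(git rev-parse --verify "$rev^{commit}") || return 1
  file="$REVIEW_CACHE/$sha.txt"
  if [ ! -s "$file" ]; then
    mkdir -p "$REVIEW_CACHE"
    # Write to a temp file first so a failed run never leaves a partial entry.
    git show --format= --patch "$sha" | "$@" > "$file.tmp" \
      && mv "$file.tmp" "$file" || return 1
  fi
  cat "$file"
}
```

A call such as `cached_summary HEAD wc -l` would invoke the command once; repeating it for the same commit only reads the cached file.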

Mesh & node connection

mesh reaches other machines over Tailscale, so a swarm can spread across devices. nc / ncn connect a host conductor to a chosen execution node and adapt to what it is: a Slurm + Apptainer cluster, a contained box on a Mac or localhost, or a live shell on remote Linux. ncn additionally stands up an endpoint (a kernel, a model server) and forwards it to your localhost.

mesh · nc / ncn reference

Mesh: mesh (device picker), mesh ssh|vnc <device>, mesh run <device> <cmd>, mesh deploy <ai> [device], mesh spawn <n> <dev> [+ …] <cmd>, mesh ping|info|probe|detail, mesh host add/rm, mesh sshconfig, mesh refresh. meshsync on/off/status syncs swarm state across devices.

Node connection: nc <node> connects (pane split + live control channel); ncn <node> adds the socket (everything nc does, plus an endpoint forwarded to localhost). No-arg → fzf-pick over localhost · the cluster · mesh devices. Helpers: ncn tunnel <node> [lport] [rport], nc|ncn status, nc|ncn down.
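
Forwarding an endpoint to localhost is, in the simplest case, an SSH local port forward. The sketch below shows only that building block, under the assumption that the node is directly reachable over ssh. `open_tunnel` and the default port 8888 are hypothetical, and the real ncn plugin's per-profile logic (Slurm jobs, boxes) is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: a plain SSH local forward, not the ncn plugin's logic.

# Forward localhost:<lport> to <rport> on the node, in the background.
open_tunnel() {
  local node=$1 lport=${2:-8888} rport=${3:-${2:-8888}}
  # -N: run no remote command; -f: go to background after authentication.
  # ExitOnForwardFailure makes ssh exit rather than run without the forward.
  ssh -f -N -o ExitOnForwardFailure=yes \
    -L "${lport}:localhost:${rport}" "$node"
}
```

After `open_tunnel mynode 9000`, a service listening on port 9000 of the node would be reachable at localhost:9000 on the host.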

| Profile | Node | Channel | Isolation |
|---|---|---|---|
| cluster | HPC login host (Slurm) | ssh login | Apptainer + Slurm |
| apple | Mac on the mesh | ssh (Tailscale) | box |
| ssh | Linux on the mesh | ssh (Tailscale) | optional |
| box | localhost / self | box bash | box |

The spine is uniform: conductor → channel → containerized execution → endpoint. The conductor stays cxx on the host because it needs your ssh creds, and it reaches the node through the right pane and the tunnel. Per-profile logic ships as the ncn plugin. Defaults read from $ANU_CLUSTER / $ANU_CLUSTER_NAME / $ANU_CLUSTER_IMAGE / $ANU_CLUSTER_PART.


Install

git clone https://github.com/aadarwal/anu.git ~/anu
~/anu/bin/anu init

Or via Homebrew:

brew install aadarwal/tap/anu
anu init

anu init installs dependencies, then walks you through each config file: merge with your existing setup, replace (with backup), or skip. On a fresh machine, run anu init --replace-all. Then open a new terminal window:

t              # start tmux
tdl cx         # dev layout: nvim + claude + terminal

The anu CLI · config resolution · what's installed

| Command | What it does |
|---|---|
| anu init [--replace-all\|--merge-all\|--skip-existing] | Interactive (or non-interactive) setup |
| anu status / anu doctor | Show config link state / check installation health |
| anu upgrade | Pull latest + sync deps + update links |
| anu unlink | Remove all anu configs, restore backups |
| anu relink <old> <new> | Re-attach agent history after moving a project (amv = move + relink) |

When anu init finds an existing config it offers three strategies: merge (shell files: appends a source line, your config stays intact), replace (backs up to *.anu-bak.<timestamp>, then symlinks ours; anu unlink restores it), or skip. Legacy ./install.sh, ./upgrade.sh, ./uninstall.sh delegate to the CLI.
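
The replace strategy is a back-up-then-symlink pattern that can be undone. A minimal sketch follows; the function names and the timestamp format are assumptions, and only the `.anu-bak.<timestamp>` suffix comes from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: function names and timestamp format are assumptions.

# Replace: move any existing file aside, then symlink the managed copy.
replace_with_backup() {
  local ours=$1 target=$2
  if [ -e "$target" ] || [ -L "$target" ]; then
    mv "$target" "$target.anu-bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  ln -s "$ours" "$target"
}

# Undo: remove the symlink and restore the newest backup, if there is one.
restore_backup() {
  local target=$1 latest
  if [ -L "$target" ]; then rm "$target"; fi
  # Timestamps sort lexically, so the last entry is the most recent backup.
  latest=$(ls -1 "$target".anu-bak.* 2>/dev/null | sort | tail -n 1)
  if [ -n "$latest" ]; then mv "$latest" "$target"; fi
}
```

Keeping the backup beside the original makes the undo step a plain rename.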

Installed (Homebrew): tmux, bash 5, neovim, starship, eza, fzf, zoxide, bat, ripgrep, fd, mise, gh, jq, tree. Font (separately): brew install --cask font-jetbrains-mono-nerd-font. Config files are symlinked to standard locations (~/.config/tmux/, ~/.config/nvim/, ~/.bashrc, and more); anu status lists them.

The stack

| Layer | Tool |
|---|---|
| Terminal | Ghostty |
| Multiplexer | tmux |
| Editor | Neovim + LazyVim |
| Prompt | Starship |
| Shell | Bash 5 + eza + fzf + zoxide + bat + mise |
| AI | Claude Code, opencode, Codex, or anything that runs in a terminal |

Requirements

  • macOS. Uses Ghostty, pbcopy, and osascript. Bash 5+ (the installer offers to set Homebrew bash as default).
  • Contained agents (box, cxc) need apple/container (brew install container). They are optional; everything else works without them. macOS 26+ for full container networking.
  • Ctrl+Option+Shift+Arrows (resize panes) may collide with Mission Control; disable it in System Settings → Keyboard → Keyboard Shortcuts.

Reference

Agents & aliases

| Alias | Runs |
|---|---|
| cx | Claude Code (permissions skip) |
| cxx | Claude Code (full auto, on host) |
| cxc | Claude Code (full auto, contained in a VM) |
| c | OpenCode |
| cdx / cdxx | Codex / Codex full-auto |
| pi | Pi coding harness (minimal, no permission popups) |
| n · g · d · r | nvim · git · docker · rails |
| ls/lsa · lt/lta | eza w/ icons · eza tree (a = hidden) |
| cd | zoxide (smart jump) · .. ... .... up N dirs |

Sessions · windows · worktrees

Sessions: t [name] [dir] (attach/create; t . = repo-named), tn <name> (new), tj [query] (project jump via zoxide), tp (picker), tk/tl (kill/list), tss/tsr (save/restore state). Drop a .tmux-workspace file in a project and t sources it on session create.

Windows: tw <name> [cmd], twp (picker), to [cmd] (scratchpad popup), ta [text] (AI popup), tws/twg (stash/grab), tsb/tsa (broadcast text to panes in window/session).

Worktrees: gwa <branch> [base] (create), gwr (remove), gwl (list), gws (cd into), twf [path] (refocus AI panes to a worktree). In nvim, <leader>gw opens a worktree picker that switches cwd and focuses its agent pane.

Git

Quick: gcm <msg> / gcam <msg> (commit / stage-all + commit), gcad (amend), gwip/gunwip (WIP), gp (push + PR URL), gpf (force-with-lease), gsync (rebase on default), gpr [-d] (push + PR), gclean (delete merged).

Interactive (fzf): gb (branches), gl (log), ga (staging), gd (diff), gst (stash).

Config & utilities

Config: cfgmap (browse all configs), cfgedit <name> (quick-edit by short name, e.g. cfgedit tmux).

Utilities: rgi (interactive ripgrep), fdi (interactive fd), fp (process picker), fenv (env browser), mkd <name> (mkdir + cd), json [file] [query], serve [port], extract <file>.

Keybindings

No prefix: Ctrl+Option+Arrows (navigate panes), +Shift (resize), Option+1-9 (window), Option+←/→ (prev/next window), Option+↑/↓ (prev/next session), F12 (nested-tmux pass-through).

Prefix Ctrl+B: h/v split · x kill pane · c/k/r window new/kill/rename · C/K/R session · q reload · s/w/j session/window/project picker · ` scratchpad · a AI popup · A agent launcher · +/- add/remove agent · f review · d dashboard · g/G/b git log/worktree/branch · m mesh · S swarm hub · U grab stashed window.

State & data

anu exposes the repo at ~/.local/share/anu/ and stores runtime state there: swarms/ (metadata, mailboxes), reviews/ (cached summaries per SHA), mesh/ (device cache), box/ (contained-agent state; box/claude holds credentials, gitignored). The atlas at ~/.anu/atlas/<repo> holds dossiers and investigations; the decision trail lives at ~/.anu/trail/<repo>. The installer link manifest lives at ~/.local/state/anu/manifest so anu unlink restores configs cleanly. All runtime state is gitignored; never commit it.

Previously included (omacmux era)

Commands that no longer exist, mapped to current equivalents: vibe → swarm start <n> cxc · scan*/scanreport → swarm + swarm collect/review · tell → swarm send · who → swarm status · focus → swarm pick · check → swarmx status · recap → review · recipe* → compose a topology directly · acm → removed (commit deliberately) · ship → gp/gpr · voice → removed. NATO agent names (alpha, bravo, and so on) are gone; agents are agent-1 through agent-N.


michelangelo.sh · Quantum Photonics & AI, MIT · MIT License