STORM: Multi-agent Collaboration with State Management

A multi-agent orchestration framework for the code implementation (Commit0) and paper reproduction (PaperBench) benchmarks, built on the OpenHands SDK.

Setup

Prerequisites

Python >= 3.12
uv (Python package manager)
Docker
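
Before running the setup script, it can help to confirm that the tools above are actually available. The following is a minimal sketch, not part of STORM: the function names `storm_have_commands` and `storm_python_ok` are hypothetical, and the version check only looks at the major and minor numbers.

```shell
# Hypothetical helper (not shipped with STORM): succeed only if every
# named command can be found on PATH.
storm_have_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found: $cmd" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper: succeed if a version string such as "3.12" or
# "3.13.1" is at least 3.12.
storm_python_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # minor number, dropping any patch suffix
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}
```

For example, `storm_have_commands uv docker python3` reports any missing tool on stderr, and `storm_python_ok "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"` checks the interpreter version.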

Quick Start

# Clone the repository (with submodules)
git clone --recursive https://github.com/dreamyang-liu/STORM.git
cd STORM/STORM

# Run the setup script (installs deps + builds Docker images)
bash setup.sh

# Load your API keys (edit .env first to fill in LLM_API_KEY and OPENROUTER_API_KEY)
source .env

Manual Installation

cd STORM/STORM

# Install Python dependencies
uv sync

# Build Docker image
cd ../software-agent-sdk
docker build \
  -f openhands-agent-server/openhands/agent_server/docker/Dockerfile \
  --target source-minimal-storm \
  --platform linux/amd64 \
  -t agent-server:storm-base \
  .
cd ../STORM

Environment Variables

# Agent API (DashScope or OpenRouter)
export LLM_API_KEY=<your-api-key>
export LLM_BASE_URL=https://openrouter.ai/api/v1   # or https://dashscope.aliyuncs.com/compatible-mode/v1

# Judge API (OpenRouter, for PaperBench evaluation)
export OPENROUTER_API_KEY=<your-openrouter-key>

# SDK path
export SDK_SOURCE_DIR=<path-to>/software-agent-sdk
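
A run that starts with an unset key tends to fail late, so a quick check before launching may save time. This is a minimal sketch under stated assumptions: `storm_check_env` is a hypothetical helper, not part of STORM, and it expects plain variable names as arguments.

```shell
# Hypothetical helper (not shipped with STORM): report which of the
# named environment variables are unset or empty.
storm_check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup via eval keeps this POSIX sh compatible;
        # only pass trusted, valid variable names.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is `storm_check_env LLM_API_KEY LLM_BASE_URL OPENROUTER_API_KEY SDK_SOURCE_DIR`, which returns non-zero and names each missing variable on stderr.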

Prepare Data

Commit0

Download the commit0_combined dataset and place it at:

STORM/data/commit0/commit0_combined_disk/

PaperBench

Place the PaperBench data from frontier-evals at:

STORM/data/paperbench/papers/
├── rice/
│   ├── config.yaml
│   ├── paper.pdf
│   ├── paper.md
│   ├── rubric.json
│   ├── addendum.md
│   └── blacklist.txt
└── ...
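
To confirm that a paper directory matches the layout above, a small check along these lines could be used. It is a sketch, not part of STORM: `storm_check_paper` is a hypothetical name, and it assumes every paper carries the same six files shown for `rice`.

```shell
# Hypothetical helper (not shipped with STORM): verify that one paper
# directory contains the files shown in the layout above.
storm_check_paper() {
    paper_dir=$1
    status=0
    for f in config.yaml paper.pdf paper.md rubric.json addendum.md blacklist.txt; do
        if [ ! -f "$paper_dir/$f" ]; then
            echo "missing: $paper_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `storm_check_paper data/paperbench/papers/rice` run from the STORM/ directory returns zero only when all six files are present.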

The PaperBench judge requires additional packages:

uv pip install -e ../frontier-evals/project/paperbench
uv pip install -e ../frontier-evals/project/common/preparedness_turn_completer

Running Experiments

Single-Agent Baseline

bash scripts/run_single.sh

Multi-Agent (STORM)

bash scripts/run_multi.sh

Batch Run (all papers/repos in parallel)

bash scripts/run_batch.sh

Edit the parameters at the top of each script (model, task, paper_id/repo, etc.) before running.

Key Parameters

| Parameter | Description |
|---|---|
| `task` | `"commit0"` or `"paperbench"` |
| `model` | LiteLLM model identifier (e.g., `openai/deepseek-v4-pro`) |
| `max_subagents` | Number of parallel engineer subagents |
| `max_iterations` | Maximum LLM iterations for the manager |
| `sub_iterations` | Maximum LLM iterations per subagent |
| `rounds_of_chat` | Maximum rounds of task assignment per engineer |

Output

Results are saved to outputs/<task>/<model>/<identifier>/<mode>/<params>/:

cost.json — token usage and cost breakdown
runtime.txt — wall-clock runtime in seconds
outputs.jsonl — structured event log
grade.json — (PaperBench) judge evaluation results
report.json — (Commit0) pytest results
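
Because each run writes its own `runtime.txt`, runtimes across many runs can be collected with a short loop. The sketch below is not part of STORM: `storm_list_runtimes` is a hypothetical name, and it assumes each `runtime.txt` holds a single value.

```shell
# Hypothetical helper (not shipped with STORM): print one line per run,
# "<runtime><TAB><run directory>", for every runtime.txt under a root
# directory (default: outputs).
storm_list_runtimes() {
    root=${1:-outputs}
    find "$root" -type f -name runtime.txt | sort | while IFS= read -r file; do
        printf '%s\t%s\n' "$(cat "$file")" "$(dirname "$file")"
    done
}
```

Calling `storm_list_runtimes outputs/paperbench` would restrict the listing to PaperBench runs.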

Re-judge

Re-run the PaperBench judge on an existing output directory; the paper names are optional:

bash scripts/rejudge.sh <output_dir> [paper1 paper2 ...]

Acknowledgements

We thank the following open-source projects that STORM builds upon:

OpenHands for the agent SDK framework
Commit0 for the code implementation benchmark
PaperBench for the paper reproduction benchmark

Citation

@misc{liu2026multiagentcollaborationstatemanagement,
      title={Multi-agent Collaboration with State Management},
      author={Mengyang Liu and Taozhi Chen and Zhenhua Xu and Xue Jiang and Yihong Dong},
      year={2026},
      eprint={2605.20563},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2605.20563},
}
