AI Agent framework for automated interpretability workflows, with support for multiple LLM backends (OpenAI, Anthropic, Google Gemini) and a configurable workspace of tools and sub-agents.
- Python 3.10+
- Workspace: A workspace/ directory with a config.yaml that defines your agent graph (see workspace/config.yaml in this repo).
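The two prerequisites above can be checked before launching. The following is a minimal sketch, not part of InterpAgent itself: the function name `check_prerequisites` is hypothetical, and it only assumes the workspace/config.yaml location stated above. It does not validate the contents of the config file.

```python
from __future__ import annotations

import sys
from pathlib import Path


def check_prerequisites(root: Path, version: tuple = sys.version_info[:2]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; InterpAgent does not ship this function.
    """
    problems = []
    # Prerequisite 1: Python 3.10 or newer.
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ is required, found {version[0]}.{version[1]}")
    # Prerequisite 2: a workspace/ directory containing config.yaml.
    config = root / "workspace" / "config.yaml"
    if not config.is_file():
        problems.append(f"missing workspace config: {config}")
    return problems


# Run the checks against the current directory (expected to be the project root).
for issue in check_prerequisites(Path.cwd()):
    print("ERROR:", issue)
```

Run it from the project root; no output means both prerequisites are satisfied.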
git clone https://github.com/YOUR_ORG/InterpAgent.git
cd InterpAgent
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
For a full environment (including optional GPU/science stack), use Conda:
conda env create -f environment.yaml
conda activate deviceAgent
Create a .env file in the project root with the API keys for the providers you want to use:
# At least one of these is required, depending on which LLM you select in the app
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://api.openai.com/v1 # optional, for custom endpoints
ANTHROPIC_API_KEY=your_anthropic_key
ANTHROPIC_BASE_URL=https://api.anthropic.com # optional, for custom endpoint
GOOGLE_API_KEY=your_google_genai_key
GOOGLE_BASE_URL= # optional, for custom endpoint
Do not commit .env; it is listed in .gitignore.
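To confirm which providers a given .env file actually enables, a small standalone check can help. This is a sketch under stated assumptions: it is not how InterpAgent loads its configuration, the names `parse_env_text`, `configured_providers`, and `PROVIDER_KEYS` are hypothetical, and the parser only handles the simple KEY=VALUE form shown above (no quoting or multi-line values).

```python
from __future__ import annotations

import os
from pathlib import Path

# Provider name -> the API key variable from the .env example above.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" such as "# optional, for custom endpoints".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the providers whose API key variable is set to a non-empty value."""
    return [name for name, key in PROVIDER_KEYS.items() if env.get(key)]


env_file = Path(".env")
env = parse_env_text(env_file.read_text()) if env_file.is_file() else {}
env = {**env, **os.environ}  # variables already exported in the shell take precedence
print("Configured providers:", configured_providers(env) or "none")
```

If the output is "none", no provider key is set and at least one of the key variables above still needs a value.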
From the project root:
python main.py
The app will start and open in your browser (default: http://localhost:8501). Enter your name, choose an LLM, and start chatting.
InterpAgent/
├── app.py # Streamlit UI entry (run via main.py)
├── main.py # CLI launcher; use this to start the app
├── style.css # App styling
├── requirements.txt # Python dependencies
├── environment.yaml # Conda environment (optional)
├── core/ # Agent runtime and UI utilities
│ ├── base/ # LLM wrappers, state, tools, planner
│ ├── st_utils/ # Streamlit sidebar and message rendering
│ └── generate.py # Builds agent from workspace config
└── workspace/ # Agent config and tools (required)
    ├── config.yaml # Defines agent_path and agent_graph
    ├── Agents/ # Agent modules and tools
    ├── prompts/ # Example prompts
    ├── data/
    └── results/
Please cite as:
Marin-Llobet and Ferrando, "Automated Interpretability and Feature Discovery in Language Models with Agents", Preprint at arXiv https://arxiv.org/abs/2605.01555, 2026.
BibTeX Citation:
@article{marin2026interpagent,
title = {Automated Interpretability and Feature Discovery in Language Models with Agents},
author = {Marin-Llobet, Arnau and Ferrando, Javier},
journal = {arXiv preprint arXiv:2605.01555},
year = {2026},
url = {https://arxiv.org/abs/2605.01555}
}