`openinterp` — Python SDK + CLI for openinterp.org

Search the feature Atlas, generate Traces from your own SAE, rank against the public InterpScore leaderboard.

Python ≥ 3.10 · Apache-2.0

Install

pip install openinterp              # lite: Atlas + CLI (no torch, ~2 MB total)
pip install "openinterp[full]"      # + torch/transformers/safetensors for trace generation

Requires Python ≥ 3.10.

Part of a 5-repo ecosystem

Repo	What's in it
`.github`	Org profile + shared CoC + SECURITY
`web`	Next.js site behind openinterp.org
`notebooks`	23 training + interpretability notebooks
`cli` (you are here)	`pip install openinterp` — Python SDK
`mechreward`	SAE features as dense RL reward

Quick start

Search the Atlas (offline, zero GPU)

$ openinterp atlas "overconfidence"

                    Atlas results: 'overconfidence'
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━
┃ ID      ┃ Name                    ┃ Model             ┃ AUROC ┃ Description
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━
│ f2503   │ overconfidence_pattern  │ Qwen/Qwen3.6-27B  │  0.54 │ Definitive…
│ f1847   │ urgency_assessment      │ Qwen/Qwen3.6-27B  │  0.68 │ Time-critic…
└─────────┴─────────────────────────┴───────────────────┴───────┴────────────

>>> from openinterp import search_features
>>> features = search_features("overconfidence", model="Qwen/Qwen3.6-27B")
>>> features[0].id
'f2503'

Generate a Trace from your own SAE

pip install "openinterp[full]"

openinterp trace \
    --model google/gemma-2-2b \
    --sae-repo YOUR_HF_USER/gemma2-2b-sae-first \
    --prompt "The capital of France is" \
    --layer 12 \
    --d-model 2304 --d-sae 16384 --k 64 \
    --out my_trace.json

This:

Loads the base model in bf16 with SDPA (no flash-attn)
Loads your SAE from HuggingFace (sae_lens safetensors format)
Generates tokens, captures residuals at layer 12
Applies the SAE, picks top-10 active features
Writes a Trace JSON matching openinterp.org/observatory/trace byte-for-byte
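
The "apply the SAE, pick the top active features" step above can be sketched in plain Python. This is a minimal illustration, not the code in openinterp/trace.py. It assumes a TopK encoder in the style of Gao et al. 2024 (subtract the decoder bias, project, keep the k largest pre-activations) and assumes features are ranked by peak activation over tokens. The names topk_sae_encode, top_features, W_ENC, B_ENC and B_DEC are hypothetical, and the toy sizes (d_model=3, d_sae=4) stand in for --d-model and --d-sae.

```python
from typing import Dict, List, Tuple


def topk_sae_encode(
    x: List[float],
    w_enc: List[List[float]],
    b_enc: List[float],
    b_dec: List[float],
    k: int,
) -> Dict[int, float]:
    """Return {feature_index: activation} for the k largest pre-activations."""
    # Assumption: the residual is centered with the decoder bias before encoding.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [
        sum(w * c for w, c in zip(row, centered)) + b
        for row, b in zip(w_enc, b_enc)
    ]
    # Keep the indices of the k largest pre-activations (ties keep index order).
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    # Clamp at zero so a negative pre-activation never counts as "active".
    return {i: max(pre[i], 0.0) for i in top}


def top_features(
    token_acts: List[Dict[int, float]], n: int
) -> List[Tuple[int, float]]:
    """Rank features by their peak activation across all tokens."""
    peak: Dict[int, float] = {}
    for acts in token_acts:
        for idx, val in acts.items():
            if val > 0.0:
                peak[idx] = max(peak.get(idx, 0.0), val)
    return sorted(peak.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy SAE: 3-dimensional residual stream, 4 features.
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0, 0.0]

residuals = [[2.0, 1.0, -1.0], [0.0, 0.0, 5.0]]  # one vector per token
per_token = [topk_sae_encode(x, W_ENC, B_ENC, B_DEC, k=2) for x in residuals]
ranked = top_features(per_token, n=2)
print(ranked)
```

In the real command, k would correspond to --k 64 and n to the top-10 cut mentioned above.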

Python API

from openinterp import generate_trace

trace = generate_trace(
    model_id="google/gemma-2-2b",
    sae_repo="YOUR_HF_USER/gemma2-2b-sae-first",
    prompt="The capital of France is",
    layer=12,
    d_model=2304,
    d_sae=16384,
    k=64,
)

print(trace.model_dump_json(indent=2))   # Trace Theater schema

With feature labels from notebook 04

# After running 04_discover_features.ipynb (emits feature_catalog.json):
openinterp trace ... --catalog feature_catalog.json

Trace features inherit names from your catalog.
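
As a rough picture of what "inherit names" means, the sketch below merges catalog labels into a list of trace features. The real layout of feature_catalog.json and of the Trace feature objects is not shown in this README, so the schema here (a JSON object keyed by feature index, each entry carrying a "name") and the helper apply_catalog are assumptions for illustration only.

```python
import json
from typing import Any, Dict, List


def apply_catalog(
    features: List[Dict[str, Any]], catalog: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Attach a catalog name to each feature, with a generic fallback."""
    labeled = []
    for feat in features:
        entry = catalog.get(str(feat["id"]))
        # Features missing from the catalog keep a placeholder name.
        name = entry["name"] if entry else f"feature_{feat['id']}"
        labeled.append({**feat, "name": name})
    return labeled


# Hypothetical catalog content; the real file is produced by notebook 04.
catalog = json.loads('{"1847": {"name": "urgency_assessment"}}')
features = [{"id": 1847, "activation": 0.9}, {"id": 42, "activation": 0.3}]

labeled = apply_catalog(features, catalog)
print(labeled)
```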

FabricationGuard (v0.2.0+)

Production hallucination probe on Qwen3.6-27B. AUROC 0.88 cross-task on SimpleQA, an 88% reduction in confident-wrong answers in mitigation mode, and ~1 ms scoring latency.

from openinterp import FabricationGuard

guard = FabricationGuard.from_pretrained("Qwen/Qwen3.6-27B")
output = guard.generate("Who won the 2003 Nobel Prize in Aerodynamics?", mode="abstain")
# → "I don't have reliable information to answer this confidently."
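
Conceptually, abstain mode pairs a linear probe with a decision threshold. The sketch below shows that idea on toy numbers. It is not the FabricationGuard implementation: the functions probe_score and guarded_answer, the 0.5 threshold, and the two-dimensional "activation" are all assumptions made for illustration.

```python
import math
from typing import Sequence

ABSTAIN_MESSAGE = "I don't have reliable information to answer this confidently."


def probe_score(
    activation: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic linear probe: sigmoid(w . x + b), read as P(fabrication)."""
    logit = sum(w * a for w, a in zip(weights, activation)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def guarded_answer(
    draft: str,
    activation: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,  # assumed cut-off, not the shipped value
) -> str:
    """Return the draft answer unless the probe score crosses the threshold."""
    if probe_score(activation, weights, bias) >= threshold:
        return ABSTAIN_MESSAGE
    return draft


weights, bias = [1.0, -1.0], 0.0
risky = guarded_answer("Dr. X won it.", [3.0, 0.0], weights, bias)
safe = guarded_answer("Paris.", [0.0, 3.0], weights, bias)
print(risky)
print(safe)
```

Scoring is a single dot product per call, which is consistent with a probe adding very little latency on top of generation.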

Methodology lineage: extends Anthropic's persona-vectors approach (Aug 2025, tested on 7-8B) to Qwen3.6-27B (3-4× larger) with formal cross-task AUROC + bootstrap CIs + mitigation-rate evaluation. Apache-2.0 production-grade implementation, not a proprietary platform. Probe artifact: caiovicentino1/FabricationGuard-linearprobe-qwen36-27b. Live demo: openinterp.org/products/fabricationguard.
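
For readers unfamiliar with the evaluation terms, here is a small self-contained sketch of AUROC with a percentile bootstrap confidence interval. It uses the pairwise-comparison definition of AUROC and toy data. The function names, the 1000 resamples and the 95% level are assumptions, and this is not the project's evaluation code.

```python
import random
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC."""
    auroc(scores, labels)  # fail fast if the data has a single class
    rng = random.Random(seed)
    indices = range(len(scores))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples with only one class
            stats.append(auroc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


scores = [0.9, 0.8, 0.55, 0.3, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
point = auroc(scores, labels)
low, high = bootstrap_ci(scores, labels)
print(point, low, high)
```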

ReasonGuard v0.1 (in registry)

Reasoning-faithfulness probe at L55 / mid_think on Qwen3.6-27B in thinking mode. Detects wrong-answer trajectories during the <think> block. Honest narrow scope: AUROC 0.888 within math reasoning (GSM8K), 0.605 cross-domain to commonsense (StrategyQA) — domain-bound, not generalized.

Layer × position interaction (novel): shallow layers (L31) favor end_question; deep layers (L55) favor mid_think. Position-of-faithfulness migrates with depth.

from openinterp import probebench

probe = probebench.load("openinterp/reasonguard-qwen36-27b-l55-mid_think")
score = probe.score(activations)  # P(wrong-answer trajectory)

Both numbers (within + cross) registered honestly per ProbeBench's anti-Goodhart norms. Probe artifact: caiovicentino1/ReasoningGuard-linearprobe-qwen36-27b. Live on openinterp.org/probebench.

ProbeBench (v0.2.0+)

The first categorical leaderboard for activation probes — 8 categories, 7-axis ProbeScore, anti-Goodhart by construction.

from openinterp import probebench

probes = probebench.list_probes(category="hallucination")
probe  = probebench.load("openinterp/fabricationguard-qwen36-27b-l31-v2")
score  = probe.score(activations)

openinterp probebench list                       # show all registered probes
openinterp probebench load <probe-id>            # download + verify SHA-256
openinterp probebench validate ./my-probe/       # check artifact spec
openinterp probebench reproduce <probe-id>       # download reproducer notebook
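
The "verify SHA-256" step of probebench load can be pictured with the standard library alone. This is a generic sketch of checksum verification, not the SDK's internals. The helper names sha256_of and verify_artifact and the file name probe.bin are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file does not match the published digest."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch for {path.name}: got {actual}")


# Demo on a throwaway file standing in for a downloaded probe.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "probe.bin"
    artifact.write_bytes(b"toy probe weights")
    expected = hashlib.sha256(b"toy probe weights").hexdigest()

    verify_artifact(artifact, expected)  # passes silently
    try:
        verify_artifact(artifact, "0" * 64)  # wrong digest
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(tamper_detected)
```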

Browse the leaderboard: openinterp.org/probebench.

AgentProbeGuard cookbooks

examples/agent_probe_guard_env_coupling.md — detect env coupling with cosine + residual norms, refit with AgentProbeGuard.refit(), and choose between one refit helper vs per-env weight variants.

v0.2.1 — `safe_load_qwen36_lora()`

Works around the Qwen3.6 PEFT-save .language_model. infix bug, discovered during paper-2 grokking work (April 2026). Saved Qwen3.6 LoRA adapters carry an extra .language_model. infix in their state-dict keys, so PeftModel.from_pretrained() against a reloaded dense Qwen3.6 fails silently: the adapter appears to load, the max logit-diff is 0.000, and no error is raised.

from openinterp import safe_load_qwen36_lora

model = safe_load_qwen36_lora(
    base_model_id="Qwen/Qwen3.6-27B",
    adapter_path="path/to/checkpoint-200",
)  # auto strip .language_model. + auto verify logit-diff > 0.01

Also exposed: strip_language_model_infix(), verify_adapter_loaded(), LoRAVerificationError. This bug invalidated ~10 hours of prior eval work before being caught — anyone working with Qwen3.6 LoRA save/reload pipelines should run the sanity check.
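
The two halves of the fix (rename the keys, then confirm the adapter actually changes the logits) can be sketched without torch. The code below is an illustration under assumptions, not the body of safe_load_qwen36_lora(). The example key string is made up, and strip_infix and check_adapter_active are hypothetical stand-ins for the real helpers; only the 0.01 threshold comes from the snippet above.

```python
from typing import Any, Dict, Sequence

INFIX = ".language_model."


def strip_infix(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with the first infix per key collapsed."""
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key.replace(INFIX, ".", 1)
        if new_key in cleaned:
            raise KeyError(f"key collision after stripping: {new_key}")
        cleaned[new_key] = value
    return cleaned


def check_adapter_active(
    base_logits: Sequence[float],
    adapted_logits: Sequence[float],
    min_diff: float = 0.01,
) -> float:
    """Fail loudly if the adapter left the logits (almost) unchanged."""
    diff = max(abs(a - b) for a, b in zip(base_logits, adapted_logits))
    if diff <= min_diff:
        raise RuntimeError(f"adapter looks inert: max logit-diff = {diff:.3f}")
    return diff


# Illustrative key only; real adapter keys may be laid out differently.
saved = {"model.language_model.layers.0.q_proj.lora_A.weight": "tensor"}
fixed = strip_infix(saved)

diff = check_adapter_active([0.1, 0.2], [0.1, 0.5])
try:
    check_adapter_active([0.1, 0.2], [0.1, 0.2])  # the silent-failure case
    silent_failure_caught = False
except RuntimeError:
    silent_failure_caught = True

print(fixed, diff, silent_failure_caught)
```

Comparing logits with and without the adapter on a fixed prompt is the check that turns the silent failure into an explicit error.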

What's in v0.2.x

Command	Status	What it does
`openinterp atlas <query>`	live	Feature search with offline fallback to curated demo features
`openinterp trace ...`	live (needs `[full]`)	Real SAE trace generation, sae_lens format, any HF model
`openinterp guard ...`	live	FabricationGuard scoring + abstain mode on Qwen3.6-27B
`openinterp probebench {list,load,score,validate,reproduce,submit}`	live	ProbeBench v0.0.1 SDK
`openinterp.lora.safe_load_qwen36_lora`	live (v0.2.1)	Safe Qwen3.6 LoRA loader with strip + verify
`openinterp info`	live	Version + optional-dep status

Planned v0.3.0

openinterp upload-trace <trace.json> → shareable openinterp.org URL
openinterp score --sae-repo X → compute InterpScore (wraps notebook 18)
openinterp steer --sae-repo X --feature Y --alpha Z → intervention (wraps notebook 06)
openinterp circuit --sae-repo X --prompt Y → attribution graph JSON (wraps notebook 14/15)
openinterp publish <repo> → HuggingFace release with model card
ReasonGuard v0.2 — multi-bench training (math + commonsense) to fix cross-domain transfer

Open an issue on the tracker if you'd like to take one of these.

Development

git clone https://github.com/OpenInterpretability/cli openinterp-cli
cd openinterp-cli
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,full]"          # dev = pytest + ruff + build; full = torch + transformers
pytest -xvs                            # 5 tests, ~1s

Package layout

openinterp-cli/
├── pyproject.toml              # name='openinterp', hatchling build
├── openinterp/
│   ├── __init__.py             # public exports + __version__
│   ├── models.py               # pydantic types: AtlasFeature, Trace, TraceFeature
│   ├── atlas.py                # search_features() — HF API + curated fallback
│   ├── trace.py                # generate_trace() — real transformers-based impl
│   └── cli.py                  # click-based CLI: atlas / trace / info
├── tests/
│   ├── test_atlas.py
│   └── test_trace.py
├── CHANGELOG.md
├── CONTRIBUTING.md
└── README.md

Contribution recipe — add a new command

Full rules: CONTRIBUTING.md.

Decide which notebook it wraps (score → 18, steer → 06, circuit → 14/15, publish → generic)
Add a function to the matching file (openinterp/score.py, etc.). Keep it small — actual compute lives in the notebook.
Expose it in __init__.py
Add a @main.command() in cli.py with click decorators
Add a smoke test in tests/test_<name>.py
Update CHANGELOG.md under [Unreleased]
PR title: Add openinterp <command>

Hard rules:

Python ≥ 3.10 syntax (PEP 604 unions OK)
dtype=torch.bfloat16, never torch_dtype= (deprecated in transformers 5.x)
SDPA only, never flash-attn
New heavy deps (torch tier) → add to [full] extra, not base
Every new public function has type hints + docstring

Release process (maintainer)

# 1. Bump version in BOTH:
#    pyproject.toml          ([project] version = "X.Y.Z")
#    openinterp/__init__.py  (__version__ = "X.Y.Z")
# 2. Update CHANGELOG.md — move [Unreleased] → [X.Y.Z] — YYYY-MM-DD

source .venv/bin/activate
rm -rf dist/
python -m build
python -m twine check dist/*
python -m twine upload dist/*     # needs PyPI token in ~/.pypirc

git tag vX.Y.Z
git push --tags
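
Because the version lives in two files, a pre-release check can catch a missed bump. The sketch below is one possible guard and is not part of the repo. It uses simple regexes rather than a TOML parser and takes the first version = "..." line in pyproject.toml, which is a simplification. The function names are hypothetical.

```python
import re


def project_version(pyproject_text: str) -> str:
    """Extract the first `version = "X.Y.Z"` line from pyproject.toml text."""
    match = re.search(r'(?m)^version\s*=\s*"([^"]+)"', pyproject_text)
    if match is None:
        raise ValueError("no version field found in pyproject.toml text")
    return match.group(1)


def init_version(init_text: str) -> str:
    """Extract `__version__ = "X.Y.Z"` from openinterp/__init__.py text."""
    match = re.search(r'(?m)^__version__\s*=\s*"([^"]+)"', init_text)
    if match is None:
        raise ValueError("no __version__ found in __init__.py text")
    return match.group(1)


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True when both files declare the same version string."""
    return project_version(pyproject_text) == init_version(init_text)


# Inline samples; a real check would read the two files from disk.
PYPROJECT = '[project]\nname = "openinterp"\nversion = "0.2.1"\n'
INIT_OK = '__version__ = "0.2.1"\n'
INIT_STALE = '__version__ = "0.2.0"\n'

print(versions_match(PYPROJECT, INIT_OK))
print(versions_match(PYPROJECT, INIT_STALE))
```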

CI

Every PR runs:

pytest -xvs across Python 3.10, 3.11, 3.12 (see .github/workflows/ci.yml)
ruff check . (warn-only for now)
python -m build + twine check

Green required to merge.

Community

Discussions — API proposals, "which repo should this live in"
Good-first-issues
PyPI release history
hi@openinterp.org

Built on

Neuronpedia (SAE encyclopedia) · Gemma Scope (reference SAE suite) · Gao et al. 2024 (TopK + AuxK) · SAELens (safetensors format).

Apache-2.0 · openinterp.org
