Use Claude Code with OpenCode models — cheaper, smarter routing, and automatic context compression.
This stack combines two tools:
- opencode-proxy — routes Claude Code requests to OpenCode, handles protocol translation and model selection
- Headroom — "The Context Optimization Layer for LLM Applications" — compresses your conversation history before each request, cutting token usage by 15–50%
Claude Code is a powerful AI coding assistant, but using it with Anthropic directly gets expensive fast. OpenCode offers the same models (and others) at much lower cost — but plugging Claude Code into OpenCode directly has several problems:
Claude Code sends requests in Anthropic format (/v1/messages). But many OpenCode models expect OpenAI format (/chat/completions). Without this proxy, those models simply don't work.
This proxy automatically translates the request format — including full streaming support for tool calls.
OpenCode has two tiers:
- Paid (go-tier):
https://opencode.ai/zen/go/v1 - Free:
https://opencode.ai/zen/v1
Claude Code only knows one ANTHROPIC_BASE_URL. If you point it at the go-tier URL and try to use a free model, the request goes to the wrong endpoint and fails. You can't point Claude Code at both URLs simultaneously.
This proxy knows which models are free vs paid and routes each request to the correct URL automatically.
If you try setting "model": "north-mini-code-free" or "model": "mimo-v2.5-free" in your Claude Code settings.json, it won't work — Claude Code validates model names against Anthropic's list, or passes them through without the right routing context.
This proxy lets you use simple routing tokens instead:
"model": "free-auto"→ automatically picks the best free model for your task"model": "go-auto"→ automatically picks the best paid model for your task"model": "claude-haiku-4-5"→ auto-mapped tofree-auto(haiku = cheap = free tier)- Any other
claude-*model → auto-mapped togo-auto
Long coding sessions accumulate a huge context window. Every request re-sends the entire conversation history. Costs grow linearly with session length.
Headroom (the companion service in this stack) compresses your context before each request — removing redundant tool results, summarising old messages, and stripping content Claude Code doesn't need to re-read. In practice it removes 15–50% of tokens per request, which compounds significantly over a long session. Headroom advertises cuts of up to 90% in token-heavy workloads.
Claude Code
│ ANTHROPIC_BASE_URL=http://localhost:8787
▼
Headroom :8787 ← compresses context, strips redundant history
│
▼
opencode-proxy :8080 ← routes model, converts protocol, handles fallbacks
│
├─→ OpenCode zen/go/v1 (paid models: kimi, deepseek, qwen, minimax, mimo, glm)
└─→ OpenCode zen/v1 (free models: big-pickle, north-mini-code-free, etc.)
Both services run in Docker on your laptop. Claude Code talks to Headroom on port 8787. Port 8080 is the proxy — not exposed to Claude Code, but you can curl it directly for health checks and stats.
1. Get an OpenCode API key at opencode.ai (much cheaper than Anthropic directly).
2. Configure and start:
cp .env.example .env
# Edit .env — fill in your OPENCODE_API_KEY
./run.sh3. Point Claude Code at the stack. In ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8787",
"ANTHROPIC_API_KEY": "any-non-empty-string"
},
"model": "go-auto"
}
ANTHROPIC_API_KEYis required by Claude Code's config format but is not validated by the proxy — the proxy usesOPENCODE_API_KEYfor upstream calls. Use any non-empty string.
4. Verify it's running:
curl http://localhost:8080/healthz
# → {"status":"ok","upstream":"https://opencode.ai/zen/go/v1"}Open VS Code, open the Claude Code panel, and ask it to write a function. If you get a response, the stack is working.
./stop.sh # to shut everything downIf you just want to try this without spending anything, you can run on free-tier models. Free models have stricter rate limits and lower quality, but they're real OpenCode models, not a demo.
1. Get a free OpenCode account at opencode.ai — the free tier doesn't need a paid key.
2. Edit .env — set the free-tier URL and use free-auto everywhere:
OPENCODE_API_KEY=sk-your-free-key-here
OPENCODE_FREE_URL=https://opencode.ai/zen/v1
UPSTREAM_URLstays at its default (zen/go/v1). Free models useOPENCODE_FREE_URL— they never touchUPSTREAM_URL.
3. In ~/.claude/settings.json, force free routing:
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8787",
"ANTHROPIC_API_KEY": "any-non-empty-string"
},
"model": "free-auto"
}4. Start and test:
./run.sh
curl http://localhost:8080/healthzWhen you're ready to upgrade to paid models, change model to go-auto in Claude Code settings — UPSTREAM_URL in .env is already pointing at the paid endpoint.
Tip: Start with
free-autofor the first day to make sure the stack works, then switch togo-autoonce you're confident. The router is the same code path either way — only the model pool changes.
If you're using the Claude Code VS Code extension, the setup is the same as the standalone CLI — Claude Code reads ANTHROPIC_BASE_URL from your settings.json regardless of whether you're in the terminal or the editor.
Verify the extension is using your proxy:
- Open VS Code's Command Palette (
Cmd+Shift+P/Ctrl+Shift+P) - Run
Claude Code: Open Settings(or edit~/.claude/settings.jsondirectly) - Confirm
ANTHROPIC_BASE_URLpoints athttp://localhost:8787 - Open a file, select some code, and run
Cmd+Lto send it to Claude - Check
docker compose logs opencode-proxy -fin another terminal — you should see aForwarding POST /v1/messagesline for each request
If the extension says "API key invalid":
- Confirm
ANTHROPIC_API_KEYis set insettings.json(any non-empty string works) - The proxy does not forward this key to OpenCode — it uses
OPENCODE_API_KEYfrom.env - If you're using a real Anthropic key and it doesn't work, check that
ANTHROPIC_BASE_URLishttp://localhost:8787(nothttps://api.anthropic.com)
Recommended workflow:
- Keep the proxy running in the background (
./run.shin a dedicated terminal tab) - Use
go-autofor most work, switch tofree-autowhen you want to test something quick - Pin to a specific model (
"model": "kimi-k2.7") when you need reproducible behavior for a benchmark or comparison - Use
direct:claude-opus-4-7when you genuinely need Anthropic's flagship model — setDIRECT_URLandDIRECT_KEYin.envfor this
Set "model" in ~/.claude/settings.json:
| Setting | What happens | Cost |
|---|---|---|
go-auto |
Proxy picks the best paid model for each task | Paid |
free-auto |
Proxy picks the best free model for each task | Free |
claude-haiku-* |
Same as free-auto |
Free |
claude-sonnet-*, claude-opus-* |
Same as go-auto |
Paid |
kimi-k2.7, qwen3.7-max, … |
Pin to a specific model | Paid |
How go-auto works: Each request is classified by task type (code, reasoning, long context, creative, agent) and routed to the best-suited model. A coding question goes to kimi-k2.7, an architecture discussion goes to deepseek-v4-pro, a long document summary goes to minimax-m3.
Fallback chains: If a model returns an error (rate limit, timeout, 5xx), the proxy automatically retries with the next model in the chain — no interruption to your session.
| Variable | Required | Description |
|---|---|---|
OPENCODE_API_KEY |
✅ | Your OpenCode API key |
UPSTREAM_URL |
✅ | Paid endpoint: https://opencode.ai/zen/go/v1 |
OPENCODE_FREE_URL |
✅ for free models | Free endpoint: https://opencode.ai/zen/v1 |
PORT |
— | Proxy listen port (default 8080) |
PROXY_API_KEY |
— | Optional: require a Bearer token on inbound requests |
DIRECT_URL / DIRECT_KEY |
— | Optional: bypass OpenCode for direct:<model> requests |
MODEL_MAP |
— | Optional JSON override for models.json. Useful for one-off testing. |
Copy .env.example to .env to get started.
Every model in models.json declares a fallbacks list. If the primary model returns a retryable error (429, 500, 502, 503, 504), the proxy automatically retries with the next model in the chain — your session doesn't break. Example:
"kimi-k2.7": { ..., "fallbacks": ["deepseek-v4-flash", "deepseek-v4-pro"] }If kimi-k2.7 returns a 429, the proxy retries with deepseek-v4-flash; if that fails too, it tries deepseek-v4-pro. Each attempt is logged with a Fallback N/M: prefix so you can see the chain in action.
Fallback chains work for non-streaming responses and for the first chunk of a streaming response. Mid-stream failures fall through to the client (you can't restart a half-sent SSE stream).
Connection refused on port 8787 (Headroom) — Headroom isn't running. Check:
docker compose ps # see if headroom is up
docker compose logs headroom # see why it failed
# or if using ./run.sh:
tail -f .run/headroom.logConnection refused on port 8080 (opencode-proxy) — Proxy isn't running. Same as above but with opencode-proxy / .run/proxy.log.
502 Bad Gateway from proxy — OpenCode rejected the request. Check:
OPENCODE_API_KEYin.envis valid- The model exists in
models.json - Logs:
docker compose logs opencode-proxy | grep "Upstream"— the upstream error body is logged at WARNING
Claude Code says "API key invalid" — Confirm ANTHROPIC_API_KEY in ~/.claude/settings.json is set to any non-empty string. The proxy doesn't validate it, but Claude Code's config format requires it.
Claude Code says "Cannot connect to API" — Check ANTHROPIC_BASE_URL is http://localhost:8787 (not https://api.anthropic.com).
Slow first request to a model — OpenCode loads the model on first use. Subsequent calls are fast. The proxy doesn't add startup latency itself.
Want to see what model was chosen for a request:
docker compose logs opencode-proxy | grep "auto-router"
# or
tail -f .run/proxy.log | grep "auto-router"Want to see latency breakdown per request:
docker compose logs opencode-proxy | grep "req="
# example output:
# req=a1b2c3d4 total=2340ms sanitize=12ms forward=2320ms model=kimi-k2.7 status=200Want to see all 5xx errors from the proxy:
docker compose logs opencode-proxy | grep -E "Upstream.*error|status=5"# What's happening right now
curl http://localhost:8080/admin/stats | jq
# → total requests, by model, status codes, p50/p95/p99 latency
# How much context Headroom has compressed
curl http://localhost:8787/stats | jq
# → tokens removed, compression %, best single compression
# Per-request timing in Docker logs
docker compose logs opencode-proxy -f
# req=c9689b54 total=5374ms sanitize=5ms forward=5368ms model=minimax-m3 status=200pip install -e ".[dev]"
pytest tests/ -v # 140 tests, ~0.37s
ruff check .
mypy *.py conversion/main.py FastAPI app, health, /admin/stats
forward.py 4-stage pipeline: sanitize → route → convert → forward
router.py Model classification and selection
sanitization.py Message cleaner (strips thinking blocks, fixes tool results)
conversion/ Anthropic ↔ OpenAI protocol translation
models.json Model routing config (URL, key, fallbacks per model)
This project is licensed under the MIT License - see the LICENSE file for details.
