Use your Claude Max subscription as an API. This proxy wraps the Claude CLI as a subprocess and exposes standard API endpoints that any SDK or tool can talk to.
Two APIs, one proxy:
- Anthropic Messages API —
POST /v1/messages - OpenAI Chat Completions API —
POST /v1/chat/completions
Works with the Anthropic SDK, OpenAI SDK, Python clients, Cursor, Continue, aider, LiteLLM, OpenClaw, and anything else that accepts a custom base URL.
Claude Max gives you generous usage through the CLI, but no API access. This proxy bridges that gap — run it locally and point any SDK at it.
- Node.js 20+
- Claude CLI installed and authenticated (
claude --versionshould work) - Claude Max subscription (the CLI must be logged in)
git clone https://github.com/AntonioAEMartins/claude-code-proxy.git
cd claude-code-proxy
npm install
npm run build
npm link # makes `claude-proxy` available globally# Start the proxy (no auth, for local use)
REQUIRE_AUTH=false claude-proxy
# With auth
PROXY_API_KEYS=my-secret-key claude-proxy
# Custom port
PORT=8080 REQUIRE_AUTH=false claude-proxyThe proxy starts on http://127.0.0.1:4523 by default.
// Anthropic SDK
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: "any-string",
baseURL: "http://localhost:4523",
});
const message = await client.messages.create({
model: "claude-sonnet-4",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello!" }],
});// OpenAI SDK
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "any-string",
baseURL: "http://localhost:4523/v1",
});
const completion = await client.chat.completions.create({
model: "claude-sonnet-4",
messages: [{ role: "user", content: "Hello!" }],
});# Anthropic
import anthropic
client = anthropic.Anthropic(api_key="any-string", base_url="http://localhost:4523")
message = client.messages.create(
model="claude-sonnet-4",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello!"}],
)# OpenAI
import openai
client = openai.OpenAI(api_key="any-string", base_url="http://localhost:4523/v1")
completion = client.chat.completions.create(
model="claude-sonnet-4",
messages=[{"role": "user", "content": "Hello!"}],
)Any tool with a "custom base URL" or "OpenAI-compatible" setting works:
| Tool | Base URL Setting |
|---|---|
| Cursor | http://localhost:4523/v1 |
| Continue | http://localhost:4523/v1 |
| aider | --openai-api-base http://localhost:4523/v1 |
| LiteLLM | api_base="http://localhost:4523/v1" |
| OpenClaw | OPENAI_BASE_URL=http://host.docker.internal:4523/v1 |
Use any sonnet, opus, or haiku family name as the model, including versioned variants like claude-sonnet-4-6, claude-opus-4-7, sonnet-4-7, or haiku-5. Model names with a claude-code-cli/ or openai/ prefix are also accepted (the prefix is stripped automatically). Unknown model families return HTTP 400 instead of falling back to the DEFAULT_MODEL.
Both APIs support streaming out of the box:
// Anthropic streaming
const stream = client.messages.stream({
model: "claude-sonnet-4",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a haiku" }],
});
for await (const event of stream) {
// process events
}// OpenAI streaming
const stream = await client.chat.completions.create({
model: "claude-sonnet-4",
messages: [{ role: "user", content: "Write a haiku" }],
stream: true,
// Optional: emit a final chunk with usage totals before [DONE].
stream_options: { include_usage: true },
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
if (chunk.usage) console.error("usage:", chunk.usage);
}Family advertised at /v1/models |
Accepted examples |
|---|---|
claude-opus-4-6 |
opus, claude-opus-4-6, claude-opus-4-7, opus-5 |
claude-sonnet-4-6 |
sonnet, claude-sonnet-4-6, sonnet-4-7 |
claude-haiku-4-5 |
haiku, claude-haiku-4-5, haiku-5 |
- Streaming & non-streaming for both Anthropic and OpenAI formats
- System prompts
- Multi-turn conversations
- Tool use / function calling via MCP bridge
- Structured output (JSON schema)
- Extended thinking (set
ENABLE_THINKING=true) - Effort levels —
low,medium,high,max(model-dependent) - Rate limit propagation from CLI quota
- Auto-cleanup — subprocess killed on client disconnect or timeout
- OpenClaw integration — automatic tool name mapping, system prompt filtering, and
input_textblock support
All settings are environment variables:
| Variable | Default | Description |
|---|---|---|
PORT |
4523 |
Server port |
HOST |
127.0.0.1 |
Bind address |
PROXY_API_KEYS |
— | Comma-separated API keys for auth |
REQUIRE_AUTH |
true |
Set false for local use |
CLAUDE_PATH |
claude |
Path to Claude CLI |
DEFAULT_MODEL |
sonnet |
Reserved default model setting; unknown request models now return 400 |
DEFAULT_EFFORT |
high |
Default effort level |
REQUEST_TIMEOUT_MS |
300000 |
Request timeout (5 min) |
LOG_LEVEL |
info |
debug, info, warn, error |
ENABLE_THINKING |
false |
Include thinking blocks |
PROXY_MCP_CONFIG |
— | Path to MCP server registry JSON file |
Optionally expose pre-registered MCP servers (e.g., Neon, Supabase) to API clients. Credentials stay server-side; clients activate servers by name.
1. Create a config file (mcp-servers.json):
{
"mcpServers": {
"neon": {
"command": "npx",
"args": ["-y", "@neondatabase/mcp-server-neon"],
"env": { "NEON_API_KEY": "your-key-here" }
}
}
}2. Start the proxy:
PROXY_MCP_CONFIG=./mcp-servers.json claude-proxy3. Activate per request:
# Anthropic format — metadata.mcp_servers
curl -X POST http://localhost:4523/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"sonnet","max_tokens":1024,"metadata":{"mcp_servers":["neon"]},"messages":[{"role":"user","content":"List my database tables"}]}'
# OpenAI format — x-mcp-servers header
curl -X POST http://localhost:4523/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-mcp-servers: neon" \
-d '{"model":"sonnet","messages":[{"role":"user","content":"List my database tables"}]}'Without PROXY_MCP_CONFIG, behavior is unchanged (full MCP isolation). Without mcp_servers in a request, no registry servers are activated.
| Method | Path | Description |
|---|---|---|
POST |
/v1/messages |
Anthropic Messages API |
POST |
/v1/chat/completions |
OpenAI Chat Completions |
GET |
/v1/models |
List available models |
GET |
/health |
Health check |
On non-streaming responses, the proxy forwards quota information from the Claude CLI as standard HTTP headers. These originate from Anthropic's own anthropic-ratelimit-* headers, surfaced by the CLI in its output stream.
| Header | Format | Description |
|---|---|---|
x-ratelimit-limit |
Integer (e.g. 1000) |
Maximum requests/tokens allowed in the current window |
x-ratelimit-remaining |
Integer (e.g. 999) |
Quota units remaining. Token values are rounded to the nearest thousand by Anthropic |
x-ratelimit-reset |
RFC 3339 timestamp (e.g. 2026-03-19T15:30:00Z) |
When the current rate limit window fully replenishes |
On 429 errors, x-ratelimit-reset is always set in the response header. For streaming responses, these headers cannot be set after the stream has started — the reset time is included in the SSE error event body (reset_at field) instead.
All three headers are included in Access-Control-Expose-Headers for browser clients.
Each API request spawns a fresh claude --print subprocess. The proxy translates between API formats and CLI I/O:
Your App → HTTP Request → Proxy → claude --print → Claude Max
↕
Translates formats
(Anthropic ↔ CLI ↔ OpenAI)
- Prompts are sent via stdin
- Responses are parsed from stdout (NDJSON stream)
- Each request is stateless — no sessions, no state between calls
- The proxy uses
--dangerously-skip-permissionsfor non-interactive operation
- No image/vision support yet — image content blocks are not passed through
- Sampling parameters ignored —
temperature,top_p,top_kare accepted but have no effect (CLI doesn't expose them) max_tokensis advisory — there's no direct CLI flag for token limits- Rate limits depend on your subscription — the proxy passes through whatever quota the CLI reports
- One completion per request —
n > 1is not supported
- Uses
spawn()(notexec()) to prevent shell injection - API key comparison uses
crypto.timingSafeEqual - Request bodies capped at 10MB
- Subprocess environment is filtered — your secrets are not leaked to the CLI
- Subprocesses are killed on client disconnect and request timeout
Contributions welcome! Please read CONTRIBUTING.md first.
The short version: open an issue to discuss, then fork, branch, and submit a PR. See the contributing guide for branch naming, commit conventions, code guidelines, and testing steps.
MIT — use it however you want, just keep the copyright notice.