Claude Code Proxy

Use your Claude Max subscription as an API. This proxy wraps the Claude CLI as a subprocess and exposes standard API endpoints that any SDK or tool can talk to.

Two APIs, one proxy:

Anthropic Messages API — POST /v1/messages
OpenAI Chat Completions API — POST /v1/chat/completions

Works with the Anthropic SDK, OpenAI SDK, Python clients, Cursor, Continue, aider, LiteLLM, OpenClaw, and anything else that accepts a custom base URL.

Why?

Claude Max gives you generous usage through the CLI, but no API access. This proxy bridges that gap — run it locally and point any SDK at it.

Prerequisites

Node.js 20+
Claude CLI installed and authenticated (claude --version should work)
Claude Max subscription (the CLI must be logged in)

Install

git clone https://github.com/AntonioAEMartins/claude-code-proxy.git
cd claude-code-proxy
npm install
npm run build
npm link     # makes `claude-proxy` available globally

Usage

# Start the proxy (no auth, for local use)
REQUIRE_AUTH=false claude-proxy

# With auth
PROXY_API_KEYS=my-secret-key claude-proxy

# Custom port
PORT=8080 REQUIRE_AUTH=false claude-proxy

The proxy starts on http://127.0.0.1:4523 by default.

Connect Your SDK

TypeScript / JavaScript

// Anthropic SDK
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "any-string",
  baseURL: "http://localhost:4523",
});

const message = await client.messages.create({
  model: "claude-sonnet-4",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});

// OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "any-string",
  baseURL: "http://localhost:4523/v1",
});

const completion = await client.chat.completions.create({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "Hello!" }],
});

Python

# Anthropic
import anthropic

client = anthropic.Anthropic(api_key="any-string", base_url="http://localhost:4523")
message = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

# OpenAI
import openai

client = openai.OpenAI(api_key="any-string", base_url="http://localhost:4523/v1")
completion = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

Other Tools

Any tool with a "custom base URL" or "OpenAI-compatible" setting works:

Tool	Base URL Setting
Cursor	`http://localhost:4523/v1`
Continue	`http://localhost:4523/v1`
aider	`--openai-api-base http://localhost:4523/v1`
LiteLLM	`api_base="http://localhost:4523/v1"`
OpenClaw	`OPENAI_BASE_URL=http://host.docker.internal:4523/v1`

Use any sonnet, opus, or haiku family name as the model, including versioned variants like claude-sonnet-4-6, claude-opus-4-7, sonnet-4-7, or haiku-5. Model names with a claude-code-cli/ or openai/ prefix are also accepted (the prefix is stripped automatically). Unknown model families return HTTP 400 instead of falling back to the DEFAULT_MODEL.

Streaming

Both APIs support streaming out of the box:

// Anthropic streaming
const stream = client.messages.stream({
  model: "claude-sonnet-4",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku" }],
});

for await (const event of stream) {
  // process events
}

// OpenAI streaming
const stream = await client.chat.completions.create({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
  // Optional: emit a final chunk with usage totals before [DONE].
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
  if (chunk.usage) console.error("usage:", chunk.usage);
}

Available Models

Family advertised at `/v1/models`	Accepted examples
`claude-opus-4-6`	`opus`, `claude-opus-4-6`, `claude-opus-4-7`, `opus-5`
`claude-sonnet-4-6`	`sonnet`, `claude-sonnet-4-6`, `sonnet-4-7`
`claude-haiku-4-5`	`haiku`, `claude-haiku-4-5`, `haiku-5`

Features

Streaming & non-streaming for both Anthropic and OpenAI formats
System prompts
Multi-turn conversations
Tool use / function calling via MCP bridge
Structured output (JSON schema)
Extended thinking (set ENABLE_THINKING=true)
Effort levels — low, medium, high, max (model-dependent)
Rate limit propagation from CLI quota
Auto-cleanup — subprocess killed on client disconnect or timeout
OpenClaw integration — automatic tool name mapping, system prompt filtering, and input_text block support

Configuration

All settings are environment variables:

Variable	Default	Description
`PORT`	`4523`	Server port
`HOST`	`127.0.0.1`	Bind address
`PROXY_API_KEYS`	—	Comma-separated API keys for auth
`REQUIRE_AUTH`	`true`	Set `false` for local use
`CLAUDE_PATH`	`claude`	Path to Claude CLI
`DEFAULT_MODEL`	`sonnet`	Reserved default model setting; unknown request models now return 400
`DEFAULT_EFFORT`	`high`	Default effort level
`REQUEST_TIMEOUT_MS`	`300000`	Request timeout (5 min)
`LOG_LEVEL`	`info`	`debug`, `info`, `warn`, `error`
`ENABLE_THINKING`	`false`	Include thinking blocks
`PROXY_MCP_CONFIG`	—	Path to MCP server registry JSON file

MCP Server Registry

Optionally expose pre-registered MCP servers (e.g., Neon, Supabase) to API clients. Credentials stay server-side; clients activate servers by name.

1. Create a config file (mcp-servers.json):

{
  "mcpServers": {
    "neon": {
      "command": "npx",
      "args": ["-y", "@neondatabase/mcp-server-neon"],
      "env": { "NEON_API_KEY": "your-key-here" }
    }
  }
}

2. Start the proxy:

PROXY_MCP_CONFIG=./mcp-servers.json claude-proxy

3. Activate per request:

# Anthropic format — metadata.mcp_servers
curl -X POST http://localhost:4523/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","max_tokens":1024,"metadata":{"mcp_servers":["neon"]},"messages":[{"role":"user","content":"List my database tables"}]}'

# OpenAI format — x-mcp-servers header
curl -X POST http://localhost:4523/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-mcp-servers: neon" \
  -d '{"model":"sonnet","messages":[{"role":"user","content":"List my database tables"}]}'

Without PROXY_MCP_CONFIG, behavior is unchanged (full MCP isolation). Without mcp_servers in a request, no registry servers are activated.

API Endpoints

Method	Path	Description
`POST`	`/v1/messages`	Anthropic Messages API
`POST`	`/v1/chat/completions`	OpenAI Chat Completions
`GET`	`/v1/models`	List available models
`GET`	`/health`	Health check

Rate Limit Headers

On non-streaming responses, the proxy forwards quota information from the Claude CLI as standard HTTP headers. These originate from Anthropic's own anthropic-ratelimit-* headers, surfaced by the CLI in its output stream.

Header	Format	Description
`x-ratelimit-limit`	Integer (e.g. `1000`)	Maximum requests/tokens allowed in the current window
`x-ratelimit-remaining`	Integer (e.g. `999`)	Quota units remaining. Token values are rounded to the nearest thousand by Anthropic
`x-ratelimit-reset`	RFC 3339 timestamp (e.g. `2026-03-19T15:30:00Z`)	When the current rate limit window fully replenishes

On 429 errors, x-ratelimit-reset is always set in the response header. For streaming responses, these headers cannot be set after the stream has started — the reset time is included in the SSE error event body (reset_at field) instead.

All three headers are included in Access-Control-Expose-Headers for browser clients.

How It Works

Each API request spawns a fresh claude --print subprocess. The proxy translates between API formats and CLI I/O:

Your App  →  HTTP Request  →  Proxy  →  claude --print  →  Claude Max
                                ↕
                          Translates formats
                          (Anthropic ↔ CLI ↔ OpenAI)

Prompts are sent via stdin
Responses are parsed from stdout (NDJSON stream)
Each request is stateless — no sessions, no state between calls
The proxy uses --dangerously-skip-permissions for non-interactive operation

Limitations

No image/vision support yet — image content blocks are not passed through
Sampling parameters ignored — temperature, top_p, top_k are accepted but have no effect (CLI doesn't expose them)
max_tokens is advisory — there's no direct CLI flag for token limits
Rate limits depend on your subscription — the proxy passes through whatever quota the CLI reports
One completion per request — n > 1 is not supported

Security

Uses spawn() (not exec()) to prevent shell injection
API key comparison uses crypto.timingSafeEqual
Request bodies capped at 10MB
Subprocess environment is filtered — your secrets are not leaked to the CLI
Subprocesses are killed on client disconnect and request timeout

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

The short version: open an issue to discuss, then fork, branch, and submit a PR. See the contributing guide for branch naming, commit conventions, code guidelines, and testing steps.

License

MIT — use it however you want, just keep the copyright notice.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.claude/skills/setup-proxy		.claude/skills/setup-proxy
src		src
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Code Proxy

Why?

Prerequisites

Install

Usage

Connect Your SDK

TypeScript / JavaScript

Python

Other Tools

Streaming

Available Models

Features

Configuration

MCP Server Registry

API Endpoints

Rate Limit Headers

How It Works

Limitations

Security

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Claude Code Proxy

Why?

Prerequisites

Install

Usage

Connect Your SDK

TypeScript / JavaScript

Python

Other Tools

Streaming

Available Models

Features

Configuration

MCP Server Registry

API Endpoints

Rate Limit Headers

How It Works

Limitations

Security

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages