One proxy. Every agent. Every protocol. A 5 MB binary with zero dependencies. Runs on Windows & macOS.
Prism is a universal LLM proxy that connects any AI agent — Claude Code, Codex Desktop, Factory Droid, OpenCode, ZCode, Cursor, and more — to any provider through any protocol. It translates between Anthropic Messages, OpenAI Chat Completions, OpenAI Responses, and Ollama APIs in real time, with built-in one-click integrations that auto-configure your agents. Native system tray, web admin UI, model auto-discovery, and full SSE streaming. Zero config.
Every AI agent speaks a different protocol. Every provider expects a different format. Every model has different capabilities. Prism sits in between, translating requests and responses on the fly — and with one-click integrations, it auto-configures your agents so they just work.
One proxy. Every agent. Every protocol. No Python.
| | Prism | LiteLLM |
|---|---|---|
| Binary size | ~5 MB | ~200 MB (Python + deps) |
| Memory | ~5–10 MB | ~200–500 MB |
| Startup | < 100 ms | ~2–5 s |
| Runtime deps | None | Python 3.9+, pip packages |
| Anthropic API | ✅ | ✅ |
| OpenAI Chat API | ✅ | ✅ |
| OpenAI Responses API | ✅ | ❌ |
| Ollama Native API | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Model remapping | ✅ | ✅ |
| Tool calling | ✅ | ✅ |
| Thinking/reasoning | ✅ | |
| Per-model reasoning toggle | ✅ | ❌ |
| Reasoning effort validation | ✅ | ❌ |
| Image support | ✅ | ✅ |
| Structured outputs | ✅ | |
| Per-model capabilities | ✅ Tools / Vision / Struct | ❌ |
| Auto model config (models.dev) | ✅ Zero-config | ❌ |
| Provider-per-model routing | ✅ | ❌ |
| Claude Code integration | ✅ One-click | ❌ |
| Codex Desktop & CLI integration | ✅ One-click | ❌ |
| Factory Droid integration | ✅ One-click | ❌ |
| OpenCode integration | ✅ One-click | ❌ |
| ZCode integration | ✅ One-click | ❌ |
| Web admin UI | ✅ | ❌ |
| Windows native | ✅ System tray + admin UI | ❌ Requires Python |
| macOS native | ✅ System tray + admin UI | ❌ Requires Python |
Your agents                                Cloud providers
───────────                                ───────────────
Claude Code ──────┐
(Anthropic API)   │                        ┌──────────────┐
                  │    ┌───────────┐       │ Ollama Cloud │
Codex Desktop ────┼───→│   Prism   │──────→│ /api/chat    │
(Responses API)   │    │  :11434   │       └──────────────┘
                  │    └───────────┘       ┌──────────────┐
Factory Droid ────┤        │               │ OpenCode Go  │
(Chat Completions)│        │               │ /v1/chat/... │
                  │        │               └──────────────┘
OpenCode ─────────┤        │               ┌──────────────┐
(Chat Completions)│        ├──────────────→│ Custom       │
                  │        │               │ /v1/chat/... │
ZCode ────────────┤        │               └──────────────┘
(Chat Completions)│        │
                  │        │               ┌────────────────────┐
Cursor ───────────┤        └──────────────→│ Codex (via OAuth)  │
(OpenAI API)      │                        │ /backend-api/...   │
                  │                        └────────────────────┘
Continue ─────────┤
(OpenAI API)      │                        ┌────────────────────┐
                  │                        │ Codex (via OAuth)  │── Sign in with OpenAI
OpenAI SDK ───────┘                        │ /backend-api/...   │   account, no API key
(Responses API)                            └────────────────────┘   needed
                       ┌────────────────────┐
                       │ Admin UI           │
                       │ :8765              │
                       └────────────────────┘
Prism accepts requests in four protocol formats — Anthropic Messages (/v1/messages), OpenAI Chat Completions (/v1/chat/completions), OpenAI Responses (/v1/responses), and Ollama Native (/api/chat) — translates them to whatever your upstream provider speaks, and translates responses back. For Codex (OpenAI) accounts, Prism routes directly to the ChatGPT backend API, including Chat Completions ↔ Responses API translation so any agent can use your OAuth account regardless of its protocol. Streaming works seamlessly in all directions.
For integrated agents, Prism writes the right config files automatically — environment variables, provider blocks, model catalogs — so your agents see Prism's models without any manual setup.
Windows:
./prism.exe
macOS:
open Prism.app
# or from the DMG: drag Prism.app to /Applications, then open it
That's it. Prism starts on http://127.0.0.1:11434 and a system tray icon appears. A web admin UI is available at http://127.0.0.1:8765/admin.
Open the admin UI from the system tray (right-click → Open Settings) or navigate to http://127.0.0.1:8765/admin. In the Provider tab:
- Select your upstream provider (Ollama Cloud, OpenCode Go, a custom provider, or a Codex OAuth account)
- For API-key providers, enter your API key
- For Codex, click Add Codex Account to sign in with your OpenAI account
- Prism auto-restarts with the new config
You can also configure via the config file — %APPDATA%\prism\config.json on Windows or ~/Library/Application Support/prism/config.json on macOS — see Providers below.
In the Models tab, just type a model name and Prism auto-fetches all the details — context length, max output tokens, reasoning support, tool calling, vision, structured outputs, and reasoning effort levels — from models.dev. No manual configuration needed. Select your provider, search for the model, and click to auto-fill everything.
Go to the Agents tab in the admin UI. Prism auto-detects which agents are installed and shows their status. Click Setup next to any agent to configure it with one click. See Agent integrations for details on each agent.
Setting up with Claude Desktop
Edit your Claude Desktop config:
{
  "inferenceProvider": "gateway",
  "inferenceGatewayBaseUrl": "http://127.0.0.1:11434",
  "inferenceGatewayApiKey": "prism",
  "inferenceModels": [
    { "name": "glm-5.1:cloud" },
    { "name": "deepseek-v4-pro:cloud", "supports1m": true }
  ]
}
Setting up with Claude Code
Prism integrates with Claude Code by setting environment variables and per-tier model mappings in ~/.claude/settings.json. You can choose which Prism model fills each tier (opus, sonnet, haiku, subagent).
One-click setup: Go to the Agents tab in the admin UI, select your tier models, and click Setup. Prism backs up your existing config and writes the right environment variables.
Manual setup: Edit ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:11434",
    "ANTHROPIC_AUTH_TOKEN": "prism",
    "ANTHROPIC_API_KEY": ""
  }
}
Setting up with Codex Desktop & CLI
Prism integrates with Codex Desktop and Codex CLI's native model selector. When enabled, all your Prism models appear directly in the model picker — no need to manually configure each model.
One-click setup: Go to the Agents tab in the admin UI and click Setup under "Codex Desktop Integration".
How it works: Prism writes a managed provider block to ~/.codex/config.toml and generates a model catalog JSON file. Codex Desktop/CLI reads these on launch and populates its model picker with your Prism models. Requests flow through Prism's Responses API endpoint, which translates them to your configured upstream provider.
Automatic sync: Prism auto-syncs the catalog on every startup if Codex Desktop/CLI is detected (~/.codex/config.toml exists), so new models added to your remapping are picked up automatically.
To disable: Click Restore in the Agents tab. This removes Prism's managed blocks and restores any previous settings.
Setting up with Factory Droid
Prism adds your models as [Prism] custom entries in ~/.factory/settings.json. Codex OAuth models are routed through /v1/responses; all other models go through /v1/chat/completions.
One-click setup: Go to the Agents tab in the admin UI and click Setup under "Factory Droid". Prism backs up your existing config and injects all your Prism models.
To disable: Click Restore to remove all Prism-tagged entries.
Setting up with OpenCode
Prism registers two providers in ~/.config/opencode/opencode.json: prism (for non-Codex models via /v1/chat/completions) and prism-codex (for Codex OAuth models via /v1/responses). The first available Prism model is set as the default.
One-click setup: Go to the Agents tab in the admin UI and click Setup under "OpenCode". Prism backs up your existing config and writes the provider blocks.
To disable: Click Restore to remove the Prism providers and default model references.
Setting up with ZCode
Prism writes a provider block to ~/.zcode/v2/config.json with your Prism base URL and API key. Each model entry includes context/output limits, modalities, and optional reasoning configuration.
One-click setup: Go to the Agents tab in the admin UI and click Setup under "ZCode". Prism backs up your existing config and writes the provider block.
To disable: Click Restore to remove the Prism provider.
Setting up with Cursor / Continue / other OpenAI clients
Point your client to http://127.0.0.1:11434/v1 with any API key. Prism accepts OpenAI Chat Completions requests and translates them to the configured upstream provider.
Setting up with OpenAI SDK (Responses API)
Set the base URL to http://127.0.0.1:11434/v1. Prism accepts OpenAI Responses API requests at /v1/responses and translates them to the configured upstream provider — including streaming, tool calls, and reasoning.
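from openai import OpenAI

# Point the SDK at the local Prism proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="prism",
)
# With stream=True the SDK returns an iterator of Responses API events.
stream = client.responses.create(
    model="glm-5.1:cloud",
    input="Hello!",
    stream=True,
)
for event in stream:
    # Print text deltas as they arrive; other event types are ignored here.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
Prism includes built-in, one-click integrations for popular AI coding agents. Each integration auto-detects whether the agent is installed, writes the right config files, and keeps them in sync when your models change.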
| Agent | What it does | Config location |
|---|---|---|
| Claude Code | Sets ANTHROPIC_BASE_URL + per-tier model mappings | ~/.claude/settings.json |
| Codex Desktop & CLI | Injects models into native model picker | ~/.codex/config.toml + catalog JSON |
| Factory Droid | Adds [Prism] custom models with smart routing | ~/.factory/settings.json |
| OpenCode | Registers prism + prism-codex providers | ~/.config/opencode/opencode.json |
| ZCode | Registers prism provider with model list | ~/.zcode/v2/config.json |
How it works:
- Auto-detection: Prism checks if each agent's config file or binary exists on disk. The Agents tab shows which agents are installed and active.
- One-click setup: Click Setup to back up the agent's existing config and write Prism's configuration. Click Restore to revert to the backup.
- Auto-sync: Prism syncs all agent configs on startup and whenever you add or remove models, so newly added models appear automatically.
- Smart routing: Codex OAuth models are routed through /v1/responses, all others through /v1/chat/completions. Each agent gets the right endpoint for its protocol.
When launched without arguments, Prism runs as a system tray application with these options:
| Menu item | Action |
|---|---|
| Start / Stop / Restart Proxy | Control the proxy server process |
| Open Settings | Open the web admin UI in your browser |
| Open Folder | Open the proxy directory in Explorer / Finder |
| Edit Model Config | Open model_remapping.json in Notepad / TextEdit |
| Show Logs | Open a live log viewer console |
| Check for Updates | Check for newer versions of Prism |
| Quit | Stop proxy and exit |
Prism includes a built-in web admin interface for managing everything without editing config files by hand.
URL: http://127.0.0.1:8765/admin (configurable via PRISM_ADMIN_PORT)
The admin UI provides:
| Tab | Features |
|---|---|
| Provider | Select default provider, set API keys, add/edit/remove custom providers |
| OAuth | Manage Codex (OpenAI) accounts — sign in, view session/weekly usage percentages, activate, or remove accounts |
| Models | Edit model remapping — default model, known models with per-model provider, reasoning toggle, capabilities (tools/vision/struct), context length, max output tokens, reasoning effort levels, and aliases. Auto-fill from models.dev — type a model name, search, and click to populate all fields automatically. |
| Agents | One-click setup/restore for Claude Code, Codex Desktop, Factory Droid, OpenCode, and ZCode. Claude Code includes per-tier model selectors (opus, sonnet, haiku, subagent). |
| Stats | Live and historical performance dashboard (see below) |
| Proxy | Start, stop, and restart the proxy; view status; toggle auto-start at login |
| Logs | Live tail of the last 200 log lines |
Changes are saved immediately and the proxy auto-restarts when needed.
The Stats tab surfaces every metric about your proxy usage:
| Section | What it shows |
|---|---|
| Filter bar | Filter by provider, model, client origin, or date range; refresh button to reload all data |
| Tokens Per Day | Stacked bar chart (input + output) with a total headline — persists across restarts via SQLite |
| Tokens Per Month | Filled line chart showing monthly aggregate totals |
| Live TPS | Real-time tokens/sec hero value with a live sparkline chart (120-point rolling window, updated every second) |
| Session Totals | Running counts: total requests, input tokens, output tokens, and average TPS |
| Client Breakdown | Per-client usage stats showing requests, total tokens, and a distribution pie chart — identifies tools like Claude Code, Cursor, Continue, Copilot, Factory Droid, and more automatically by User-Agent |
| TPS History | Table (model, provider, avg/max TPS) paired with a multi-line chart of 5-minute bucket averages over time |
| By Model | Per-model breakdown of requests, token counts, and average TPS |
| Recent Requests | Timestamped log of the last 50 requests with model, client, token counts, TPS, and duration |
| Data Management | One-click Clear All Stats button to wipe all persisted history |
All request data and TPS snapshots are persisted to the stats database (%APPDATA%\prism\stats.db on Windows, ~/Library/Application Support/prism/stats.db on macOS — SQLite, WAL mode) so the dashboard survives proxy restarts and page refreshes. Charts are rendered with Chart.js and automatically adapt to light/dark theme.
Prism automatically identifies which tool is making each request by inspecting the User-Agent header. Detected clients include:
Claude Code, Cursor, Continue, GitHub Copilot, Aider, OpenCode, Windsurf, Trae, Factory Droid, Supermaven, and Claude Desktop.
You can override detection by setting the X-Client-Name header on your requests — the value is used directly in stats, so you can tag requests with custom names like "my-script" or "ci-pipeline".
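The detection order described above (explicit header first, then User-Agent) could be sketched roughly as follows. The substring table and the "Unknown" fallback label are illustrative assumptions, not Prism's actual matching rules.

```python
# Hypothetical User-Agent substrings; Prism's real patterns are not documented here.
KNOWN_CLIENTS = [
    ("claude-code", "Claude Code"),
    ("cursor", "Cursor"),
    ("continue", "Continue"),
    ("aider", "Aider"),
    ("opencode", "OpenCode"),
]


def detect_client(headers: dict[str, str]) -> str:
    """Return the client label used in stats for one request."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # An explicit X-Client-Name wins and is used verbatim.
    override = lowered.get("x-client-name", "").strip()
    if override:
        return override

    # Otherwise fall back to substring matching on the User-Agent.
    user_agent = lowered.get("user-agent", "").lower()
    for needle, label in KNOWN_CLIENTS:
        if needle in user_agent:
            return label
    return "Unknown"  # assumed label for unrecognized clients
```

A script can therefore tag its own traffic by sending a header such as X-Client-Name: ci-pipeline, whatever HTTP library it uses.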
| Variable | Default | Description |
|---|---|---|
| PRISM_PORT | 11434 | Port for the proxy server |
| PRISM_HOST | 127.0.0.1 | Host to bind (use 0.0.0.0 for network access) |
| PRISM_ADMIN_PORT | 8765 | Port for the admin web UI |
| OLLAMA_API_KEY | (none) | API key for Ollama Cloud (fallback if not in config) |
| OPENCODE_GO_API_KEY | (none) | API key for OpenCode Go (fallback if not in config) |
Prism supports multiple upstream providers, configured via the admin UI or the config file (%APPDATA%\prism\config.json on Windows, ~/Library/Application Support/prism/config.json on macOS):
| Provider | Config key | Upstream format | Endpoint |
|---|---|---|---|
| Ollama Cloud | ollama_cloud | Ollama Native | /api/chat |
| OpenCode Go | opencode_go | OpenAI | /v1/chat/completions |
| Custom providers | custom_providers[] | OpenAI | /v1/chat/completions |
| Codex (via OAuth) | oauth_accounts[] | OpenAI | chatgpt.com/backend-api/codex/responses |
Each model in your remapping is assigned to a specific provider. When a request arrives, Prism resolves the model, looks up its assigned provider, and routes the request to that upstream — even if other models go to different providers. This means you can mix models from Ollama Cloud, OpenCode Go, custom providers, and OAuth accounts in a single session.
- The default_provider field in config is used only as a fallback when a model has no explicit provider assignment.
- Provider routing is handled by the ProviderRouter, which resolves the provider per-request based on the requested model name.
- Models from different providers can coexist — set each model's provider when adding it to Known Models.
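As a rough illustration of the per-model lookup with a default_provider fallback, here is a minimal sketch. The function name and data shapes are assumptions modeled on the config examples in this README, not the ProviderRouter's real code.

```python
def resolve_provider(model_id: str, known_models: list[dict], default_provider: str) -> str:
    """Pick the upstream provider for one request (hypothetical helper)."""
    for entry in known_models:
        # An explicit per-model assignment takes precedence.
        if entry.get("id") == model_id and entry.get("provider"):
            return entry["provider"]
    # No entry, or an entry without a provider: use the configured fallback.
    return default_provider


known = [
    {"id": "glm-5.1:cloud", "provider": "ollama_cloud"},
    {"id": "opencode/deepseek-v4-flash", "provider": "opencode_go"},
    {"id": "unassigned-model"},
]
print(resolve_provider("opencode/deepseek-v4-flash", known, "ollama_cloud"))
```

Because the lookup happens per request, two consecutive requests in the same session can land on different upstreams.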
You can add multiple custom providers (e.g. OpenRouter, Groq, Together AI) — each with its own name, base URL, and API key. Add, edit, or delete them from the admin UI Provider tab. Custom providers are assigned unique IDs like custom_myprovider_abc123.
Prism supports signing in with your OpenAI account via OAuth (no API key needed). Click Add Codex Account in the admin UI OAuth tab or system tray, and your browser will open for authentication. Once connected, Prism uses your account token automatically, including token refresh and usage tracking.
Codex requests route directly to chatgpt.com/backend-api/codex/responses — not through api.openai.com — which avoids Cloudflare restrictions on bearer tokens. Prism also extracts the chatgpt-account-id from your JWT token automatically, so you don't need to configure it manually.
Switch providers from the system tray, admin UI, or by changing the default_provider field — no restart required when using the tray/UI.
Full config example
{
  "default_provider": "ollama_cloud",
  "ollama_cloud": {
    "id": "ollama_cloud",
    "name": "Ollama Cloud",
    "base_url": "https://ollama.com",
    "api_key": ""
  },
  "opencode_go": {
    "id": "opencode_go",
    "name": "OpenCode Go",
    "base_url": "https://opencode.ai/zen/go",
    "api_key": ""
  },
  "custom_providers": [
    {
      "id": "custom_openrouter_abc123",
      "name": "OpenRouter",
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": ""
    }
  ],
  "oauth_accounts": [
    {
      "id": "codex_user_abc123",
      "provider": "codex",
      "label": "Codex",
      "email": "user@example.com",
      "access_token": "...",
      "refresh_token": "...",
      "expires_at": 1234567890,
      "plan_tier": "plus",
      "active": true
    }
  ],
  "agent_integrations": {
    "claude_code_tiers": {
      "opus": "deepseek-v4-pro:cloud",
      "sonnet": "deepseek-v4-flash:cloud",
      "haiku": "glm-5.1:cloud",
      "subagent": "deepseek-v4-flash:cloud"
    }
  }
}
API keys in the config file take priority. If empty, Prism falls back to these environment variables:
| Variable | Used for |
|---|---|
| OLLAMA_API_KEY | Ollama Cloud |
| OPENCODE_GO_API_KEY | OpenCode Go |
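The precedence rule (config value first, environment variable second) can be sketched as below. The helper name and the mapping dict are hypothetical; only the two variable names and config keys come from this README.

```python
import os
from typing import Mapping

# Config key -> environment variable, as listed in the table above.
ENV_FALLBACKS = {
    "ollama_cloud": "OLLAMA_API_KEY",
    "opencode_go": "OPENCODE_GO_API_KEY",
}


def resolve_api_key(config: dict, provider_key: str,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the API key for a provider, preferring the config file."""
    configured = config.get(provider_key, {}).get("api_key", "")
    if configured:
        return configured  # a non-empty config value always wins
    env_name = ENV_FALLBACKS.get(provider_key)
    # Providers without a documented fallback variable get an empty key.
    return env.get(env_name, "") if env_name else ""
```

Passing env explicitly keeps the function easy to exercise without touching the real process environment.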
Prism can remap model names on the fly — useful when clients send model names that don't exist on your upstream provider.
Configured via the admin UI (Models tab) or the model remapping file (%APPDATA%\prism\model_remapping.json on Windows, ~/Library/Application Support/prism/model_remapping.json on macOS).
Instead of manually filling in context lengths, token limits, and capabilities, just type a model name in the Models tab and click Search. Prism queries models.dev and auto-fills:
- Context length
- Max output tokens
- Reasoning toggle and allowed effort levels
- Tool calling, structured output, and vision capabilities
The search is scoped to your selected provider so you get accurate results. No manual configuration needed — search, select, and you're done.
When an unknown model is requested, Prism falls back to the default model. Select it from the dropdown in the admin UI or set default_model in the remapping file.
Known models are rich entries — not just strings — with per-model provider assignment, reasoning toggle, capabilities, and token limits. Each entry includes:
| Field | Type | Description |
|---|---|---|
| id | string | Model identifier (e.g. deepseek-v4-flash:cloud) |
| provider | string | Provider to route this model to (e.g. ollama_cloud, opencode_go, a custom provider ID, or an OAuth account ID) |
| reasoning | bool | Whether this model supports thinking/reasoning |
| reasoning_effort | string[] | Allowed reasoning effort levels (low, medium, high, max) |
| context_length | int | Maximum context window in tokens |
| max_output_tokens | int | Maximum output tokens |
| capabilities.tool_calling | bool | Supports tool/function calling |
| capabilities.structured_outputs | bool | Supports structured/JSON output |
| capabilities.vision | bool | Supports image input |
Models matching a known entry pass through without remapping. A model that doesn't match any entry falls back to the default model.
Prism validates reasoning_effort against each model's capabilities:
- Non-reasoning models: reasoning_effort is automatically stripped from requests.
- Reasoning models: Invalid effort values are normalized to the model's first allowed effort (e.g. "invalid" → "medium"), with a warning logged.
- Unknown models: reasoning_effort is stripped for safety.
- Responses API normalization: enabled/on/true → medium; disabled/off/false/none → omitted.
- Anthropic → OpenAI translation: Anthropic thinking is mapped to reasoning_effort=medium.
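The first three rules above could look roughly like this in code. This is a sketch under assumptions: the function name, the logger name, and the handling of a reasoning model with an empty effort list are all illustrative, and the model dict follows the known-model fields described earlier.

```python
import logging
from typing import Optional

log = logging.getLogger("prism_sketch")  # hypothetical logger name


def normalize_effort(request: dict, model: Optional[dict]) -> dict:
    """Return a copy of the request with reasoning_effort validated."""
    out = dict(request)
    effort = out.get("reasoning_effort")
    if effort is None:
        return out
    # Unknown models (None) and non-reasoning models: strip the field.
    if model is None or not model.get("reasoning"):
        del out["reasoning_effort"]
        return out
    allowed = model.get("reasoning_effort") or []
    if effort in allowed:
        return out
    if not allowed:
        # Assumption: with no allowed levels listed, drop the field.
        del out["reasoning_effort"]
        return out
    # Invalid value: fall back to the first allowed level and log a warning.
    log.warning("invalid reasoning_effort %r, using %r", effort, allowed[0])
    out["reasoning_effort"] = allowed[0]
    return out
```

The function copies the request rather than mutating it, so the original client payload stays available for logging.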
Map incoming model names to different upstream models.
| Feature | What it does |
|---|---|
| Aliases | Map model names (e.g. claude-3-5-haiku → deepseek-v4-flash:cloud) |
| Default model | Fallback when a requested model isn't recognized |
| Known models | Rich entries with per-model provider, reasoning, and capabilities |
| Auto config | Search models.dev and auto-fill all fields |
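How the three lookups in the table fit together might be sketched like this. The precedence shown (known model, then alias, then default) is an assumption; the README does not state the order in which Prism checks them.

```python
def resolve_model(requested: str, remap: dict) -> str:
    """Map an incoming model name to the upstream model name."""
    known_ids = {entry["id"] for entry in remap.get("known_models", [])}
    if requested in known_ids:
        return requested  # known models pass through unchanged
    alias_target = remap.get("aliases", {}).get(requested)
    if alias_target:
        return alias_target  # alias hit: rewrite to the mapped model
    return remap["default_model"]  # unrecognized: fall back to the default


remap = {
    "default_model": "glm-5.1:cloud",
    "known_models": [{"id": "glm-5.1:cloud"}, {"id": "deepseek-v4-flash:cloud"}],
    "aliases": {"claude-3-5-haiku": "deepseek-v4-flash:cloud"},
}
print(resolve_model("claude-3-5-haiku", remap))
```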
Full remapping example
{
  "default_model": "glm-5.1:cloud",
  "known_models": [
    {
      "id": "glm-5.1:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true,
        "vision": true
      }
    },
    {
      "id": "deepseek-v4-flash:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "opencode/deepseek-v4-flash",
      "provider": "opencode_go",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "deepseek-v4-pro:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high", "max"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    }
  ],
  "aliases": {
    "claude-3-5-haiku": "deepseek-v4-flash:cloud",
    "claude-3-5-haiku-20241022": "deepseek-v4-flash:cloud",
    "claude-3-haiku-20240307": "deepseek-v4-flash:cloud",
    "claude-haiku-3-5-20241022": "deepseek-v4-flash:cloud"
  }
}
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /v1/messages | x-api-key header | Anthropic Messages API |
| POST | /v1/chat/completions | Authorization: Bearer <key> | OpenAI Chat Completions API |
| POST | /v1/responses | Authorization: Bearer <key> | OpenAI Responses API |
| GET | /v1/models | Authorization: Bearer <key> | List available models |
| GET | /health | None | Health check |
| GET | /api/model-info | None | Look up model details from models.dev (admin UI) |
| GET | /admin/model-info | None | Look up model details from models.dev (admin server only) |
| GET | /admin/model-search | None | Search models on models.dev (admin server only) |
| POST | /v1/messages/count_tokens | x-api-key header | Returns 404 (not supported upstream) |
Prism handles the full translation surface between all API formats:
Anthropic ↔ Ollama
Request mapping:
| Anthropic | Ollama | Notes |
|---|---|---|
| messages | messages | Content blocks → string or array |
| system | messages[].role=system | Injected as first message |
| max_tokens | options.num_predict | |
| temperature / top_p / top_k | options.* | |
| tools | tools | Schema translation |
| thinking | think | |
| stop_sequences | options.stop | |
| images (base64) | images | Image content blocks → image array |
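To make the request mapping concrete, here is a minimal sketch of the Anthropic → Ollama direction. It is not Prism's translator: tool schemas are omitted, system is assumed to be a plain string, and the function name is hypothetical.

```python
def anthropic_to_ollama(req: dict) -> dict:
    """Translate an Anthropic Messages request body into an Ollama /api/chat body."""
    messages = []
    if req.get("system"):
        # The system prompt becomes the first message.
        messages.append({"role": "system", "content": req["system"]})
    for msg in req.get("messages", []):
        content, images = msg["content"], []
        if isinstance(content, list):
            texts = []
            for block in content:
                if block.get("type") == "text":
                    texts.append(block["text"])
                elif block.get("type") == "image":
                    source = block.get("source", {})
                    if source.get("type") == "base64":
                        images.append(source["data"])  # raw base64 payload
            content = "".join(texts)
        out_msg = {"role": msg["role"], "content": content}
        if images:
            out_msg["images"] = images
        messages.append(out_msg)

    options = {}
    if "max_tokens" in req:
        options["num_predict"] = req["max_tokens"]
    for key in ("temperature", "top_p", "top_k"):
        if key in req:
            options[key] = req[key]
    if "stop_sequences" in req:
        options["stop"] = req["stop_sequences"]

    out = {"model": req["model"], "messages": messages,
           "options": options, "stream": bool(req.get("stream", False))}
    if "thinking" in req:
        # Anthropic sends {"type": "enabled", ...}; Ollama takes a boolean.
        out["think"] = req["thinking"].get("type") == "enabled"
    return out
```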
Response mapping:
| Ollama | Anthropic | Notes |
|---|---|---|
| message.content | content[0].text | Wrapped in content block array |
| message.tool_calls | content[].tool_use | |
| message.thinking | content[].thinking | |
| done_reason: stop | stop_reason: end_turn | |
| done_reason: length | stop_reason: max_tokens | |
| done_reason: tool_call | stop_reason: tool_use | |
Anthropic ↔ OpenAI
Request mapping:
| Anthropic | OpenAI | Notes |
|---|---|---|
| messages | messages | Content blocks → OpenAI format |
| system | messages[].role=system | |
| max_tokens | max_tokens | |
| tools | tools | Schema translation |
| thinking | reasoning_content | |
| images (base64) | image_url (data URI) | Image content blocks → OpenAI image parts |
Response mapping:
| OpenAI | Anthropic | Notes |
|---|---|---|
| choices[0].message.content | content[0].text | |
| choices[0].message.tool_calls | content[].tool_use | |
| choices[0].message.reasoning_content | content[].thinking | |
| finish_reason: stop | stop_reason: end_turn | |
| finish_reason: length | stop_reason: max_tokens | |
| finish_reason: tool_calls | stop_reason: tool_use | |
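The response table above can be read as the following sketch for a non-streaming reply. The function name is hypothetical, and placing the thinking block before the text block is an assumption borrowed from Anthropic's usual ordering.

```python
import json

FINISH_TO_STOP = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}


def openai_to_anthropic_response(resp: dict) -> dict:
    """Convert a Chat Completions response into an Anthropic-style message."""
    choice = resp["choices"][0]
    message = choice["message"]
    content = []
    if message.get("reasoning_content"):
        content.append({"type": "thinking", "thinking": message["reasoning_content"]})
    if message.get("content"):
        content.append({"type": "text", "text": message["content"]})
    for call in message.get("tool_calls") or []:
        content.append({
            "type": "tool_use",
            "id": call["id"],
            "name": call["function"]["name"],
            # OpenAI sends arguments as a JSON string; Anthropic expects an object.
            "input": json.loads(call["function"]["arguments"] or "{}"),
        })
    return {
        "type": "message",
        "role": "assistant",
        "model": resp.get("model"),
        "content": content,
        # Unmapped finish reasons default to end_turn in this sketch.
        "stop_reason": FINISH_TO_STOP.get(choice.get("finish_reason"), "end_turn"),
    }
```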
OpenAI inbound → Ollama
When an OpenAI client talks to Prism with an Ollama upstream, Prism translates the full OpenAI Chat Completions request/response format to/from Ollama native format — including streaming, tool calls, reasoning content, and images.
| OpenAI | Ollama | Notes |
|---|---|---|
| reasoning_effort | think | Any non-"off" value enables thinking |
| image_url (data URI) | images | Base64 data extracted from data URI |
| response_format | n/a | Passed through when supported |
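Two of the rows above are small enough to sketch directly. Both helper names are hypothetical, and treating a missing reasoning_effort as "thinking off" is an assumption rather than documented behavior.

```python
def data_uri_to_base64(url: str) -> str:
    """Extract the base64 payload from a data URI such as data:image/png;base64,...."""
    header, sep, payload = url.partition(",")
    if not (sep and header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("expected a base64 data URI")
    return payload


def effort_to_think(effort) -> bool:
    """Map OpenAI reasoning_effort to Ollama's boolean think flag."""
    if effort is None:
        return False  # assumption: no effort field means thinking stays off
    return str(effort).lower() != "off"
```

Remote http(s) image URLs are deliberately rejected here, since the table only describes the data URI case.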
OpenAI inbound → OpenAI (pass-through)
When both the client and upstream speak OpenAI format, Prism applies model remapping and forwards the request with minimal modification. Streaming is passed through as-is.
Responses API ↔ Ollama / OpenAI
Prism translates the OpenAI Responses API (/v1/responses) to the upstream format, whether Ollama or OpenAI:
| Responses API | Chat Completions / Ollama | Notes |
|---|---|---|
| input (string) | messages[].role=user | Simple string input → user message |
| input (array of items) | messages[] | message, function_call, function_call_output, custom_tool_call_output items mapped |
| instructions | messages[].role=system | System prompt |
| tools (function type) | tools | type: function tools forwarded as-is |
| tools (built-in type) | tools | Codex built-in tools (apply_patch, local_shell, web_search, etc.) rewrapped as function tools with preserved type mapping |
| reasoning | reasoning_effort / think | Reasoning config → thinking mode |
| text.format | response_format / format | Structured output / JSON schema |
| max_output_tokens | max_tokens / options.num_predict | |
| temperature / top_p | temperature / top_p | |
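The input and instructions rows of the request table might be implemented along these lines. This sketch targets the Chat Completions message shape only; the function name is hypothetical and multi-part content is flattened to text for brevity.

```python
def responses_input_to_messages(body: dict) -> list[dict]:
    """Convert Responses API instructions/input into Chat Completions messages."""
    messages = []
    if body.get("instructions"):
        messages.append({"role": "system", "content": body["instructions"]})

    items = body.get("input", [])
    if isinstance(items, str):
        # Simple string input becomes a single user message.
        messages.append({"role": "user", "content": items})
        return messages

    for item in items:
        kind = item.get("type", "message")  # bare role/content items are messages
        if kind == "message":
            content = item["content"]
            if isinstance(content, list):
                content = "".join(part.get("text", "") for part in content)
            messages.append({"role": item["role"], "content": content})
        elif kind == "function_call":
            messages.append({"role": "assistant", "content": None, "tool_calls": [{
                "id": item["call_id"], "type": "function",
                "function": {"name": item["name"], "arguments": item["arguments"]},
            }]})
        elif kind in ("function_call_output", "custom_tool_call_output"):
            messages.append({"role": "tool", "tool_call_id": item["call_id"],
                             "content": item["output"]})
    return messages
```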
Response mapping (OpenAI upstream → Responses API):
| Chat Completions | Responses API | Notes |
|---|---|---|
| message.content | output[].message.content[].output_text | Text content → output parts |
| message.reasoning_content | output[].reasoning | Reasoning → reasoning item |
| message.tool_calls | output[].function_call or output[].custom_tool_call | Tool calls mapped back to correct output type |
| finish_reason: stop | status: completed | |
| finish_reason: length | status: incomplete | |
Built-in tool type preservation: Prism maps Codex built-in tools (apply_patch, local_shell, web_search, computer_use) to function tools for the upstream model, preserving the original type so responses translate back correctly — e.g., apply_patch → custom_tool_call with input field instead of arguments.
Streaming: Full Responses API streaming event sequence is emitted — response.created, response.output_item.added, response.output_text.delta, response.output_text.done, response.content_part.added/done, response.output_item.done, response.function_call_arguments.delta/done, response.custom_tool_call_input.done, and response.completed.
Codex OAuth direct passthrough
For Codex (OpenAI) OAuth accounts, Prism routes Responses API requests directly to chatgpt.com/backend-api/codex/responses — bypassing api.openai.com entirely. This avoids Cloudflare restrictions on bearer tokens and provides native Codex tool support.
For Chat Completions clients (like Factory Droid, OpenCode) that need to reach Codex, Prism translates Chat Completions → Responses API format before forwarding to the Codex backend, so any agent can use your OAuth account regardless of its protocol.
Prism automatically extracts the chatgpt-account-id from your JWT token and includes it as a header, so no manual configuration is needed.
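Reading an account ID out of a JWT amounts to base64url-decoding the token's middle segment. The sketch below shows that step only. The claim namespace and field name are assumptions about the token layout, not something this README confirms, and no signature verification is performed.

```python
import base64
import json
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    segment = parts[1]
    segment += "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(segment))


def extract_account_id(token: str,
                       namespace: str = "https://api.openai.com/auth",  # assumed claim
                       field: str = "chatgpt_account_id") -> Optional[str]:  # assumed field
    """Return the account ID claim, or None when it is absent."""
    return jwt_payload(token).get(namespace, {}).get(field)
```

The extracted value would then be sent as the chatgpt-account-id request header alongside the bearer token.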
All routing paths support real-time SSE streaming with correct event translation:
| Inbound | Upstream | Streaming |
|---|---|---|
| Anthropic | Ollama | ✅ Newline-delimited JSON → Anthropic SSE |
| Anthropic | OpenAI | ✅ OpenAI SSE → Anthropic SSE |
| OpenAI Chat | Ollama | ✅ Newline-delimited JSON → OpenAI SSE |
| OpenAI Chat | OpenAI | ✅ Pass-through with model remapping |
| OpenAI Chat | Codex (OAuth) | ✅ Chat Completions → Codex Responses SSE |
| OpenAI Responses | Ollama | ✅ Newline-delimited JSON → Responses API SSE events |
| OpenAI Responses | OpenAI | ✅ OpenAI SSE → Responses API SSE events |
| OpenAI Responses | Codex (OAuth) | ✅ Direct passthrough to Codex backend |
Thinking/reasoning blocks, tool calls, and images are fully supported in all streaming paths.
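As one example of the table's first row, the sketch below turns Ollama's newline-delimited JSON chunks into the Anthropic SSE event sequence for a text-only reply. Thinking blocks, tool calls, and usage accounting are left out, and the message ID is a placeholder.

```python
import json
from typing import Iterable, Iterator

STOP_REASONS = {"stop": "end_turn", "length": "max_tokens", "tool_call": "tool_use"}


def sse(event: str, data: dict) -> str:
    """Format one server-sent event frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def ollama_ndjson_to_anthropic_sse(lines: Iterable[str], model: str) -> Iterator[str]:
    yield sse("message_start", {"type": "message_start", "message": {
        "id": "msg_sketch", "type": "message", "role": "assistant",
        "model": model, "content": [], "stop_reason": None}})
    yield sse("content_block_start", {"type": "content_block_start", "index": 0,
                                      "content_block": {"type": "text", "text": ""}})
    stop_reason = "end_turn"
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        text = chunk.get("message", {}).get("content", "")
        if text:
            yield sse("content_block_delta", {"type": "content_block_delta", "index": 0,
                                              "delta": {"type": "text_delta", "text": text}})
        if chunk.get("done"):
            stop_reason = STOP_REASONS.get(chunk.get("done_reason"), "end_turn")
            break
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse("message_delta", {"type": "message_delta",
                                "delta": {"stop_reason": stop_reason}})
    yield sse("message_stop", {"type": "message_stop"})
```

Because it is a generator, each upstream chunk can be forwarded to the client as soon as it is parsed.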
Prism can start automatically when you log in. Toggle this from the admin UI (Proxy tab → Start at Login).
Windows: Uses the Windows Registry (HKCU\Software\Microsoft\Windows\CurrentVersion\Run) to launch the Prism executable at login. No admin rights required.
macOS: Uses a LaunchAgent plist (~/Library/LaunchAgents/com.prism.plist) to launch Prism at login.
The following features are not supported by upstream providers and are handled gracefully:
- Anthropic: count_tokens, tool_choice, metadata, prompt caching, batches, PDF, URL images
- OpenAI Chat inbound: /v1/models returns a static list from config (not proxied), parallel_tool_calls, logprobs, seed, user
- OpenAI Responses inbound: previous_response_id (conversation continuity), store; built-in tools (web search, file search, code interpreter) are filtered out for Ollama upstreams
Windows:
go-winres make; go build -ldflags="-H windowsgui" -o prism.exe .
The -H windowsgui flag hides the console window and enables system tray integration.
To run in console mode (for debugging), build without the flag:
go build -o prism.exe .
./prism.exe --serve
macOS:
CGO_ENABLED=1 go build -ldflags="-X main.version=dev" -o prism .
# Or use the build script to create a signed .app bundle and DMG:
./scripts/build-darwin.sh
Windows (PowerShell):
# 1. Start Prism
./prism.exe
# 2. Test Anthropic endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/messages" -Method POST `
-ContentType "application/json" `
-Headers @{"x-api-key"="prism"} `
-Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'
# 3. Test OpenAI Chat Completions endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/chat/completions" -Method POST `
-ContentType "application/json" `
-Headers @{"Authorization"="Bearer prism"} `
-Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'
# 4. Test OpenAI Responses API endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/responses" -Method POST `
-ContentType "application/json" `
-Headers @{"Authorization"="Bearer prism"} `
-Body '{"model":"glm-5.1:cloud","input":"hi"}'
# 5. Test model listing
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/models" -Headers @{"Authorization"="Bearer prism"}
# 6. Test admin UI
Invoke-RestMethod -Uri "http://127.0.0.1:8765/admin/status"
macOS / Linux (bash):
# 1. Start Prism
open Prism.app
# 2. Test Anthropic endpoint
curl -s http://127.0.0.1:11434/v1/messages \
-H "x-api-key: prism" \
-H "Content-Type: application/json" \
-d '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'
# 3. Test OpenAI Chat Completions endpoint
curl -s http://127.0.0.1:11434/v1/chat/completions \
-H "Authorization: Bearer prism" \
-H "Content-Type: application/json" \
-d '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'
# 4. Test OpenAI Responses API endpoint
curl -s http://127.0.0.1:11434/v1/responses \
-H "Authorization: Bearer prism" \
-H "Content-Type: application/json" \
-d '{"model":"glm-5.1:cloud","input":"hi"}'
# 5. Test model listing
curl -s http://127.0.0.1:11434/v1/models -H "Authorization: Bearer prism"
# 6. Test admin UI
curl -s http://127.0.0.1:8765/admin/status
The Codex Desktop & CLI integration (native model selector support) was inspired by and reverse-engineered from codex-shim by Sybil Solutions. Their work on the Responses API translation layer, custom_model_catalog.json format, config.toml managed blocks, and the ASAR patch for the model picker provided the blueprint for Prism's Codex support. If you need a standalone Python-based shim with additional features (Cursor passthrough, ChatGPT passthrough, auto-router, web picker UI), check out their project.
Prism — connect any agent. route any model. stream any protocol.
