Skip to content

KavinMK05/Prism

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Prism

Prism

One proxy. Every agent. Every protocol. A 5 MB binary with zero dependencies. Runs on Windows & macOS.

Prism is a universal LLM proxy that connects any AI agent — Claude Code, Codex Desktop, Factory Droid, OpenCode, ZCode, Cursor, and more — to any provider through any protocol. It translates between Anthropic Messages, OpenAI Chat Completions, OpenAI Responses, and Ollama APIs in real time, with built-in one-click integrations that auto-configure your agents. Native system tray, web admin UI, model auto-discovery, and full SSE streaming. Zero config.

Windows macOS Go License

Prism screenshot


Why Prism?

Every AI agent speaks a different protocol. Every provider expects a different format. Every model has different capabilities. Prism sits in between, translating requests and responses on the fly — and with one-click integrations, it auto-configures your agents so they just work.

One proxy. Every agent. Every protocol. No Python.

Prism LiteLLM
Binary size ~5 MB ~200 MB (Python + deps)
Memory ~5–10 MB ~200–500 MB
Startup < 100 ms ~2–5 s
Runtime deps None Python 3.9+, pip packages
Anthropic API
OpenAI Chat API
OpenAI Responses API
Ollama Native API
Streaming (SSE)
Model remapping
Tool calling
Thinking/reasoning ⚠️ partial
Per-model reasoning toggle
Reasoning effort validation
Image support
Structured outputs ⚠️ partial
Per-model capabilities ✅ Tools / Vision / Struct
Auto model config (models.dev) ✅ Zero-config
Provider-per-model routing
Claude Code integration ✅ One-click
Codex Desktop & CLI integration ✅ One-click
Factory Droid integration ✅ One-click
OpenCode integration ✅ One-click
ZCode integration ✅ One-click
Web admin UI
Windows native ✅ System tray + admin UI ❌ Requires Python
macOS native ✅ System tray + admin UI ❌ Requires Python

How it works

  Your agents                              Cloud providers
  ─────────                               ────────────────

  Claude Code ─────┐
  (Anthropic API)  │                       ┌──────────────┐
                   │    ┌───────────┐       │  Ollama Cloud │
  Codex Desktop ───┼───→│   Prism   │──────→│  /api/chat    │
  (Responses API)  │    │  :11434   │       └──────────────┘
                   │    └───────────┘       ┌──────────────┐
  Factory Droid ───┤         │              │  OpenCode Go  │
  (Chat Completions)│         │              │  /v1/chat/... │
                   │         │              └──────────────┘
  OpenCode ────────┤         │              ┌──────────────┐
  (Chat Completions)│         ├──────────────→│  Custom       │
                   │         │              │  /v1/chat/... │
  ZCode ───────────┤         │              └──────────────┘
  (Chat Completions)│         │
                   │         │              ┌────────────────────┐
  Cursor ──────────┤         └──────────────→│  Codex (via OAuth) │
  (OpenAI API)     │                        │  /backend-api/...  │
                   │                        └────────────────────┘
  Continue ────────┤
  (OpenAI API)     │              ┌────────────────────┐
                   │              │  Codex (via OAuth) │── Sign in with OpenAI
  OpenAI SDK ──────┘              │  /backend-api/...  │   account — no API key
  (Responses API)                 └────────────────────┘   needed

                            ┌────────────────────┐
                            │  Admin UI          │
                            │  :8765             │
                            └────────────────────┘

Prism accepts requests in four protocol formatsAnthropic Messages (/v1/messages), OpenAI Chat Completions (/v1/chat/completions), OpenAI Responses (/v1/responses), and Ollama Native (/api/chat) — translates them to whatever your upstream provider speaks, and translates responses back. For Codex (OpenAI) accounts, Prism routes directly to the ChatGPT backend API, including Chat Completions ↔ Responses API translation so any agent can use your OAuth account regardless of its protocol. Streaming works seamlessly in all directions.

For integrated agents, Prism writes the right config files automatically — environment variables, provider blocks, model catalogs — so your agents see Prism's models without any manual setup.

Quick start

1. Run Prism

Windows:

./prism.exe

macOS:

open Prism.app
# or from the DMG: drag Prism.app to /Applications, then open it

That's it. Prism starts on http://127.0.0.1:11434 and a system tray icon appears. A web admin UI is available at http://127.0.0.1:8765/admin.

2. Configure your provider

Open the admin UI from the system tray (right-click → Open Settings) or navigate to http://127.0.0.1:8765/admin. In the Provider tab:

  1. Select your upstream provider (Ollama Cloud, OpenCode Go, a custom provider, or a Codex OAuth account)
  2. For API-key providers, enter your API key
  3. For Codex, click Add Codex Account to sign in with your OpenAI account
  4. Prism auto-restarts with the new config

You can also configure via the config file — %APPDATA%\prism\config.json on Windows or ~/Library/Application Support/prism/config.json on macOS — see Providers below.

3. Add your models

In the Models tab, just type a model name and Prism auto-fetches all the details — context length, max output tokens, reasoning support, tool calling, vision, structured outputs, and reasoning effort levels — from models.dev. No manual configuration needed. Select your provider, search for the model, and click to auto-fill everything.

4. Connect your agents

Go to the Agents tab in the admin UI. Prism auto-detects which agents are installed and shows their status. Click Setup next to any agent to configure it with one click. See Agent integrations for details on each agent.

Setting up with Claude Desktop

Edit your Claude Desktop config:

{
  "inferenceProvider": "gateway",
  "inferenceGatewayBaseUrl": "http://127.0.0.1:11434",
  "inferenceGatewayApiKey": "prism",
  "inferenceModels": [
    { "name": "glm-5.1:cloud" },
    { "name": "deepseek-v4-pro:cloud", "supports1m": true }
  ]
}
Setting up with Claude Code

Prism integrates with Claude Code by setting environment variables and per-tier model mappings in ~/.claude/settings.json. You can choose which Prism model fills each tier (opus, sonnet, haiku, subagent).

One-click setup: Go to the Agents tab in the admin UI, select your tier models, and click Setup. Prism backs up your existing config and writes the right environment variables.

Manual setup: Edit ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:11434",
    "ANTHROPIC_AUTH_TOKEN": "prism",
    "ANTHROPIC_API_KEY": ""
  }
}
Setting up with Codex Desktop & CLI

Prism integrates with Codex Desktop and Codex CLI's native model selector. When enabled, all your Prism models appear directly in the model picker — no need to manually configure each model.

One-click setup: Go to the Agents tab in the admin UI and click Setup under "Codex Desktop Integration".

How it works: Prism writes a managed provider block to ~/.codex/config.toml and generates a model catalog JSON file. Codex Desktop/CLI reads these on launch and populates its model picker with your Prism models. Requests flow through Prism's Responses API endpoint, which translates them to your configured upstream provider.

Automatic sync: Prism auto-syncs the catalog on every startup if Codex Desktop/CLI is detected (~/.codex/config.toml exists), so new models added to your remapping are picked up automatically.

To disable: Click Restore in the Agents tab. This removes Prism's managed blocks and restores any previous settings.

Setting up with Factory Droid

Prism adds your models as [Prism] custom entries in ~/.factory/settings.json. Codex OAuth models are routed through /v1/responses, all others through /v1/chat/completions.

One-click setup: Go to the Agents tab in the admin UI and click Setup under "Factory Droid". Prism backs up your existing config and injects all your Prism models.

To disable: Click Restore to remove all Prism-tagged entries.

Setting up with OpenCode

Prism registers two providers in ~/.config/opencode/opencode.json: prism (for non-Codex models via /v1/chat/completions) and prism-codex (for Codex OAuth models via /v1/responses). The first available Prism model is set as the default.

One-click setup: Go to the Agents tab in the admin UI and click Setup under "OpenCode". Prism backs up your existing config and writes the provider blocks.

To disable: Click Restore to remove the Prism providers and default model references.

Setting up with ZCode

Prism writes a provider block to ~/.zcode/v2/config.json with your Prism base URL and API key. Each model entry includes context/output limits, modalities, and optional reasoning configuration.

One-click setup: Go to the Agents tab in the admin UI and click Setup under "ZCode". Prism backs up your existing config and writes the provider block.

To disable: Click Restore to remove the Prism provider.

Setting up with Cursor / Continue / other OpenAI clients

Point your client to http://127.0.0.1:11434/v1 with any API key. Prism accepts OpenAI Chat Completions requests and translates them to the configured upstream provider.

Setting up with OpenAI SDK (Responses API)

Set the base URL to http://127.0.0.1:11434/v1. Prism accepts OpenAI Responses API requests at /v1/responses and translates them to the configured upstream provider — including streaming, tool calls, and reasoning.

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="prism"
)

response = client.responses.create(
    model="glm-5.1:cloud",
    input="Hello!",
    stream=True
)

Agent integrations

Prism includes built-in, one-click integrations for popular AI coding agents. Each integration auto-detects whether the agent is installed, writes the right config files, and keeps them in sync when your models change.

Agent What it does Config location
Claude Code Sets ANTHROPIC_BASE_URL + per-tier model mappings ~/.claude/settings.json
Codex Desktop & CLI Injects models into native model picker ~/.codex/config.toml + catalog JSON
Factory Droid Adds [Prism] custom models with smart routing ~/.factory/settings.json
OpenCode Registers prism + prism-codex providers ~/.config/opencode/opencode.json
ZCode Registers prism provider with model list ~/.zcode/v2/config.json

How it works:

  1. Auto-detection — Prism checks if each agent's config file or binary exists on disk. The Agents tab shows which agents are installed and active.
  2. One-click setup — Click Setup to back up the agent's existing config and write Prism's configuration. Click Restore to revert to the backup.
  3. Auto-sync — Prism syncs all agent configs on startup and whenever you add or remove models, so newly added models appear automatically.
  4. Smart routing — Codex OAuth models are routed through /v1/responses, all others through /v1/chat/completions. Each agent gets the right endpoint for its protocol.

System tray

When launched without arguments, Prism runs as a system tray application with these options:

Menu item Action
Start / Stop / Restart Proxy Control the proxy server process
Open Settings Open the web admin UI in your browser
Open Folder Open the proxy directory in Explorer / Finder
Edit Model Config Open model_remapping.json in Notepad / TextEdit
Show Logs Open a live log viewer console
Check for Updates Check for newer versions of Prism
Quit Stop proxy and exit

Admin Web UI

Prism includes a built-in web admin interface for managing everything without editing config files by hand.

URL: http://127.0.0.1:8765/admin (configurable via PRISM_ADMIN_PORT)

The admin UI provides:

Tab Features
Provider Select default provider, set API keys, add/edit/remove custom providers
OAuth Manage Codex (OpenAI) accounts — sign in, view session/weekly usage percentages, activate, or remove accounts
Models Edit model remapping — default model, known models with per-model provider, reasoning toggle, capabilities (tools/vision/struct), context length, max output tokens, reasoning effort levels, and aliases. Auto-fill from models.dev — type a model name, search, and click to populate all fields automatically.
Agents One-click setup/restore for Claude Code, Codex Desktop, Factory Droid, OpenCode, and ZCode. Claude Code includes per-tier model selectors (opus, sonnet, haiku, subagent).
Stats Live and historical performance dashboard (see below)
Proxy Start, stop, and restart the proxy; view status; toggle auto-start at login
Logs Live tail of the last 200 log lines

Changes are saved immediately and the proxy auto-restarts when needed.

Stats Dashboard

The Stats tab surfaces every metric about your proxy usage:

Section What it shows
Filter bar Filter by provider, model, client origin, or date range; refresh button to reload all data
Tokens Per Day Stacked bar chart (input + output) with a total headline — persists across restarts via SQLite
Tokens Per Month Filled line chart showing monthly aggregate totals
Live TPS Real-time tokens/sec hero value with a live sparkline chart (120-point rolling window, updated every second)
Session Totals Running counts: total requests, input tokens, output tokens, and average TPS
Client Breakdown Per-client usage stats showing requests, total tokens, and a distribution pie chart — identifies tools like Claude Code, Cursor, Continue, Copilot, Factory Droid, and more automatically by User-Agent
TPS History Table (model, provider, avg/max TPS) paired with a multi-line chart of 5-minute bucket averages over time
By Model Per-model breakdown of requests, token counts, and average TPS
Recent Requests Timestamped log of the last 50 requests with model, client, token counts, TPS, and duration
Data Management One-click Clear All Stats button to wipe all persisted history

All request data and TPS snapshots are persisted to the stats database (%APPDATA%\prism\stats.db on Windows, ~/Library/Application Support/prism/stats.db on macOS — SQLite, WAL mode) so the dashboard survives proxy restarts and page refreshes. Charts are rendered with Chart.js and automatically adapt to light/dark theme.

Client detection

Prism automatically identifies which tool is making each request by inspecting the User-Agent header. Detected clients include:

Claude Code, Cursor, Continue, GitHub Copilot, Aider, OpenCode, Windsurf, Trae, Factory Droid, Supermaven, and Claude Desktop.

You can override detection by setting the X-Client-Name header on your requests — the value is used directly in stats, so you can tag requests with custom names like "my-script" or "ci-pipeline".

Environment variables

Variable Default Description
PRISM_PORT 11434 Port for the proxy server
PRISM_HOST 127.0.0.1 Host to bind (use 0.0.0.0 for network access)
PRISM_ADMIN_PORT 8765 Port for the admin web UI
OLLAMA_API_KEY API key for Ollama Cloud (fallback if not in config)
OPENCODE_GO_API_KEY API key for OpenCode Go (fallback if not in config)

Providers

Prism supports multiple upstream providers, configured via the admin UI or the config file (%APPDATA%\prism\config.json on Windows, ~/Library/Application Support/prism/config.json on macOS):

Provider Config key Upstream format Endpoint
Ollama Cloud ollama_cloud Ollama Native /api/chat
OpenCode Go opencode_go OpenAI /v1/chat/completions
Custom providers custom_providers[] OpenAI /v1/chat/completions
Codex (via OAuth) oauth_accounts[] OpenAI chatgpt.com/backend-api/codex/responses

Provider-per-model routing

Each model in your remapping is assigned to a specific provider. When a request arrives, Prism resolves the model, looks up its assigned provider, and routes the request to that upstream — even if other models go to different providers. This means you can mix models from Ollama Cloud, OpenCode Go, custom providers, and OAuth accounts in a single session.

  • The default_provider field in config is used only as a fallback when a model has no explicit provider assignment.
  • Provider routing is handled by the ProviderRouter, which resolves the provider per-request based on the requested model name.
  • Models from different providers can coexist — set each model's provider when adding it to Known Models.

Custom providers

You can add multiple custom providers (e.g. OpenRouter, Groq, Together AI) — each with its own name, base URL, and API key. Add, edit, or delete them from the admin UI Provider tab. Custom providers are assigned unique IDs like custom_myprovider_abc123.

Codex OAuth accounts

Prism supports signing in with your OpenAI account via OAuth (no API key needed). Click Add Codex Account in the admin UI OAuth tab or system tray, and your browser will open for authentication. Once connected, Prism uses your account token automatically, including token refresh and usage tracking.

Codex requests route directly to chatgpt.com/backend-api/codex/responses — not through api.openai.com — which avoids Cloudflare restrictions on bearer tokens. Prism also extracts the chatgpt-account-id from your JWT token automatically, so you don't need to configure it manually.

Switch providers from the system tray, admin UI, or by changing the default_provider field — no restart required when using the tray/UI.

Full config example
{
  "default_provider": "ollama_cloud",
  "ollama_cloud": {
    "id": "ollama_cloud",
    "name": "Ollama Cloud",
    "base_url": "https://ollama.com",
    "api_key": ""
  },
  "opencode_go": {
    "id": "opencode_go",
    "name": "OpenCode Go",
    "base_url": "https://opencode.ai/zen/go",
    "api_key": ""
  },
  "custom_providers": [
    {
      "id": "custom_openrouter_abc123",
      "name": "OpenRouter",
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": ""
    }
  ],
  "oauth_accounts": [
    {
      "id": "codex_user_abc123",
      "provider": "codex",
      "label": "Codex",
      "email": "user@example.com",
      "access_token": "...",
      "refresh_token": "...",
      "expires_at": 1234567890,
      "plan_tier": "plus",
      "active": true
    }
  ],
  "agent_integrations": {
    "claude_code_tiers": {
      "opus": "deepseek-v4-pro:cloud",
      "sonnet": "deepseek-v4-flash:cloud",
      "haiku": "glm-5.1:cloud",
      "subagent": "deepseek-v4-flash:cloud"
    }
  }
}

API keys in the config file take priority. If empty, Prism falls back to these environment variables:

Variable Used for
OLLAMA_API_KEY Ollama Cloud
OPENCODE_GO_API_KEY OpenCode Go

Model remapping

Prism can remap model names on the fly — useful when clients send model names that don't exist on your upstream provider.

Configured via the admin UI (Models tab) or the model remapping file (%APPDATA%\prism\model_remapping.json on Windows, ~/Library/Application Support/prism/model_remapping.json on macOS).

Auto model configuration

Instead of manually filling in context lengths, token limits, and capabilities, just type a model name in the Models tab and click Search. Prism queries models.dev and auto-fills:

  • Context length
  • Max output tokens
  • Reasoning toggle and allowed effort levels
  • Tool calling, structured output, and vision capabilities

The search is scoped to your selected provider so you get accurate results. No manual configuration needed — search, select, and you're done.

Default model

When an unknown model is requested, Prism falls back to this model. Select it from the dropdown in the admin UI or set default_model.

Known models

Known models are rich entries — not just strings — with per-model provider assignment, reasoning toggle, capabilities, and token limits. Each entry includes:

Field Type Description
id string Model identifier (e.g. deepseek-v4-flash:cloud)
provider string Provider to route this model to (e.g. ollama_cloud, opencode_go, a custom provider ID, or an OAuth account ID)
reasoning bool Whether this model supports thinking/reasoning
reasoning_effort string[] Allowed reasoning effort levels (low, medium, high, max)
context_length int Maximum context window in tokens
max_output_tokens int Maximum output tokens
capabilities.tool_calling bool Supports tool/function calling
capabilities.structured_outputs bool Supports structured/JSON output
capabilities.vision bool Supports image input

Models matching a known entry pass through without remapping. A model that doesn't match any entry falls back to the default model.

Reasoning toggle & effort validation

Prism validates reasoning_effort against each model's capabilities:

  • Non-reasoning models: reasoning_effort is automatically stripped from requests.
  • Reasoning models: Invalid effort values are normalized to the model's first allowed effort (e.g. "invalid""medium"), with a warning logged.
  • Unknown models: reasoning_effort is stripped for safety.
  • Responses API normalization: enabled / on / truemedium; disabled / off / false / none → omitted.
  • Anthropic → OpenAI translation: Anthropic thinking is mapped to reasoning_effort=medium.

Model aliases

Map incoming model names to different upstream models.

Feature What it does
Aliases Map model names (e.g. claude-3-5-haikudeepseek-v4-flash:cloud)
Default model Fallback when a requested model isn't recognized
Known models Rich entries with per-model provider, reasoning, and capabilities
Auto config Search models.dev and auto-fill all fields
Full remapping example
{
  "default_model": "glm-5.1:cloud",
  "known_models": [
    {
      "id": "glm-5.1:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true,
        "vision": true
      }
    },
    {
      "id": "deepseek-v4-flash:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "opencode/deepseek-v4-flash",
      "provider": "opencode_go",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "deepseek-v4-pro:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high", "max"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    }
  ],
  "aliases": {
    "claude-3-5-haiku": "deepseek-v4-flash:cloud",
    "claude-3-5-haiku-20241022": "deepseek-v4-flash:cloud",
    "claude-3-haiku-20240307": "deepseek-v4-flash:cloud",
    "claude-haiku-3-5-20241022": "deepseek-v4-flash:cloud"
  }
}

API endpoints

Method Path Auth Description
POST /v1/messages x-api-key header Anthropic Messages API
POST /v1/chat/completions Authorization: Bearer <key> OpenAI Chat Completions API
POST /v1/responses Authorization: Bearer <key> OpenAI Responses API
GET /v1/models Authorization: Bearer <key> List available models
GET /health None Health check
GET /api/model-info None Look up model details from models.dev (admin UI)
GET /admin/model-info None Look up model details from models.dev (admin server only)
GET /admin/model-search None Search models on models.dev (admin server only)
POST /v1/messages/count_tokens x-api-key header Returns 404 (not supported upstream)

Translation support

Prism handles the full translation surface between all API formats:

Anthropic ↔ Ollama

Request mapping:

Anthropic Ollama Notes
messages messages Content blocks → string or array
system messages[].role=system Injected as first message
max_tokens options.num_predict
temperature / top_p / top_k options.*
tools tools Schema translation
thinking think
stop_sequences options.stop
images (base64) images Image content blocks → image array

Response mapping:

Ollama Anthropic Notes
message.content content[0].text Wrapped in content block array
message.tool_calls content[].tool_use
message.thinking content[].thinking
done_reason: stop stop_reason: end_turn
done_reason: length stop_reason: max_tokens
done_reason: tool_call stop_reason: tool_use
Anthropic ↔ OpenAI

Request mapping:

Anthropic OpenAI Notes
messages messages Content blocks → OpenAI format
system messages[].role=system
max_tokens max_tokens
tools tools Schema translation
thinking reasoning_content
images (base64) image_url (data URI) Image content blocks → OpenAI image parts

Response mapping:

OpenAI Anthropic Notes
choices[0].message.content content[0].text
choices[0].message.tool_calls content[].tool_use
choices[0].message.reasoning_content content[].thinking
finish_reason: stop stop_reason: end_turn
finish_reason: length stop_reason: max_tokens
finish_reason: tool_calls stop_reason: tool_use
OpenAI inbound → Ollama

When an OpenAI client talks to Prism with an Ollama upstream, Prism translates the full OpenAI Chat Completions request/response format to/from Ollama native format — including streaming, tool calls, reasoning content, and images.

OpenAI Ollama Notes
reasoning_effort think Any non-"off" value enables thinking
image_url (data URI) images Base64 data extracted from data URI
response_format Passed through when supported
OpenAI inbound → OpenAI (pass-through)

When both the client and upstream speak OpenAI format, Prism applies model remapping and forwards the request with minimal modification. Streaming is passed through as-is.

Responses API ↔ Ollama / OpenAI

Prism translates the OpenAI Responses API (/v1/responses) to the upstream format, whether Ollama or OpenAI:

Responses API Chat Completions / Ollama Notes
input (string) messages[].role=user Simple string input → user message
input (array of items) messages[] message, function_call, function_call_output, custom_tool_call_output items mapped
instructions messages[].role=system System prompt
tools (function type) tools type: function tools forwarded as-is
tools (built-in type) tools Codex built-in tools (apply_patch, local_shell, web_search, etc.) rewrapped as function tools with preserved type mapping
reasoning reasoning_effort / think Reasoning config → thinking mode
text.format response_format / format Structured output / JSON schema
max_output_tokens max_tokens / options.num_predict
temperature / top_p temperature / top_p

Response mapping (OpenAI upstream → Responses API):

Chat Completions Responses API Notes
message.content output[].message.content[].output_text Text content → output parts
message.reasoning_content output[].reasoning Reasoning → reasoning item
message.tool_calls output[].function_call or output[].custom_tool_call Tool calls mapped back to correct output type
finish_reason: stop status: completed
finish_reason: length status: incomplete

Built-in tool type preservation: Prism maps Codex built-in tools (apply_patch, local_shell, web_search, computer_use) to function tools for the upstream model, preserving the original type so responses translate back correctly — e.g., apply_patchcustom_tool_call with input field instead of arguments.

Streaming: Full Responses API streaming event sequence is emitted — response.created, response.output_item.added, response.output_text.delta, response.output_text.done, response.content_part.added/done, response.output_item.done, response.function_call_arguments.delta/done, response.custom_tool_call_input.done, and response.completed.

Codex OAuth direct passthrough

For Codex (OpenAI) OAuth accounts, Prism routes Responses API requests directly to chatgpt.com/backend-api/codex/responses — bypassing api.openai.com entirely. This avoids Cloudflare restrictions on bearer tokens and provides native Codex tool support.

For Chat Completions clients (like Factory Droid, OpenCode) that need to reach Codex, Prism translates Chat Completions → Responses API format before forwarding to the Codex backend, so any agent can use your OAuth account regardless of its protocol.

Prism automatically extracts the chatgpt-account-id from your JWT token and includes it as a header, so no manual configuration is needed.

Streaming

All routing paths support real-time SSE streaming with correct event translation:

Inbound Upstream Streaming
Anthropic Ollama ✅ Newline-delimited JSON → Anthropic SSE
Anthropic OpenAI ✅ OpenAI SSE → Anthropic SSE
OpenAI Chat Ollama ✅ Newline-delimited JSON → OpenAI SSE
OpenAI Chat OpenAI ✅ Pass-through with model remapping
OpenAI Chat Codex (OAuth) ✅ Chat Completions → Codex Responses SSE
OpenAI Responses Ollama ✅ Newline-delimited JSON → Responses API SSE events
OpenAI Responses OpenAI ✅ OpenAI SSE → Responses API SSE events
OpenAI Responses Codex (OAuth) ✅ Direct passthrough to Codex backend

Thinking/reasoning blocks, tool calls, and images are fully supported in all streaming paths.

Auto-start at Login

Prism can start automatically when you log in. Toggle this from the admin UI (Proxy tab → Start at Login).

Windows: Uses the Windows Registry (HKCU\Software\Microsoft\Windows\CurrentVersion\Run) to launch the Prism executable at login. No admin rights required.

macOS: Uses a LaunchAgent plist (~/Library/LaunchAgents/com.prism.plist) to launch Prism at login.

Limitations

The following features are not supported by upstream providers and are handled gracefully:

  • Anthropic: count_tokens, tool_choice, metadata, prompt caching, batches, PDF, URL images
  • OpenAI Chat inbound: /v1/models returns a static list from config (not proxied), parallel_tool_calls, logprobs, seed, user
  • OpenAI Responses inbound: previous_response_id (conversation continuity), store, built-in tools (web search, file search, code interpreter) are filtered out for Ollama upstreams

Building from source

Windows:

go-winres make; go build -ldflags="-H windowsgui" -o prism.exe .

The -H windowsgui flag hides the console window and enables system tray integration.

To run in console mode (for debugging), build without the flag:

go build -o prism.exe .
./prism.exe --serve

macOS:

CGO_ENABLED=1 go build -ldflags="-X main.version=dev" -o prism .
# Or use the build script to create a signed .app bundle and DMG:
./scripts/build-darwin.sh

Verification

Windows (PowerShell):

# 1. Start Prism
./prism.exe

# 2. Test Anthropic endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/messages" -Method POST `
  -ContentType "application/json" `
  -Headers @{"x-api-key"="prism"} `
  -Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 3. Test OpenAI Chat Completions endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/chat/completions" -Method POST `
  -ContentType "application/json" `
  -Headers @{"Authorization"="Bearer prism"} `
  -Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 4. Test OpenAI Responses API endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/responses" -Method POST `
  -ContentType "application/json" `
  -Headers @{"Authorization"="Bearer prism"} `
  -Body '{"model":"glm-5.1:cloud","input":"hi"}'

# 5. Test model listing
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/models" -Headers @{"Authorization"="Bearer prism"}

# 6. Test admin UI
Invoke-RestMethod -Uri "http://127.0.0.1:8765/admin/status"

macOS / Linux (bash):

# 1. Start Prism
open Prism.app

# 2. Test Anthropic endpoint
curl -s http://127.0.0.1:11434/v1/messages \
  -H "x-api-key: prism" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 3. Test OpenAI Chat Completions endpoint
curl -s http://127.0.0.1:11434/v1/chat/completions \
  -H "Authorization: Bearer prism" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 4. Test OpenAI Responses API endpoint
curl -s http://127.0.0.1:11434/v1/responses \
  -H "Authorization: Bearer prism" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1:cloud","input":"hi"}'

# 5. Test model listing
curl -s http://127.0.0.1:11434/v1/models -H "Authorization: Bearer prism"

# 6. Test admin UI
curl -s http://127.0.0.1:8765/admin/status

Acknowledgments

The Codex Desktop & CLI integration (native model selector support) was inspired by and reverse-engineered from codex-shim by Sybil Solutions. Their work on the Responses API translation layer, custom_model_catalog.json format, config.toml managed blocks, and the ASAR patch for the model picker provided the blueprint for Prism's Codex support. If you need a standalone Python-based shim with additional features (Cursor passthrough, ChatGPT passthrough, auto-router, web picker UI), check out their project.


Prism — connect any agent. route any model. stream any protocol.

About

A 5 MB LLM proxy for Windows and MacOS. Translates Anthropic, OpenAI, and Ollama APIs in real time. Native system tray, built-in admin UI, model remapping, and SSE streaming. Zero dependencies

Topics

Resources

Stars

Watchers

Forks

Contributors

Languages