A high-performance, stateless OpenAI-compatible API gateway proxy for Google's official gemini-cli.
This proxy translates incoming /v1/chat/completions (streaming/non-streaming) and /v1/models REST payloads into isolated, headless gemini-cli subprocess executions and streams Server-Sent Events (SSE) directly back to any standard client (e.g., Open WebUI, VS Code Continue, Emacs gptel).
When translating an architecture from a JavaScript/Node.js ecosystem into a systems-programming language like Rust, we made fundamental structural shifts under the hood to maximize performance, safety, and simplicity.
| Metric / Aspect | Traditional Node.js Proxy | Our Rust Proxy Design |
|---|---|---|
| Integration Model | Downloads gemini-cli source as a Git submodule and imports internal code directly. Runs server and Gemini logic in the same JS memory bubble. |
Acts as an independent "process manager." Spawns your already-installed gemini-cli as an isolated background process, communicating via stdin/stdout pipes. |
| Footprint & Deployment | Requires Node.js, npm install (hundreds of MB of node_modules), and TS compilation. Idles at 40MB - 100MB RAM. |
Single, hyper-lean standalone binary. No node_modules required to run the server. Idles at 3MB - 5MB RAM with zero GC latency. |
| Protocol Translation | Bypasses translation by importing CLI code directly, calling Google's SDK directly inside JS. | Speaks OpenAI REST on the front end (to Open WebUI/Emacs) and cleanly maps those payloads into headless gemini prompt executions using stream-json line-by-line streams on the back end. |
In short, while a Node.js proxy is a heavily integrated script, this Rust version is a lightweight, universal, secure system daemon that wraps the CLI from the outside!
- Stateless & Parallel: Spawns a lightweight, headless subprocess per-request. Allows 100% concurrent execution with zero thread-locking or session contamination.
- Secure Sandbox Execution: Runs all subprocesses inside the system temporary directory (
temp_dir) to prevent the LLM from attempting local filesystem or codebase actions. - Workspace Trust Bypass: Automatically overrides headless directory trust checks (
--skip-trust). - Secure Network Binding: Defaults strictly to
127.0.0.1(localhost), with optional binding override via environment variables. - Real-time Performance Profiling: Logs process spawn duration, Time-to-First-Token (TTFT), and average generation speed (
tokens/second) directly to stdout for every query.
gemini(Official Google Gemini CLI binary installed in your system PATH).- Active GCP Authentication:
gcloud auth login gcloud auth application-default login
The proxy exposes the active Vertex AI endpoint matrix:
| Model ID | Target Endpoint | Description |
|---|---|---|
gemini-cli |
gemini-3-flash |
Default fast, general-purpose model |
gemini-3-flash |
gemini-3-flash |
Standard high-speed generation model |
gemini-3.1-flash-lite |
gemini-3.1-flash-lite |
Ultra-fast lightweight model |
gemini-1.5-pro |
gemini-1.5-pro |
Stable legacy high-reasoning Pro model |
gemini-2.5-pro |
gemini-2.5-pro |
Advanced high-reasoning Pro model |
gemini-3.1-pro-preview |
gemini-3.1-pro-preview |
Live cutting-edge high-reasoning Pro model |
cargo run --releaseServer boots instantly on http://127.0.0.1:8765.
If you need to connect from containerized applications (like Dockerized Open WebUI) without exposing the port to your physical LAN, bind the proxy to your private virtual Docker bridge gateway IP:
BIND_ADDRESS=172.17.0.1 PORT=8080 cargo run --releaseStart your Open WebUI container with OPENAI_API_BASE_URL pointing to your host gateway:
docker run -d -p 3000:8080 \
-e OPENAI_API_BASE_URL="http://host.docker.internal:8765/v1" \
-e OPENAI_API_KEY="local" \
--name open-webui \
ghcr.io/open-webui/open-webui:main- Open Open WebUI (
http://localhost:3000). - Go to Admin Settings -> Connections -> OpenAI API.
- Click the circular refresh arrow icon next to the URL/Key fields.
- Open WebUI will fetch the supported model list. Select your target Gemini model from the dropdown menu and start chatting.
If you use Emacs, you can configure gptel to use this local proxy as a custom OpenAI provider. Add the following Elisp configuration to your init file:
(use-package gptel
:config
(gptel-make-openai "GeminiProxy"
:host "127.0.0.1:8765"
:protocol "http"
:key "sk-dummy"
:stream t
:models '(gemini-cli
gemini-3-flash
gemini-3.1-flash-lite
gemini-1.5-pro
gemini-2.5-pro
gemini-3.1-pro-preview)))On macOS, the native and most robust way to run this proxy persistently in the background is through a user-level LaunchAgent using launchd. This ensures the proxy starts automatically whenever you log in.
Build and install the binary directly using cargo install (this compiles in release mode and puts the binary inside ~/.cargo/bin/):
cargo install --path . --forceCreate a configuration file at ~/Library/LaunchAgents/com.user.gemini-proxy.plist:
touch ~/Library/LaunchAgents/com.user.gemini-proxy.plistOpen this file and paste the following XML (replace YOUR_USERNAME with your actual macOS username, and verify your GOOGLE_CLOUD_PROJECT name):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.user.gemini-proxy</string>
<key>ProgramArguments</key>
<array>
<string>/Users/YOUR_USERNAME/.cargo/bin/gemini_proxy</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/homebrew/bin</string>
<key>GOOGLE_CLOUD_PROJECT</key>
<string>YOUR_GCP_PROJECT_ID</string>
<key>BIND_ADDRESS</key>
<string>127.0.0.1</string>
<key>PORT</key>
<string>8765</string>
</dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/gemini-proxy.out.log</string>
<key>StandardErrorPath</key>
<string>/tmp/gemini-proxy.err.log</string>
</dict>
</plist>-
Load and Start the daemon (enabling auto-start on login):
launchctl load ~/Library/LaunchAgents/com.user.gemini-proxy.plist -
Stop the daemon from running:
launchctl unload ~/Library/LaunchAgents/com.user.gemini-proxy.plist -
Check logs to verify performance and inspect requests:
tail -f /tmp/gemini-proxy.out.log
To ensure these logs never fill up your filesystem, you can register them with macOS's native newsyslog utility for automatic, size-based log rotation and compression:
-
Create a newsyslog configuration file (requires
sudo):sudo touch /etc/newsyslog.d/gemini-proxy.conf
-
Open the file and paste this rule (e.g., via
sudo nano /etc/newsyslog.d/gemini-proxy.conf):# logfilename mode count size when flags /tmp/gemini-proxy.*.log 644 3 5000 * J
This rule tells macOS to automatically rotate the logs the moment they exceed 5 MB (5000 KB), compress them with high-efficiency bzip2 compression (turning a 5MB text log into ~200KB), and keep only the last 3 historical backups before automatically purging older ones.
This project was inspired by the original Node.js implementation: Intelligent-Internet/gemini-cli-mcp-openai-bridge. Our Rust rewrite focuses on minimizing resource consumption, adding security-isolated sandboxing, providing native parallelization, and enabling high-resolution terminal performance profiling.