        .   *   .
     .   _/|_   .        KNARR
      . /|    |\ .       Universal LLM Hub
   ~~~~~|______|~~~~~
  ~~ ~~~~~~~~~~~~~ ~~    Cargo transport for any LLM protocol
  ~~~~~~~~~~~~~~~~~~~~
A universal hub that exposes any backend — a Langertha::Raider, a raw
Langertha::Engine, a remote A2A or ACP agent, or your own custom logic —
over the standard LLM HTTP wire protocols spoken by OpenWebUI, the OpenAI /
Anthropic / Ollama clients, and the agent ecosystems around A2A, ACP, and
AG-UI. One server, six protocols, any backend.
- Tool calls reach the engine. Configured (non-passthrough) routes now forward
  tools, tool_choice, response_format, temperature, and max_tokens to the
  Langertha engine. Previously these were silently dropped. Responses
  containing tool_calls are now serialised back to the client in each
  protocol's native format (OpenAI message.tool_calls, Anthropic tool_use
  content blocks, Ollama message.tool_calls).
- Real token counts. When the engine returns a Langertha::Usage object (all
  native Langertha 0.500 engines do), the usage fields in the response carry
  actual numbers instead of zeros. Langfuse generations also get real counts.
- Capability-aware parameter forwarding. Parameters are only sent to engines
  that support them ($engine->supports($cap)) so requests never fail because
  an optional parameter reached an engine that rejects it. See the sketch
  after this list.
- Tracing flush is non-blocking. The previous LWP::UserAgent call in
  end_trace blocked the event loop on every request. The flush now fires via
  Net::Async::HTTP and returns immediately.
- Langertha::Knarr::Response value object. Single typed shape every handler
  returns and every protocol formatter consumes — replaces the plain
  { content, model } hashref that handlers used to emit. Handlers that return
  a Langertha::Response, a hashref, or a bare string all get coerced
  automatically.
- Knarr::Request carries tool_choice and response_format as first-class
  attributes (extracted by the OpenAI / Anthropic / Ollama parsers) and
  exposes chat_f_args($engine) for building the named-arg list suitable for
  Langertha's chat_f entry point.
- Langertha minimum bumped to 0.500 for Langertha::ToolCall value objects
  (methods instead of hash keys), Langertha::Usage, and the capability
  registry.
- Breaking in Knarr::PSGI: constructor argument renamed from steerboard to
  knarr.
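As a rough illustration of the capability gate: the supports() check and the
chat_f named-arg convention come from the notes above, while the request
accessors (messages, temperature, max_tokens) and the Future-returning chat_f
are assumptions for the sketch, not Knarr's internal code.

# Illustrative only: forward optional parameters solely when the engine
# advertises the matching capability.
my %optional = (
    temperature     => $request->temperature,      # accessor names assumed
    max_tokens      => $request->max_tokens,
    tool_choice     => $request->tool_choice,
    response_format => $request->response_format,
);
my %args = ( messages => $request->messages );
for my $cap (sort keys %optional) {
    next unless defined $optional{$cap};
    $args{$cap} = $optional{$cap} if $engine->supports($cap);
}
my $response = $engine->chat_f(%args)->get;   # assuming chat_f returns a Future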
Knarr 1.000 is a major architectural rewrite. Mojolicious is gone; the new
core is built on IO::Async + Net::Async::HTTP::Server for native async
streaming and seamless integration with Langertha's Future::AsyncAwait
engines.
| Layer | Modules |
|---|---|
| Protocols | Knarr::Protocol::OpenAI / Anthropic / Ollama / A2A / ACP / AGUI |
| Handlers | Knarr::Handler::Router (model→engine via Knarr::Router) / Engine / Raider / Code / A2AClient / ACPClient |
| Core | Langertha::Knarr — single async event loop, chunked streaming for SSE / NDJSON |
The classic Knarr use case — point a client at Knarr, get tracing — still
works via Knarr::Handler::Router, which uses your existing knarr.yaml
config to resolve models to Langertha engines.
Breaking changes from pre-1.000:
- Mojolicious and Test::Mojo are no longer dependencies.
- knarr container is now a deprecated alias for knarr start --from-env.
An LLM proxy that routes requests from any client to any backend — with automatic Langfuse tracing for every call.
Set your API key, start the container, done. All requests are traced.
docker run -e ANTHROPIC_API_KEY -p 8080:8080 raudssus/langertha-knarr

Now point Claude Code at it:

ANTHROPIC_BASE_URL=http://localhost:8080 claude

That's it. Claude Code sends requests to Knarr, Knarr forwards them to Anthropic using your API key (passthrough mode). Add Langfuse keys and every request gets traced automatically.
Knarr starts in mixed mode by default: requests with a model name
that's explicitly configured in knarr.yaml go through a Langertha
engine (with full tracing, request logging, and value-object metrics);
unknown model names tunnel straight through to the upstream API the
client thinks it's talking to, using the client's own API key. No key
duplication, no configuration required for the simple cases.
Claude Code                                       Anthropic API
     │                                                  ▲
     │ ANTHROPIC_BASE_URL=http://localhost:8080         │
     ▼                                                  │
   Knarr ──── Handler::Router ─┐                        │
     │              │          └── Handler::Passthrough ──►
     │              └── matches gpt-4o → Langertha::Engine::OpenAI
     │
     └── Tracing decorator → Langfuse
     └── RequestLog decorator → JSONL
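In handler terms, the mixed-mode decision looks roughly like this; resolve()
appears in the Router API section below, while chat_via_engine and
passthrough->forward are illustrative names, not the shipped API.

# Sketch of the mixed-mode split: configured models go through a
# Langertha engine, everything else tunnels to the upstream as-is.
sub handle {
    my ($self, $request) = @_;
    if (my ($engine, $model) = $self->router->resolve($request->model)) {
        return $self->chat_via_engine($engine, $model, $request);   # traced
    }
    return $self->passthrough->forward($request);   # raw bytes, client's own key
}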
For explicit routing (send "gpt-4o" requests to OpenAI, "cheap" to
Groq), configure models in a YAML file or let knarr init scan your
environment variables and generate one.
# OpenAI Python SDK
OPENAI_BASE_URL=http://localhost:8080/v1 python my_app.py
# curl
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
# Ollama clients (Open WebUI, etc.) — point at port 11434 in container mode
OLLAMA_HOST=http://localhost:11434 open-webui
# A2A discovery
curl http://localhost:8080/.well-known/agent.json

In container mode (the default for the Docker image), Knarr binds two listening sockets simultaneously, both serving every protocol:
- Port 8080 — primary, OpenAI / Anthropic / A2A / ACP / AG-UI clients
- Port 11434 — alias for Ollama clients that hardcode that port
Both ports run the same handler chain — the second port is a convenience alias so existing Ollama clients work without reconfiguration.
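The same dual bind can be expressed through the embedding API shown later in this README; the listen attribute takes an arrayref of "host:port" strings.

# Both sockets serve the full protocol set; 11434 exists purely so
# Ollama clients that hardcode their default port keep working.
my $knarr = Langertha::Knarr->new(
    handler => $handler,
    loop    => $loop,
    listen  => [ '0.0.0.0:8080', '0.0.0.0:11434' ],
);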
Use WSL2 — all commands work as-is inside a WSL terminal:
wsl
docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

Or with Docker Desktop from PowerShell:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

The --env-file .env approach works identically on Linux, macOS, and
Windows. Create your .env file once, run the same command everywhere.
Create a .env file with your API keys (see .env.example):
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

Then run with --env-file:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

Knarr reads the file, detects which providers have keys, configures them with sensible default models, and starts serving.
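The detection itself boils down to checking which provider keys are present. A simplified sketch; the map here is illustrative, and the real defaults are listed in the table below.

# Sketch only: pick engines whose API-key variable is set.
my %defaults = (
    OPENAI_API_KEY    => { engine => 'OpenAI',    model => 'gpt-4o-mini' },
    ANTHROPIC_API_KEY => { engine => 'Anthropic', model => 'claude-sonnet-4-6' },
    GROQ_API_KEY      => { engine => 'Groq',      model => 'llama-3.3-70b-versatile' },
);
my @engines = map  { $defaults{$_} }
              grep { defined $ENV{$_} && length $ENV{$_} }
              sort keys %defaults;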
docker build -t raudssus/langertha-knarr .

Dependencies are installed via cpm from the cpanfile using MetaCPAN.
The included docker-compose.yml starts Knarr with Langfuse tracing
out of the box:
cp .env.example .env
# Edit .env — add your API keys and Langfuse keys
docker compose up

This starts:
| Service | Port | Description |
|---|---|---|
| Knarr | 8080, 11434 | LLM Proxy |
| Langfuse | 3000 | Tracing Dashboard |
| PostgreSQL | — | Langfuse storage |
The docker-compose.yml automatically loads .env and connects Knarr to
the Langfuse instance. Open http://localhost:3000 for the dashboard — every
LLM call through Knarr is traced with model, input, output, latency, and
token usage.
If you don't need tracing:
services:
  knarr:
    image: raudssus/langertha-knarr
    ports:
      - "8080:8080"
      - "11434:11434"
    env_file: .env

Set multiple API keys — Knarr configures all of them automatically:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

[knarr] Knarr LLM Proxy starting...
[knarr]
[knarr] Config: auto-detecting from environment variables
[knarr] Engines: 3 provider(s) configured
[knarr]
[knarr] anthropic => Anthropic / claude-sonnet-4-6 (key from $ANTHROPIC_API_KEY)
[knarr] groq => Groq / llama-3.3-70b-versatile (key from $GROQ_API_KEY)
[knarr] openai => OpenAI / gpt-4o-mini (key from $OPENAI_API_KEY)
[knarr]
[knarr] Auto-discover: enabled (will query provider model lists)
[knarr] Default engine: OpenAI
[knarr] Langfuse: disabled (set LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY to enable)
[knarr] Proxy auth: open (set KNARR_API_KEY to require authentication)
Each provider gets a default model:
| Provider | Default Model | ENV Variable |
|---|---|---|
| OpenAI | gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-6 | ANTHROPIC_API_KEY |
| Groq | llama-3.3-70b-versatile | GROQ_API_KEY |
| Mistral | mistral-large-latest | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat | DEEPSEEK_API_KEY |
| MiniMax | MiniMax-M2.1 | MINIMAX_API_KEY |
| Gemini | gemini-2.0-flash | GEMINI_API_KEY |
| OpenRouter | openai/gpt-4o-mini | OPENROUTER_API_KEY |
| Perplexity | sonar | PERPLEXITY_API_KEY |
| Cerebras | llama-3.3-70b | CEREBRAS_API_KEY |
With auto-discover enabled (default), Knarr queries each provider's model list — so you can use any model they offer, not just the defaults.
Knarr traces every request automatically when Langfuse credentials are set.
Add these to your .env:
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

That's it. Every proxy request creates:
- Trace with model name, engine type, API format, and full input/output
- Generation with start/end time, token usage, and model information
- Error tracking when backend calls fail
- Tag knarr on all traces
Just set the keys — Langfuse Cloud (https://cloud.langfuse.com) is the
default:
# .env
OPENAI_API_KEY=sk-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

Use docker compose up for a local Langfuse stack, or point at your own:
# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_URL=http://my-langfuse-server:3000

Protect your proxy with an API key:
# .env
KNARR_API_KEY=my-secret-proxy-key

Clients must send Authorization: Bearer my-secret-proxy-key or
x-api-key: my-secret-proxy-key. The A2A discovery endpoint
(/.well-known/agent.json) stays anonymous so agent clients can
introspect.
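Conceptually the check is a few lines. This sketch assumes hypothetical request accessors (path, header) and an api_key attribute; it is not Knarr's actual middleware.

# Accept either header form; leave the A2A discovery endpoint open.
sub authorized {
    my ($self, $req) = @_;
    return 1 unless defined $self->api_key;                # open mode
    return 1 if $req->path eq '/.well-known/agent.json';   # stays anonymous
    my $bearer = $req->header('Authorization') // '';
    my $xkey   = $req->header('x-api-key')     // '';
    return $bearer eq 'Bearer ' . $self->api_key
        || $xkey   eq $self->api_key;
}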
Knarr 1.000 speaks six wire protocols on every listening port. The
protocol is selected by URL path, so a single Knarr listening on
http://localhost:8080 answers all of them simultaneously:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello"}],"max_tokens":1024}'curl http://localhost:8080/api/chat \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
curl http://localhost:8080/api/tags

In container mode Knarr binds an extra :11434 socket as well, so
existing Ollama clients work without reconfiguration.
Knarr exposes the agent card at /.well-known/agent.json and accepts
A2A JSON-RPC at POST / with methods tasks/send (sync) and
tasks/sendSubscribe (streaming).
ACP clients use POST /runs with mode: "sync" or mode: "stream"; agent listing
is at GET /agents.
AG-UI clients use POST /awp, which returns the AG-UI typed event stream.
All six formats support streaming — SSE for OpenAI / Anthropic / A2A / ACP / AG-UI, NDJSON for Ollama.
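Internally this amounts to a path-to-protocol dispatch. A sketch of the mapping implied by the endpoints above; the hash itself is illustrative, not Knarr's routing table.

# One socket, six protocols, selected purely by request path.
my %protocol_for = (
    '/v1/chat/completions'    => 'OpenAI',
    '/v1/models'              => 'OpenAI',
    '/v1/messages'            => 'Anthropic',
    '/api/chat'               => 'Ollama',
    '/api/tags'               => 'Ollama',
    '/'                       => 'A2A',    # JSON-RPC: tasks/send, tasks/sendSubscribe
    '/.well-known/agent.json' => 'A2A',    # agent card discovery
    '/runs'                   => 'ACP',
    '/agents'                 => 'ACP',
    '/awp'                    => 'AGUI',
);
my $protocol = $protocol_for{ $request->path } or die "404\n";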
For configured (non-passthrough) models, Knarr forwards tools and
tool_choice to the Langertha engine via chat_f. Langertha normalises them
to the engine's native wire format — so an OpenAI-format tools array reaches
an Anthropic engine as tools + Anthropic tool_choice, and vice versa.
Tool-call responses (Langertha::ToolCall objects) come back and are
serialised to the client's protocol format:
| Client protocol | Tool call format in response |
|---|---|
| OpenAI | message.tool_calls[], finish_reason: "tool_calls" |
| Anthropic | content[] with type: "tool_use" blocks, stop_reason: "tool_use" |
| Ollama | message.tool_calls[] |
For passthrough models (unknown model names), the raw request bytes are forwarded 1:1 to the upstream API, so whatever tool-call format the client sent arrives at the provider unchanged.
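From the handler's side this flows through chat_f. A sketch using the chat_f_args helper described in the release notes; the tool_calls accessor on the response is an assumption for illustration.

# Hand the parsed request to the engine, then let the protocol layer
# serialise any tool calls back into the client's format.
my $response = $engine->chat_f( $request->chat_f_args($engine) )->get;
if ( my @calls = @{ $response->tool_calls // [] } ) {
    # Each is a Langertha::ToolCall value object (methods, not hash keys);
    # the formatter turns these into message.tool_calls / tool_use blocks.
    warn sprintf "engine requested %d tool call(s)\n", scalar @calls;
}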
docker run --env-file .env -p 8080:8080 raudssus/langertha-knarr
# In another terminal:
ANTHROPIC_BASE_URL=http://localhost:8080 claude

Every Claude Code request gets traced in Langfuse.
Use cloud LLMs from any Ollama-compatible client like Open WebUI:
docker run --env-file .env -p 11434:11434 raudssus/langertha-knarr
# Open WebUI connects to port 11434, thinks it's Ollama,
# but requests go to cloud providers through Knarr

Mount a config file for custom routing:
# knarr.yaml
models:
  llama3.2:
    engine: OllamaOpenAI
    url: http://host.docker.internal:11434/v1
    model: llama3.2
  gpt-4o:
    engine: OpenAI
  default:
    engine: OllamaOpenAI
    url: http://host.docker.internal:11434/v1

docker run --env-file .env \
  -v ./knarr.yaml:/etc/knarr/config.yaml \
  -p 8080:8080 -p 11434:11434 \
  raudssus/langertha-knarr start -c /etc/knarr/config.yaml

For more control than auto-detection, create a knarr.yaml:
listen:
  - "127.0.0.1:8080"
  - "127.0.0.1:11434"

models:
  gpt-4o:
    engine: OpenAI
  gpt-4o-mini:
    engine: OpenAI
    model: gpt-4o-mini
  claude-sonnet:
    engine: Anthropic
    model: claude-sonnet-4-6
    api_key: ${ANTHROPIC_API_KEY}
  local-llama:
    engine: OllamaOpenAI
    url: http://localhost:11434/v1
    model: llama3.2
  deepseek:
    engine: DeepSeek
    model: deepseek-chat
  default:
    engine: OpenAI

auto_discover: true

# Passthrough: requests go directly to upstream APIs
# The client's own API key is used — no duplication needed
# Models with explicit config above are routed via Langertha,
# everything else passes through transparently
passthrough:
  anthropic: https://api.anthropic.com
  openai: https://api.openai.com
  # Or point at a custom upstream:
  # anthropic: https://my-anthropic-cache.internal

# proxy_api_key: your-secret

# langfuse:
#   url: http://localhost:3000
#   public_key: pk-lf-...
#   secret_key: sk-lf-...

Config values support ${ENV_VAR} interpolation — variables are resolved at startup.
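The interpolation is a plain textual substitution at load time; roughly, with an illustrative regex:

# Replace ${VAR} with the value of $ENV{VAR} (empty string if unset).
$value =~ s{\$\{(\w+)\}}{ $ENV{$1} // '' }ge;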
models.<name>.engine resolves in this order:

1. Langertha::Engine::<EngineName>
2. LangerthaX::Engine::<EngineName>
3. Fully-qualified class name if you set one directly
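A sketch of that lookup order as a hypothetical helper, not the shipped router code:

# Try each namespace in order; the bare name covers fully-qualified classes.
sub resolve_engine_class {
    my ($name) = @_;
    for my $class ( "Langertha::Engine::$name", "LangerthaX::Engine::$name", $name ) {
        return $class if eval "require $class; 1";
    }
    die "no engine class found for '$name'";
}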
Passthrough is the default behavior: requests for unconfigured models go directly to the upstream API using the client's own API key and headers. All HTTP bytes — including SSE chunks, tool_use blocks, usage data, and cache_control — are piped 1:1 to the client. No key duplication, no model configuration needed. Knarr just sits in the middle and traces.
If you also configure explicit model routing (the models: section), those
specific models are handled by Langertha engines. Everything else still
passes through as raw bytes.
Enabled by default with --from-env. In a config file:
# Enable with default upstream URLs
passthrough: true

# Or per format with custom upstreams
passthrough:
  anthropic: https://api.anthropic.com
  openai: https://my-openai-mirror.internal

Claude Code example — no Knarr API key needed, your existing key works:
docker run -p 8080:8080 raudssus/langertha-knarr
ANTHROPIC_BASE_URL=http://localhost:8080 claude

Knarr can generate a config from your environment:
# Via Docker — pass your env vars through
docker run --rm --env-file .env raudssus/langertha-knarr init > knarr.yaml
# Or pass all API keys from your current shell
docker run --rm \
  $(env | grep -E '_(API_KEY|API_TOKEN)=|^LANGFUSE_' | sed 's/^/-e /') \
  raudssus/langertha-knarr init > knarr.yaml

Then mount it:
docker run --env-file .env \
  -v ./knarr.yaml:/etc/knarr/config.yaml \
  -p 8080:8080 -p 11434:11434 \
  raudssus/langertha-knarr start -c /etc/knarr/config.yaml

| Variable | Provider |
|---|---|
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GROQ_API_KEY | Groq |
| MISTRAL_API_KEY | Mistral |
| DEEPSEEK_API_KEY | DeepSeek |
| MINIMAX_API_KEY | MiniMax |
| GEMINI_API_KEY | Gemini |
| OPENROUTER_API_KEY | OpenRouter |
| PERPLEXITY_API_KEY | Perplexity |
| CEREBRAS_API_KEY | Cerebras |
| REPLICATE_API_TOKEN | Replicate |
| HUGGINGFACE_API_KEY | HuggingFace |
LANGERTHA_-prefixed variants (e.g., LANGERTHA_OPENAI_API_KEY) take
priority over bare names.
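In other words, the lookup is (illustrative):

# LANGERTHA_-prefixed variables win over the bare provider names.
my $openai_key = $ENV{LANGERTHA_OPENAI_API_KEY} // $ENV{OPENAI_API_KEY};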
| Variable | Description | Default |
|---|---|---|
| LANGFUSE_PUBLIC_KEY | Public key (pk-lf-...) | — |
| LANGFUSE_SECRET_KEY | Secret key (sk-lf-...) | — |
| LANGFUSE_URL | Server URL | https://cloud.langfuse.com |
| Variable | Description | Default |
|---|---|---|
| KNARR_API_KEY | Require client authentication | — (open) |
| KNARR_DEBUG | Enable verbose logging (1 = on) | — (off) |
knarr Show help
knarr start Start with config file (./knarr.yaml)
knarr start --from-env Auto-detect config from ENV (Docker default)
knarr start --from-env -p 8080 -p 11434 ENV config, explicit ports
knarr start -p 9090 Custom port
knarr start -c prod.yaml Custom config
knarr start -v Verbose logging
knarr init Generate config from environment
knarr init -e .env Include .env file in scan
knarr models List configured models
knarr models --format json
knarr check Validate config file
The -p / --port flag is repeatable — each occurrence adds a listen port.
Default host is 0.0.0.0. Set KNARR_DEBUG=1 or use -v for verbose logging.
Knarr is also a standard CPAN distribution:
cpanm Langertha::Knarr

Then use the knarr CLI directly:
export OPENAI_API_KEY=sk-...
knarr init > knarr.yaml
knarr start

Knarr 1.000 is built around a handler and one or more wire protocols.
You construct a handler (typically Handler::Router driven by your
existing knarr.yaml), optionally wrap it in tracing/logging decorators,
and pass it to a Langertha::Knarr instance:
use IO::Async::Loop;
use Langertha::Knarr;
use Langertha::Knarr::Config;
use Langertha::Knarr::Router;
use Langertha::Knarr::Handler::Router;
my $loop = IO::Async::Loop->new;
my $config = Langertha::Knarr::Config->new(file => 'knarr.yaml');
my $router = Langertha::Knarr::Router->new(config => $config);
my $handler = Langertha::Knarr::Handler::Router->new(router => $router);
my $knarr = Langertha::Knarr->new(
  handler => $handler,
  loop    => $loop,
  listen  => $config->listen,   # arrayref of "host:port" strings
);

$knarr->run;   # blocks

Both Tracing and RequestLog are decorator handlers — they wrap any
inner handler and forward chat/stream calls through, recording before
and after:
use Langertha::Knarr::Tracing;
use Langertha::Knarr::Handler::Tracing;
use Langertha::Knarr::Handler::RequestLog;
my $tracing = Langertha::Knarr::Tracing->new(config => $config);
$handler = Langertha::Knarr::Handler::Tracing->new(
  wrapped => $handler,
  tracing => $tracing,
) if $tracing->_enabled;

my $rlog = Langertha::Knarr::RequestLog->new(config => $config);
$handler = Langertha::Knarr::Handler::RequestLog->new(
  wrapped => $handler,
  request_log => $rlog,
) if $rlog->_enabled;

knarr start applies both wrappers automatically when their respective
config sections are present.
To preserve the "configured models go through Langertha, everything else
tunnels straight to the upstream API" behaviour, give the router a
Handler::Passthrough fallback:
use Langertha::Knarr::Handler::Passthrough;
my $passthrough = Langertha::Knarr::Handler::Passthrough->new(
  upstreams => $config->passthrough,   # { openai => 'https://api.openai.com', ... }
  loop      => $loop,
);

my $handler = Langertha::Knarr::Handler::Router->new(
  router      => $router,
  passthrough => $passthrough,
);

use Langertha::Knarr::Config;
use Langertha::Knarr::Router;
my $config = Langertha::Knarr::Config->new(file => 'knarr.yaml');
my $router = Langertha::Knarr::Router->new(config => $config);
# Resolve a model name to a Langertha engine
my ($engine, $model) = $router->resolve('gpt-4o-mini');
# $engine is a Langertha::Engine::OpenAI (or whatever the config maps to)
# $model is the resolved model name
my $response = $engine->simple_chat(
  { role => 'user', content => 'Hello!' },
);

- Langertha — Perl LLM framework with 22+ engine backends
- IO::Async + Net::Async::HTTP::Server — Async event loop and HTTP server
- Future::AsyncAwait — Native async/await for Perl
- Moose — Postmodern object system
- Langfuse — Open source LLM observability
This software is copyright (c) 2026 by Torsten Raudssus.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.