Knarr — Universal Langertha LLM Hub

         .  *  .
        . _/|_ .          KNARR
     .  /|    |\ .        Universal LLM Hub
   ~~~~~|______|~~~~~
   ~~ ~~~~~~~~~~~~~ ~~    Cargo transport for any LLM protocol
   ~~~~~~~~~~~~~~~~~~~~

A universal hub that exposes any backend — a Langertha::Raider, a raw Langertha::Engine, a remote A2A or ACP agent, or your own custom logic — over the standard LLM HTTP wire protocols spoken by Open WebUI, the OpenAI / Anthropic / Ollama clients, and the agent ecosystems around A2A, ACP, and AG-UI. One server, six protocols, any backend.

What's new in 1.100

  • Tool calls reach the engine. Configured (non-passthrough) routes now forward tools, tool_choice, response_format, temperature, and max_tokens to the Langertha engine. Previously these were silently dropped. Responses containing tool_calls are now serialised back to the client in each protocol's native format (OpenAI message.tool_calls, Anthropic tool_use content blocks, Ollama message.tool_calls).

  • Real token counts. When the engine returns a Langertha::Usage object (all native Langertha 0.500 engines do), the usage fields in the response carry actual numbers instead of zeros. Langfuse generations also get real counts.

  • Capability-aware parameter forwarding. Parameters are only sent to engines that support them ($engine->supports($cap)) so requests never fail because an optional parameter reached an engine that rejects it.

  • Tracing flush is non-blocking. The previous LWP::UserAgent call in end_trace blocked the event loop on every request. The flush now fires via Net::Async::HTTP and returns immediately.

  • Langertha::Knarr::Response value object. Single typed shape every handler returns and every protocol formatter consumes — replaces the plain { content, model } hashref that handlers used to emit. Handlers that return a Langertha::Response, a hashref, or a bare string all get coerced automatically.

  • Knarr::Request carries tool_choice and response_format as first-class attributes (extracted by the OpenAI / Anthropic / Ollama parsers) and exposes chat_f_args($engine) for building the named-arg list suitable for Langertha's chat_f entry point (see the sketch after this list).

  • Langertha minimum bumped to 0.500 for Langertha::ToolCall value objects (methods instead of hash keys), Langertha::Usage, and the capability registry.

  • Breaking in Knarr::PSGI: constructor argument renamed from steerboard to knarr.
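
To make the new request flow concrete, here is a rough sketch of what a configured route now does with these pieces. The helper name forward_to_engine is made up, and the exact chat_f signature and the list returned by chat_f_args are assumptions based on the descriptions above — treat it as an illustration, not the actual handler code:

use Future::AsyncAwait;

# Sketch only: turn a parsed Knarr::Request into an engine call.
# chat_f_args($engine) consults $engine->supports(...) and only includes
# tools, tool_choice, response_format, temperature and max_tokens when
# the target engine can handle them.
async sub forward_to_engine {
  my ($request, $engine) = @_;
  my %args   = $request->chat_f_args($engine);   # named args for chat_f
  my $result = await $engine->chat_f(%args);
  # Whatever comes back (Langertha::Response, hashref, or bare string)
  # is coerced into a Langertha::Knarr::Response before the protocol
  # formatter serialises it for the client.
  return $result;
}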

What's new in 1.000

Knarr 1.000 is a major architectural rewrite. Mojolicious is gone; the new core is built on IO::Async + Net::Async::HTTP::Server for native async streaming and seamless integration with Langertha's Future::AsyncAwait engines.

Layer       Modules
Protocols   Knarr::Protocol::OpenAI / Anthropic / Ollama / A2A / ACP / AGUI
Handlers    Knarr::Handler::Router (model→engine via Knarr::Router) / Engine / Raider / Code / A2AClient / ACPClient
Core        Langertha::Knarr — single async event loop, chunked streaming for SSE / NDJSON

The classic Knarr use case — point a client at Knarr, get tracing — still works via Knarr::Handler::Router, which uses your existing knarr.yaml config to resolve models to Langertha engines.

Breaking changes from pre-1.000:

  • Mojolicious and Test::Mojo are no longer dependencies.
  • knarr container is now a deprecated alias for knarr start --from-env.

An LLM proxy that routes requests from any client to any backend — with automatic Langfuse tracing for every call.

Set your API key, start the container, done. All requests are traced.

Getting Started

docker run -e ANTHROPIC_API_KEY -p 8080:8080 raudssus/langertha-knarr

Now point Claude Code at it:

ANTHROPIC_BASE_URL=http://localhost:8080 claude

That's it. Claude Code sends requests to Knarr, Knarr forwards them to Anthropic using your API key (passthrough mode). Add Langfuse keys and every request gets traced automatically.

How it works

Knarr starts in mixed mode by default: requests with a model name that's explicitly configured in knarr.yaml go through a Langertha engine (with full tracing, request logging, and value-object metrics); unknown model names tunnel straight through to the upstream API the client thinks it's talking to, using the client's own API key. No key duplication, no configuration required for the simple cases.

Claude Code                                    Anthropic API
    │                                               ▲
    │  ANTHROPIC_BASE_URL=http://localhost:8080     │
    ▼                                               │
  Knarr ──── Handler::Router ─┐                     │
    │           │             └── Handler::Passthrough ──►
    │           └── matches gpt-4o → Langertha::Engine::OpenAI
    │
    └── Tracing decorator → Langfuse
    └── RequestLog decorator → JSONL

For explicit routing (send "gpt-4o" requests to OpenAI, "cheap" to Groq), configure models in a YAML file or let knarr init scan your environment variables and generate one.

More examples

# OpenAI Python SDK
OPENAI_BASE_URL=http://localhost:8080/v1 python my_app.py

# curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

# Ollama clients (Open WebUI, etc.) — point at port 11434 in container mode
OLLAMA_HOST=http://localhost:11434 open-webui

# A2A discovery
curl http://localhost:8080/.well-known/agent.json

In container mode (the default for the Docker image), Knarr binds two listening sockets simultaneously, both serving every protocol:

  • Port 8080 — primary, OpenAI / Anthropic / A2A / ACP / AG-UI clients
  • Port 11434 — alias for Ollama clients that hardcode that port

Both ports run the same handler chain — the second port is a convenience alias so existing Ollama clients work without reconfiguration.

Windows

Use WSL2 — all commands work as-is inside a WSL terminal:

wsl
docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

Or with Docker Desktop from PowerShell:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

The --env-file .env approach works identically on Linux, macOS, and Windows. Create your .env file once, run the same command everywhere.

Using a .env File

Create a .env file with your API keys (see .env.example):

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

Then run with --env-file:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr

Knarr reads the file, detects which providers have keys, configures them with sensible default models, and starts serving.

Docker Build

docker build -t raudssus/langertha-knarr .

Dependencies are installed via cpm from the cpanfile using MetaCPAN.

Docker Compose

The included docker-compose.yml starts Knarr with Langfuse tracing out of the box:

cp .env.example .env
# Edit .env — add your API keys and Langfuse keys
docker compose up

This starts:

Service      Port           Description
Knarr        8080, 11434    LLM Proxy
Langfuse     3000           Tracing Dashboard
PostgreSQL   —              Langfuse storage

The docker-compose.yml automatically loads .env and connects Knarr to the Langfuse instance. Open http://localhost:3000 for the dashboard — every LLM call through Knarr is traced with model, input, output, latency, and token usage.

Minimal Docker Compose (without Langfuse)

If you don't need tracing:

services:
  knarr:
    image: raudssus/langertha-knarr
    ports:
      - "8080:8080"
      - "11434:11434"
    env_file: .env

Multiple Providers

Set multiple API keys — Knarr configures all of them automatically:

docker run --env-file .env -p 8080:8080 -p 11434:11434 raudssus/langertha-knarr
[knarr] Knarr LLM Proxy starting...
[knarr]
[knarr] Config: auto-detecting from environment variables
[knarr] Engines: 3 provider(s) configured
[knarr]
[knarr]   anthropic => Anthropic / claude-sonnet-4-6 (key from $ANTHROPIC_API_KEY)
[knarr]   groq => Groq / llama-3.3-70b-versatile (key from $GROQ_API_KEY)
[knarr]   openai => OpenAI / gpt-4o-mini (key from $OPENAI_API_KEY)
[knarr]
[knarr] Auto-discover: enabled (will query provider model lists)
[knarr] Default engine: OpenAI
[knarr] Langfuse: disabled (set LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY to enable)
[knarr] Proxy auth: open (set KNARR_API_KEY to require authentication)

Each provider gets a default model:

Provider     Default Model             ENV Variable
OpenAI       gpt-4o-mini               OPENAI_API_KEY
Anthropic    claude-sonnet-4-6         ANTHROPIC_API_KEY
Groq         llama-3.3-70b-versatile   GROQ_API_KEY
Mistral      mistral-large-latest      MISTRAL_API_KEY
DeepSeek     deepseek-chat             DEEPSEEK_API_KEY
MiniMax      MiniMax-M2.1              MINIMAX_API_KEY
Gemini       gemini-2.0-flash          GEMINI_API_KEY
OpenRouter   openai/gpt-4o-mini        OPENROUTER_API_KEY
Perplexity   sonar                     PERPLEXITY_API_KEY
Cerebras     llama-3.3-70b             CEREBRAS_API_KEY

With auto-discover enabled (default), Knarr queries each provider's model list — so you can use any model they offer, not just the defaults.

Langfuse Tracing

Knarr traces every request automatically when Langfuse credentials are set. Add these to your .env:

LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

That's it. Every proxy request creates:

  • Trace with model name, engine type, API format, and full input/output
  • Generation with start/end time, token usage, and model information
  • Error tracking when backend calls fail
  • A knarr tag on all traces

Langfuse Cloud

Just set the keys — Langfuse Cloud (https://cloud.langfuse.com) is the default:

# .env
OPENAI_API_KEY=sk-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

Self-Hosted Langfuse

Use docker compose up for a local Langfuse stack, or point at your own:

# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_URL=http://my-langfuse-server:3000

Proxy Authentication

Protect your proxy with an API key:

# .env
KNARR_API_KEY=my-secret-proxy-key

Clients must send Authorization: Bearer my-secret-proxy-key or x-api-key: my-secret-proxy-key. The A2A discovery endpoint (/.well-known/agent.json) stays anonymous so agent clients can introspect.
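
For instance, a minimal Perl client (core HTTP::Tiny; the key value is whatever you set in KNARR_API_KEY) authenticates like this — a sketch, any HTTP client works the same way:

use HTTP::Tiny;
use JSON::PP qw(encode_json);

# The Bearer token is the Knarr proxy key, not a provider API key.
# Sending 'x-api-key' => 'my-secret-proxy-key' works as an alternative.
my $res = HTTP::Tiny->new->post(
  'http://localhost:8080/v1/chat/completions',
  {
    headers => {
      'Content-Type'  => 'application/json',
      'Authorization' => 'Bearer my-secret-proxy-key',
    },
    content => encode_json({
      model    => 'gpt-4o-mini',
      messages => [ { role => 'user', content => 'Hello' } ],
    }),
  },
);
die "HTTP $res->{status}" unless $res->{success};
print $res->{content}, "\n";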

API Formats

Knarr 1.000 speaks six wire protocols on every listening port. The protocol is selected by URL path, so a single Knarr listening on http://localhost:8080 answers all of them simultaneously:

OpenAI

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

curl http://localhost:8080/v1/models

Anthropic

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello"}],"max_tokens":1024}'

Ollama

curl http://localhost:8080/api/chat \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

curl http://localhost:8080/api/tags

In container mode Knarr binds an extra :11434 socket as well, so existing Ollama clients work without reconfiguration.

A2A (Google Agent2Agent)

Knarr exposes the agent card at /.well-known/agent.json and accepts A2A JSON-RPC at POST / with methods tasks/send (sync) and tasks/sendSubscribe (streaming).

ACP (BeeAI / Linux Foundation)

POST /runs with mode: "sync" or mode: "stream"; agent listing at GET /agents.

AG-UI (CopilotKit)

POST /awp returning the AG-UI typed event stream.

All six formats support streaming — SSE for OpenAI / Anthropic / A2A / ACP / AG-UI, NDJSON for Ollama.
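
As an illustration of the SSE side, the sketch below streams an OpenAI-format completion through Knarr and prints the content deltas as they arrive (core HTTP::Tiny with its data_callback option; any SSE-capable client follows the same pattern):

use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);

my $buffer = '';
HTTP::Tiny->new->request(
  'POST', 'http://localhost:8080/v1/chat/completions',
  {
    headers => { 'Content-Type' => 'application/json' },
    content => encode_json({
      model    => 'gpt-4o-mini',
      stream   => \1,                      # JSON true
      messages => [ { role => 'user', content => 'Tell me a short story' } ],
    }),
    data_callback => sub {
      my ($chunk) = @_;
      $buffer .= $chunk;
      # Each SSE event is a "data: {...}" line followed by a blank line.
      while ($buffer =~ s/\Adata: (.*)\r?\n\r?\n//) {
        my $payload = $1;
        last if $payload eq '[DONE]';
        my $delta = decode_json($payload)->{choices}[0]{delta}{content};
        print $delta if defined $delta;
      }
    },
  },
);
print "\n";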

Tool Calling

For configured (non-passthrough) models, Knarr forwards tools and tool_choice to the Langertha engine via chat_f. Langertha normalises them to the engine's native wire format — so an OpenAI-format tools array reaches an Anthropic engine as tools + Anthropic tool_choice, and vice versa. Tool-call responses (Langertha::ToolCall objects) come back and are serialised to the client's protocol format:

Client protocol   Tool call format in response
OpenAI            message.tool_calls[], finish_reason: "tool_calls"
Anthropic         content[] with type: "tool_use" blocks, stop_reason: "tool_use"
Ollama            message.tool_calls[]

For passthrough models (unknown model names), the raw request bytes are forwarded 1:1 to the upstream API, so whatever tool-call format the client sent arrives at the provider unchanged.
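
As a client-side example, the sketch below sends an OpenAI-format tool definition through Knarr and reads any tool calls out of the response (the get_weather tool is purely illustrative):

use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);

my $res = HTTP::Tiny->new->post(
  'http://localhost:8080/v1/chat/completions',
  {
    headers => { 'Content-Type' => 'application/json' },
    content => encode_json({
      model    => 'gpt-4o-mini',
      messages => [ { role => 'user', content => 'Weather in Hamburg?' } ],
      tools    => [ {
        type     => 'function',
        function => {
          name        => 'get_weather',
          description => 'Look up the current weather for a city',
          parameters  => {
            type       => 'object',
            properties => { city => { type => 'string' } },
            required   => ['city'],
          },
        },
      } ],
    }),
  },
);

# When the model calls the tool, finish_reason is "tool_calls" and the
# calls land in message.tool_calls:
my $msg = decode_json($res->{content})->{choices}[0]{message};
for my $call (@{ $msg->{tool_calls} // [] }) {
  printf "%s(%s)\n", $call->{function}{name}, $call->{function}{arguments};
}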

Use Cases

Claude Code through any backend

docker run --env-file .env -p 8080:8080 raudssus/langertha-knarr

# In another terminal:
ANTHROPIC_BASE_URL=http://localhost:8080 claude

Every Claude Code request gets traced in Langfuse.

Ollama clients with cloud models

Use cloud LLMs from any Ollama-compatible client like Open WebUI:

docker run --env-file .env -p 11434:11434 raudssus/langertha-knarr

# Open WebUI connects to port 11434, thinks it's Ollama,
# but requests go to cloud providers through Knarr

Local + Cloud hybrid

Mount a config file for custom routing:

# knarr.yaml
models:
  llama3.2:
    engine: OllamaOpenAI
    url: http://host.docker.internal:11434/v1
    model: llama3.2
  gpt-4o:
    engine: OpenAI
default:
  engine: OllamaOpenAI
  url: http://host.docker.internal:11434/v1

docker run --env-file .env \
  -v ./knarr.yaml:/etc/knarr/config.yaml \
  -p 8080:8080 -p 11434:11434 \
  raudssus/langertha-knarr start -c /etc/knarr/config.yaml

Using a Config File

For more control than auto-detection, create a knarr.yaml:

listen:
  - "127.0.0.1:8080"
  - "127.0.0.1:11434"

models:
  gpt-4o:
    engine: OpenAI

  gpt-4o-mini:
    engine: OpenAI
    model: gpt-4o-mini

  claude-sonnet:
    engine: Anthropic
    model: claude-sonnet-4-6
    api_key: ${ANTHROPIC_API_KEY}

  local-llama:
    engine: OllamaOpenAI
    url: http://localhost:11434/v1
    model: llama3.2

  deepseek:
    engine: DeepSeek
    model: deepseek-chat

default:
  engine: OpenAI

auto_discover: true

# Passthrough: requests go directly to upstream APIs
# The client's own API key is used — no duplication needed
# Models with explicit config above are routed via Langertha,
# everything else passes through transparently
passthrough:
  anthropic: https://api.anthropic.com
  openai: https://api.openai.com
  # Or point at a custom upstream:
  # anthropic: https://my-anthropic-cache.internal

# proxy_api_key: your-secret

# langfuse:
#   url: http://localhost:3000
#   public_key: pk-lf-...
#   secret_key: sk-lf-...

Config values support ${ENV_VAR} interpolation — variables are resolved at startup.

models.<name>.engine resolves in this order:

  • Langertha::Engine::<EngineName>
  • LangerthaX::Engine::<EngineName>
  • Fully-qualified class name if you set one directly
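
Illustratively, the lookup is roughly equivalent to the following sketch (not the actual Knarr code, just the try-in-order idea):

# Try Langertha::Engine::<Name>, then LangerthaX::Engine::<Name>,
# unless the config already names a fully-qualified class.
sub resolve_engine_class {
  my ($name) = @_;
  my @candidates = $name =~ /::/
    ? ($name)
    : ("Langertha::Engine::$name", "LangerthaX::Engine::$name");
  for my $class (@candidates) {
    (my $file = "$class.pm") =~ s{::}{/}g;
    return $class if eval { require $file; 1 };
  }
  die "No engine class found for '$name'";
}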

Passthrough Mode

Passthrough is the default behavior: requests for unconfigured models go directly to the upstream API using the client's own API key and headers. All HTTP bytes — including SSE chunks, tool_use blocks, usage data, and cache_control — are piped 1:1 to the client. No key duplication, no model configuration needed. Knarr just sits in the middle and traces.

If you also configure explicit model routing (the models: section), those specific models are handled by Langertha engines. Everything else still passes through as raw bytes.

Enabled by default with --from-env. In a config file:

# Enable with default upstream URLs
passthrough: true

# Or per format with custom upstreams
passthrough:
  anthropic: https://api.anthropic.com
  openai: https://my-openai-mirror.internal

Claude Code example — no Knarr API key needed, your existing key works:

docker run -p 8080:8080 raudssus/langertha-knarr
ANTHROPIC_BASE_URL=http://localhost:8080 claude

Generating a Config

Knarr can generate a config from your environment:

# Via Docker — pass your env vars through
docker run --rm --env-file .env raudssus/langertha-knarr init > knarr.yaml

# Or pass all API keys from your current shell
docker run --rm \
  $(env | grep -E '_(API_KEY|API_TOKEN)=|^LANGFUSE_' | sed 's/^/-e /') \
  raudssus/langertha-knarr init > knarr.yaml

Then mount it:

docker run --env-file .env \
  -v ./knarr.yaml:/etc/knarr/config.yaml \
  -p 8080:8080 -p 11434:11434 \
  raudssus/langertha-knarr start -c /etc/knarr/config.yaml

All Environment Variables

API Keys

Variable Provider
OPENAI_API_KEY OpenAI
ANTHROPIC_API_KEY Anthropic
GROQ_API_KEY Groq
MISTRAL_API_KEY Mistral
DEEPSEEK_API_KEY DeepSeek
MINIMAX_API_KEY MiniMax
GEMINI_API_KEY Gemini
OPENROUTER_API_KEY OpenRouter
PERPLEXITY_API_KEY Perplexity
CEREBRAS_API_KEY Cerebras
REPLICATE_API_TOKEN Replicate
HUGGINGFACE_API_KEY HuggingFace

LANGERTHA_-prefixed variants (e.g., LANGERTHA_OPENAI_API_KEY) take priority over bare names.

Langfuse

Variable              Description              Default
LANGFUSE_PUBLIC_KEY   Public key (pk-lf-...)   —
LANGFUSE_SECRET_KEY   Secret key (sk-lf-...)   —
LANGFUSE_URL          Server URL               https://cloud.langfuse.com

Proxy

Variable        Description                       Default
KNARR_API_KEY   Require client authentication     — (open)
KNARR_DEBUG     Enable verbose logging (1 = on)   — (off)

CLI Reference

knarr                                      Show help
knarr start                                Start with config file (./knarr.yaml)
knarr start --from-env                     Auto-detect config from ENV (Docker default)
knarr start --from-env -p 8080 -p 11434   ENV config, explicit ports
knarr start -p 9090                        Custom port
knarr start -c prod.yaml                   Custom config
knarr start -v                             Verbose logging
knarr init                                 Generate config from environment
knarr init -e .env                         Include .env file in scan
knarr models                               List configured models
knarr models --format json
knarr check                                Validate config file

The -p / --port flag is repeatable — each occurrence adds a listen port. Default host is 0.0.0.0. Set KNARR_DEBUG=1 or use -v for verbose logging.

Installing as a Perl Module

Knarr is also a standard CPAN distribution:

cpanm Langertha::Knarr

Then use the knarr CLI directly:

export OPENAI_API_KEY=sk-...
knarr init > knarr.yaml
knarr start

Using Knarr Programmatically

Knarr 1.000 is built around a handler and one or more wire protocols. You construct a handler (typically Handler::Router driven by your existing knarr.yaml), optionally wrap it in tracing/logging decorators, and pass it to a Langertha::Knarr instance:

use IO::Async::Loop;
use Langertha::Knarr;
use Langertha::Knarr::Config;
use Langertha::Knarr::Router;
use Langertha::Knarr::Handler::Router;

my $loop   = IO::Async::Loop->new;
my $config = Langertha::Knarr::Config->new(file => 'knarr.yaml');
my $router = Langertha::Knarr::Router->new(config => $config);

my $handler = Langertha::Knarr::Handler::Router->new(router => $router);

my $knarr = Langertha::Knarr->new(
  handler => $handler,
  loop    => $loop,
  listen  => $config->listen,   # arrayref of "host:port" strings
);
$knarr->run;   # blocks

Wrapping with tracing and logging

Both Tracing and RequestLog are decorator handlers — they wrap any inner handler and forward chat/stream calls through, recording before and after:

use Langertha::Knarr::Tracing;
use Langertha::Knarr::RequestLog;
use Langertha::Knarr::Handler::Tracing;
use Langertha::Knarr::Handler::RequestLog;

my $tracing = Langertha::Knarr::Tracing->new(config => $config);
$handler = Langertha::Knarr::Handler::Tracing->new(
  wrapped => $handler,
  tracing => $tracing,
) if $tracing->_enabled;

my $rlog = Langertha::Knarr::RequestLog->new(config => $config);
$handler = Langertha::Knarr::Handler::RequestLog->new(
  wrapped     => $handler,
  request_log => $rlog,
) if $rlog->_enabled;

knarr start applies both wrappers automatically when their respective config sections are present.

Adding passthrough fallback

To preserve the "configured models go through Langertha, everything else tunnels straight to the upstream API" behaviour, give the router a Handler::Passthrough fallback:

use Langertha::Knarr::Handler::Passthrough;

my $passthrough = Langertha::Knarr::Handler::Passthrough->new(
  upstreams => $config->passthrough,   # { openai => 'https://api.openai.com', ... }
  loop      => $loop,
);
my $handler = Langertha::Knarr::Handler::Router->new(
  router      => $router,
  passthrough => $passthrough,
);

Using the Config and Router Independently

use Langertha::Knarr::Config;
use Langertha::Knarr::Router;

my $config = Langertha::Knarr::Config->new(file => 'knarr.yaml');
my $router = Langertha::Knarr::Router->new(config => $config);

# Resolve a model name to a Langertha engine
my ($engine, $model) = $router->resolve('gpt-4o-mini');
# $engine is a Langertha::Engine::OpenAI (or whatever the config maps to)
# $model is the resolved model name

my $response = $engine->simple_chat(
  { role => 'user', content => 'Hello!' },
);

License

This software is copyright (c) 2026 by Torsten Raudssus.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
