ditto-llm

Ditto-LLM is a small Rust SDK that provides a unified interface for calling multiple LLM providers.

Goal: become a superset of LiteLLM Proxy + Vercel AI SDK Core via layering + Cargo feature gating. See COMPARED_TO_LITELLM_AI_SDK.md and TODO.md for the parity notes and roadmap.

Layered product plan (L0/L1/L2):

L0 (this repo): model adapters + protocol/shape conversion + direct SDK usage.
L1 (this repo): gateway/proxy platform (API surface, routing, budgets, observability, admin).
L2 (separate repo): enterprise closed-loop platform (prompt/eval/agent eval/org governance).
Boundary: L2 depends on L1 contracts; L1 remains independently deployable for SMB/mid-market.
Frozen L1 contract artifacts: contracts/gateway-contract-v0.1.openapi.yaml + crates/ditto-gateway-contract-types.

Current scope:

Default build: generic OpenAI-compatible LLM core (provider-openai-compatible + cap-llm). This is the stable base and only default capability promise.
Unified LLM types + traits: LanguageModel, Message/ContentPart, Tool, StreamChunk, Warning.
Text helpers: generate_text / stream_text (AI SDK-style generateText / streamText).
Structured outputs: generate_object_json / stream_object (AI SDK-style generateObject / streamObject).
Multi-modal inputs at the request shape level: images + PDF documents via ContentPart::Image / ContentPart::File (provider support varies; unsupported parts emit Warning).
Parameter hygiene: temperature/top_p are clamped to provider ranges; non-finite values are dropped (with warnings).
Default provider path: OpenAI-compatible Chat Completions (LiteLLM / DeepSeek / Qwen / OpenRouter / local gateways / etc.) with generate + SSE streaming + tools.
Optional provider packs and capability packs add official OpenAI Responses, embeddings, images, audio, moderations, Google GenAI, Anthropic Messages, Cohere, Bedrock, Vertex, batches, rerank, and gateway translation surfaces.
Provider profile config and model discovery (ProviderConfig / GET /models) remain available for routing use-cases, but the default examples and docs now assume a generic OpenAI-compatible upstream.
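
The parameter hygiene rule above (clamp to the provider range, drop non-finite values, warn in both cases) can be pictured with a minimal sketch. The names sanitize_param and Sanitized are hypothetical and are not part of Ditto's API, and the ranges used in main are illustrative assumptions rather than documented provider limits.

```rust
/// Outcome of sanitizing one sampling parameter (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct Sanitized {
    value: Option<f32>,
    warning: Option<String>,
}

/// Clamp `value` into `[min, max]`; drop it entirely when it is NaN or infinite.
fn sanitize_param(name: &str, value: f32, min: f32, max: f32) -> Sanitized {
    if !value.is_finite() {
        return Sanitized {
            value: None,
            warning: Some(format!("{name} is not finite; dropped")),
        };
    }
    let clamped = value.clamp(min, max);
    let warning = if clamped != value {
        Some(format!("{name} clamped from {value} to {clamped}"))
    } else {
        None
    };
    Sanitized { value: Some(clamped), warning }
}

fn main() {
    // Assumed ranges for the demo only.
    println!("{:?}", sanitize_param("temperature", 3.5, 0.0, 2.0));
    println!("{:?}", sanitize_param("top_p", f32::NAN, 0.0, 1.0));
}
```

The point of returning a warning alongside the value is that the caller still gets a usable request while the adjustment stays visible.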

Optional feature-gated modules:

Agent tool loop: ToolLoopAgent + ToolExecutor (feature agent).
Auth adapters: SigV4 signer + OAuth client-credentials flow (feature auth).
Providers: Bedrock (SigV4) and Vertex (OAuth) adapters with generate + SSE streaming + tools (features provider-bedrock, provider-vertex).
SDK utilities: stream protocol v1, HTTP adapters (SSE/NDJSON), telemetry sink, devtools JSONL logger, MCP tool adapter, cache middleware with streaming replay (feature sdk).
SDK HTTP helpers: optional axum response builders for stream adapters (feature sdk-axum).
Gateway control-plane: virtual keys, limits, cache, budget, routing, guardrails, passthrough, plus a ditto-gateway HTTP server (feature gateway). Includes LiteLLM-like conveniences such as /key/* endpoints, /a2a/* agent proxy, and /mcp* MCP tool gateway.
Gateway token counting: tiktoken-based input token estimation for proxy budgets/guardrails/costing (feature gateway-tokenizer).
Gateway translation proxy: OpenAI-compatible GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos* (create/list/retrieve/delete/content/remix), POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches backed by Ditto providers (feature gateway-translation).
Gateway proxy caching: in-memory cache for non-streaming OpenAI-compatible responses (feature gateway-proxy-cache).
Gateway OpenTelemetry: OTLP tracing exporter + structured logs for gateway HTTP requests (feature gateway-otel).

Non-goals (for now):

The default build is not an API gateway/proxy; the gateway feature adds a lightweight control-plane + HTTP service. The gateway-translation feature adds translation for GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos*, POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches. Translation of the rest of the OpenAI surface is tracked in TODO.md.
Core helpers are single-step and return tool calls to the caller; the agent feature offers an opt-in tool loop, but it is not enabled by default.
It is not a full UI SDK (no frontend hooks or middleware ecosystem); the sdk feature only provides protocol/telemetry/devtools/MCP utilities.
Bedrock support targets Anthropic Messages-on-Bedrock; other Bedrock model families and Vertex service-account JWT flows are not covered yet.

See PROVIDERS.md for a pragmatic provider/capability matrix (native adapters + OpenAI-compatible gateway coverage).

Docs

This repo includes an mdBook under docs/. For the stable docs entrypoints, start with docs/README.md and docs/docs-system-map.md. Use ./scripts/check-docs-system.sh to verify the repository-level docs skeleton.

cargo install mdbook
mdbook serve docs

If you don’t want to install mdBook, you can still read the Markdown directly in docs/src.

Provider Packs and Capability Packs

Ditto now documents provider integration around three separate axes:

Default core: provider-openai-compatible + cap-llm is the only out-of-the-box contract.
Provider packs: provider-openai, provider-anthropic, provider-google, provider-cohere, provider-bedrock, provider-vertex, plus provider-specific packs such as provider-deepseek, provider-kimi, and provider-openrouter.
Capability packs: cap-llm, cap-embedding, cap-image-generation, cap-image-edit, cap-audio-transcription, cap-audio-speech, cap-moderation, cap-rerank, cap-batch, cap-realtime.

The intended boundary is:

provider selects the runtime adapter/provider pack.
ProviderConfig configures one concrete upstream node for that runtime.
GenerateRequest.provider_options stays request-scoped.

See PROVIDERS.md for the provider × capability × feature × status table.

Tool Schemas

For Google function calling, Ditto-LLM converts tool parameter JSON Schema into an OpenAPI-style schema.

Contract:

Conversion is best-effort and lossy: unsupported keywords are ignored (dropped), not errors.
Unsupported keywords may emit Warning::Compatibility(tool.parameters.unsupported_keywords) to avoid silent data loss.
$ref is best-effort: local refs (#/...) are resolved; unresolvable refs are ignored and a Warning::Compatibility(tool.parameters.$ref) is emitted.
Root empty-object schemas (no properties + additionalProperties missing/false) are treated as "no parameters" and omitted.
Boolean schemas (true/false) are treated as unconstrained schemas; at the root they are omitted.
Nullable unions:
- type: ["string", "null"] becomes anyOf: [{ "type": "string" }] + nullable: true
- anyOf: [{...}, {"type":"null"}] becomes the same shape (single branch is flattened)
const becomes enum: [<const>].
additionalProperties supports boolean and nested schemas.

Supported keywords (subset): type, title, description, properties, required, items, additionalProperties, enum, const, format, allOf, anyOf, oneOf, default, minLength/maxLength/pattern, minItems/maxItems/uniqueItems, minProperties/maxProperties, minimum/maximum/multipleOf, and exclusiveMinimum/exclusiveMaximum (number form → minimum/maximum + exclusive* = true).
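
To make the nullable-union rule in the contract concrete, here is a minimal sketch over a toy model of the type keyword. TypeKeyword, Converted, and convert_type are hypothetical names for illustration; Ditto's real conversion works on full JSON Schema values and covers many more keywords.

```rust
/// Toy stand-in for the JSON Schema `type` keyword: one name or a list of names.
#[derive(Debug, Clone, PartialEq)]
enum TypeKeyword {
    Single(String),
    Many(Vec<String>),
}

/// Simplified target shape: the non-null branches plus a `nullable` flag.
#[derive(Debug, PartialEq)]
struct Converted {
    any_of: Vec<String>,
    nullable: bool,
}

/// Split "null" out of the type list and record it as `nullable: true`.
fn convert_type(keyword: &TypeKeyword) -> Converted {
    let names: Vec<String> = match keyword {
        TypeKeyword::Single(name) => vec![name.clone()],
        TypeKeyword::Many(names) => names.clone(),
    };
    let nullable = names.iter().any(|n| n == "null");
    let any_of = names.into_iter().filter(|n| n != "null").collect();
    Converted { any_of, nullable }
}

fn main() {
    let union = TypeKeyword::Many(vec!["string".to_string(), "null".to_string()]);
    let plain = TypeKeyword::Single("integer".to_string());
    println!("{:?}", convert_type(&union));
    println!("{:?}", convert_type(&plain));
}
```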

Examples

Default-core examples expect a generic OpenAI-compatible upstream:

export OPENAI_COMPAT_BASE_URL="https://your-openai-compatible-endpoint/v1"
export OPENAI_COMPAT_MODEL="your-chat-model"
export OPENAI_COMPAT_API_KEY="sk-..."   # optional for local gateways that do not require auth

cargo run --example basic
cargo run --example streaming
cargo run --example tool_calling
cargo run --example openai_compatible

Additional provider/capability examples stay opt-in:

cargo run --example openai_compatible_embeddings --features cap-embedding
cargo run --example embeddings --features "provider-openai cap-embedding"
cargo run --example multimodal --features "provider-openai cap-llm base64" -- ./image.png ./doc.pdf
cargo run --example batches --features "provider-openai-compatible cap-batch" -- ./requests.jsonl

Gateway (optional)

Run the HTTP gateway (feature gateway):

cargo run -p ditto-server --features gateway --bin ditto-gateway -- ./gateway.json --listen 0.0.0.0:8080

YAML config is optional (feature gateway-config-yaml):

cargo run --features gateway-config-yaml --bin ditto-gateway -- ./gateway.yaml --listen 0.0.0.0:8080

Optional admin UI asset (React; outside the default core build/CI path):

pnpm install
pnpm run dev:admin-ui

Minimal multi-language gateway clients:

Node (SSE streaming): examples/clients/node/stream_chat_completions.mjs
Python: examples/clients/python/chat_completions.py
Go: examples/clients/go/chat_completions.go

Backends are configured in gateway.json (OpenAI-compatible upstreams + injected headers/query params, e.g. Authorization and Azure-style api-version):

{
  "backends": [
    {
      "name": "primary",
      "base_url": "https://api.openai.com/v1",
      "max_in_flight": 64,
      "timeout_seconds": 60,
      "headers": { "authorization": "Bearer ${OPENAI_API_KEY}" },
      "query_params": {}
    }
  ],
  "virtual_keys": [
    {
      "id": "local-dev",
      "token": "${DITTO_VIRTUAL_KEY}",
      "enabled": true,
      "limits": {},
      "budget": {},
      "cache": {},
      "guardrails": {},
      "passthrough": {},
      "route": null
    }
  ],
  "router": { "default_backends": [{ "backend": "primary", "weight": 1.0 }], "rules": [] }
}

backends[].max_in_flight optionally caps concurrent in-flight proxy requests per backend (rejects with HTTP 429 + OpenAI-style error code inflight_limit_backend). backends[].timeout_seconds optionally overrides the backend request timeout in seconds (default: 300s).
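
One way to picture the max_in_flight cap is a counter that hands out permits and rejects once the limit is reached. The sketch below is an assumption about the general technique, not Ditto's implementation; InFlightLimiter and Permit are hypothetical names, and only the 429 status and the inflight_limit_backend code come from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-backend limiter: at most `max` requests in flight at once.
struct InFlightLimiter {
    max: usize,
    current: AtomicUsize,
}

/// Releases its slot when dropped, i.e. when the proxied request finishes.
struct Permit<'a> {
    limiter: &'a InFlightLimiter,
}

impl InFlightLimiter {
    fn new(max: usize) -> Self {
        Self { max, current: AtomicUsize::new(0) }
    }

    /// Try to reserve a slot; on overflow return the HTTP status and error code.
    fn try_acquire(&self) -> Result<Permit<'_>, (u16, &'static str)> {
        let previous = self.current.fetch_add(1, Ordering::AcqRel);
        if previous >= self.max {
            // Undo the optimistic increment before rejecting.
            self.current.fetch_sub(1, Ordering::AcqRel);
            return Err((429, "inflight_limit_backend"));
        }
        Ok(Permit { limiter: self })
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.limiter.current.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = InFlightLimiter::new(1);
    let first = limiter.try_acquire();
    println!("first accepted: {}", first.is_ok());
    println!("second rejected with: {:?}", limiter.try_acquire().err());
}
```

Tying the release to Drop means a slot is returned even when the request handler exits early with an error.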

Gateway config supports ${ENV_VAR} interpolation in backend base_url/headers/query_params, backend provider_config node fields (for example base_url, default_model, http_headers, http_query_params, auth, upstream_api, normalize_to, normalize_endpoint), virtual_keys[].token, a2a_agents[] (agent url/headers/query), and mcp_servers[] (server url/headers/query) (expanded at startup via the process env or --dotenv).
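
The ${ENV_VAR} expansion described above can be sketched as a small string pass. This is a simplified illustration with a hypothetical expand_env function; in particular, returning an error for a missing variable and leaving an unterminated placeholder untouched are assumptions, not documented gateway behavior.

```rust
use std::collections::HashMap;

/// Expand `${NAME}` placeholders from `env`.
/// Returns Err(name) for the first variable that is not defined (assumed policy).
fn expand_env(input: &str, env: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(end) = after.find('}') else {
            // Unterminated placeholder: keep the remaining text verbatim.
            out.push_str(&rest[start..]);
            return Ok(out);
        };
        let name = &after[..end];
        match env.get(name) {
            Some(value) => out.push_str(value),
            None => return Err(name.to_string()),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let mut env = HashMap::new();
    env.insert("OPENAI_API_KEY".to_string(), "sk-test".to_string());
    println!("{:?}", expand_env("Bearer ${OPENAI_API_KEY}", &env));
    println!("{:?}", expand_env("${DITTO_VIRTUAL_KEY}", &env));
}
```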

Translation backends (feature gateway-translation) can be configured with provider + provider_config (same shape as ProviderConfig):

{
  "backends": [
    {
      "name": "anthropic",
      "provider": "anthropic",
      "provider_config": {
        "auth": { "type": "api_key_env", "keys": ["ANTHROPIC_API_KEY"] },
        "default_model": "claude-3-5-sonnet-20241022"
      }
    }
  ],
  "virtual_keys": [
    {
      "id": "local-dev",
      "token": "${DITTO_VIRTUAL_KEY}",
      "enabled": true,
      "limits": {},
      "budget": {},
      "cache": {},
      "guardrails": {},
      "passthrough": {},
      "route": null
    }
  ],
  "router": { "default_backends": [{ "backend": "anthropic", "weight": 1.0 }], "rules": [] }
}

provider selects the runtime adapter; provider_config only provides the concrete upstream node settings for that adapter.

For OpenAI-compatible upstreams, provider can be openai-compatible/openai_compatible or a LiteLLM-style alias (e.g. groq, mistral, deepseek, qwen, together, fireworks, xai, perplexity, openrouter, ollama, azure).

Routing (optional):

router.default_backends: weighted primary selection (seeded by x-request-id when proxying)
router.rules[].backends: per-model-prefix weighted backends (falls back to router.default_backends when empty)
If multiple backends are selected, the OpenAI-compatible proxy will fall back to the next backend on network errors.
With --features gateway-routing-advanced, proxying can also use typed retry/fallback policies for status/network/timeout failures, circuit breaker controls, and active health checks (--proxy-retry* / --proxy-fallback-status-codes / --proxy-network-error-action / --proxy-timeout-error-action / --proxy-circuit-breaker* / --proxy-cb-failure-status-codes / --proxy-health-check*).
For non-safe HTTP methods, Ditto only continues to the next backend when the client explicitly supplies x-request-id; otherwise a safety guard stops cross-backend retry/fallback to reduce duplicate side effects.
That guard is not a distributed dedup store. If you need true end-to-end idempotency, enforce it in the upstream application or add request-result dedup persistence at the gateway boundary.
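
The routing rules above combine two ideas: a weighted choice that is deterministic for a given x-request-id, and a guard that limits cross-backend fallback for non-safe methods. The sketch below illustrates both under stated assumptions. WeightedBackend, pick_backend, and may_fall_back are hypothetical names, the hashing scheme is a stand-in rather than Ditto's actual seeding, and the set of safe methods is taken from general HTTP semantics.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical mirror of a `{ "backend": ..., "weight": ... }` router entry.
struct WeightedBackend {
    name: &'static str,
    weight: f64,
}

/// Pick a backend by weight, using the request id as a deterministic seed.
fn pick_backend<'a>(backends: &'a [WeightedBackend], request_id: &str) -> Option<&'a str> {
    let total: f64 = backends.iter().map(|b| b.weight.max(0.0)).sum();
    if total <= 0.0 {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    request_id.hash(&mut hasher);
    // Map the 64-bit hash onto the interval [0, total].
    let point = (hasher.finish() as f64 / u64::MAX as f64) * total;
    let mut cumulative = 0.0;
    for backend in backends {
        cumulative += backend.weight.max(0.0);
        if point < cumulative {
            return Some(backend.name);
        }
    }
    // Rounding edge case: fall back to the last backend with a positive weight.
    backends.iter().rev().find(|b| b.weight > 0.0).map(|b| b.name)
}

/// Fallback to another backend is allowed for safe methods,
/// or when the client explicitly supplied x-request-id.
fn may_fall_back(method: &str, client_request_id: Option<&str>) -> bool {
    let safe = matches!(method, "GET" | "HEAD" | "OPTIONS" | "TRACE");
    safe || client_request_id.is_some()
}

fn main() {
    let backends = [
        WeightedBackend { name: "primary", weight: 3.0 },
        WeightedBackend { name: "secondary", weight: 1.0 },
    ];
    println!("{:?}", pick_backend(&backends, "req-123"));
    println!("POST without id may fall back: {}", may_fall_back("POST", None));
}
```

Seeding by request id means a retried request with the same id lands on the same primary backend, which keeps behavior reproducible when debugging.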

Endpoints:

OpenAI-compatible proxy (passthrough): ANY /v1/* (e.g. POST /v1/responses, POST /v1/chat/completions, GET /v1/models).
- LiteLLM-style aliases without a /v1 prefix are accepted (e.g. /chat/completions, /embeddings, /moderations, /files/*, /batches/*, /models/*, /responses/*).
- OpenAI-compatible /v1/*, MCP /mcp*, and A2A /a2a/* surfaces are fail-closed: requests must include a configured virtual key via Authorization: Bearer <virtual_key> (or x-ditto-virtual-key / x-api-key).
- The client Authorization header is treated as a virtual key and is not forwarded upstream; the backend headers are applied instead.
- An empty virtual_keys set means there are no valid client credentials yet, so those surfaces will return 401 until keys are provisioned.
- If the upstream does not implement POST /v1/responses (returns 404/405/501), Ditto will fall back to POST /v1/chat/completions and return a best-effort Responses-like response/stream (adds x-ditto-shim: responses_via_chat_completions).
OpenAI-compatible translation (feature gateway-translation): GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos* (create/list/retrieve/delete/content/remix), POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches can be served by a backend with provider configured (adds x-ditto-translation: <backend>; GET /v1/models only lists translation models routable for the current virtual key/router path; translated /v1/responses/* retrieve/delete are best-effort, require gateway-scoped ids created by the same running gateway instance, and currently live in a bounded in-memory LRU store).
Control-plane demo endpoint: POST /v1/gateway (JSON GatewayRequest; accepts Authorization: Bearer <virtual_key>).
GET /health
GET /ready
GET /metrics
GET /admin/keys (admin token via Authorization or x-admin-token if configured). Defaults to redacted tokens; ?include_tokens=true requires a write or tenant-write admin token and is rejected after keys have been reloaded from one-way hashed persistence.
GET /admin/config/version, GET /admin/config/versions, and GET /admin/config/versions/:version_id (current/process-local history/detail for control-plane virtual-key config versions; restart rebuilds history from the loaded config as a new bootstrap snapshot; detail supports ?include_tokens=true for secret-managing admins only while original secrets are still in memory).
GET /admin/config/diff (read-only or write admin token; compares two config versions via from_version_id + to_version_id; include_tokens requires secret-managing admin access and is rejected once only hashed tokens remain).
GET /admin/config/export (read-only or write admin token; exports current config by default, or a specific version via version_id; include_tokens requires secret-managing admin access and is rejected once only hashed tokens remain).
POST /admin/config/validate (read-only or write admin token; validates virtual_keys plus optional router payloads with optional expected hashes, without mutating runtime state).
PUT /admin/config/router (write admin token required; updates router config with backend-reference validation and creates a new config version; supports dry_run).
MCP tool gateway: ANY /mcp* (JSON-RPC tools/list / tools/call + convenience endpoints), and MCP tool integration for POST /v1/chat/completions and POST /v1/responses via tools: [{"type":"mcp", ...}] (requires a valid virtual key).
A2A agent gateway: GET /a2a/:agent_id/.well-known/agent-card.json and POST /a2a/* JSON-RPC proxying (requires a2a_agents configured and a valid virtual key).
POST /admin/keys and PUT|DELETE /admin/keys/:id (requires the write admin token).
POST /admin/config/rollback (requires the write admin token; restores virtual keys and router to a previous config version; supports dry_run).
LiteLLM-style key management (requires admin auth): /key/generate, /key/update, /key/regenerate (or /key/:key/regenerate), /key/delete, /key/info, /key/list.
- /key/list returns key aliases by default; include_tokens=true requires a write or tenant-write admin token and is rejected after keys have been reloaded from one-way hashed persistence.
- /key/info accepts ?key=... (admin query) or defaults to the Authorization: Bearer <virtual_key> token when ?key is omitted (self lookup).
POST /admin/proxy_cache/purge (requires the write admin token and --proxy-cache; body can be { "cache_key": "..." } or { "all": true }).
GET /admin/backends and POST /admin/backends/:name/reset (reset requires the write admin token and --features gateway-routing-advanced).
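
The fail-closed virtual-key check described for the /v1/*, /mcp*, and /a2a/* surfaces can be sketched as follows. The function names are hypothetical, headers are modeled as plain name/value pairs, and the lookup order between the three accepted headers is an assumption; the source only lists which headers are accepted.

```rust
/// Look up a header by case-insensitive name.
fn header<'a>(headers: &'a [(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .map(|(_, value)| *value)
}

/// Extract the presented virtual key, trying `Authorization: Bearer ...` first
/// and then the alternative headers (order is an assumption).
fn extract_virtual_key<'a>(headers: &'a [(&'a str, &'a str)]) -> Option<&'a str> {
    if let Some(token) = header(headers, "authorization").and_then(|v| v.strip_prefix("Bearer ")) {
        return Some(token.trim());
    }
    header(headers, "x-ditto-virtual-key").or_else(|| header(headers, "x-api-key"))
}

/// Fail-closed check: with no configured keys, every request gets 401.
fn authorize(headers: &[(&str, &str)], configured_keys: &[&str]) -> u16 {
    match extract_virtual_key(headers) {
        Some(key) if configured_keys.contains(&key) => 200,
        _ => 401,
    }
}

fn main() {
    let headers = [("Authorization", "Bearer vk-local-dev")];
    println!("with key configured: {}", authorize(&headers, &["vk-local-dev"]));
    println!("with empty key set: {}", authorize(&headers, &[]));
}
```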

CLI options:

--listen HOST:PORT (or --addr HOST:PORT) sets the bind address (default: 127.0.0.1:8080).
--dotenv PATH loads a dotenv file (KEY=VALUE) for ${ENV_VAR} interpolation and provider auth env lookups.
--admin-token TOKEN enables /admin/* endpoints (write admin token).
--admin-token-env ENV loads the write admin token from env (works with --dotenv).
--admin-read-token TOKEN enables /admin/* read-only endpoints.
--admin-read-token-env ENV loads the read-only admin token from env (works with --dotenv).
--backend name=url adds/overrides a backend for POST /v1/gateway (the backend is a URL that accepts GatewayRequest JSON and returns GatewayResponse JSON).
--upstream name=base_url adds/overrides an OpenAI-compatible upstream backend (in addition to gateway.json).
--state PATH enables persistence for admin config mutations (virtual_keys + router in GatewayStateFile; loaded on startup; created from gateway.json when missing). Virtual-key tokens are written as one-way sha256: hashes.
--sqlite PATH enables sqlite persistence for admin config mutations (virtual_keys + router; requires --features gateway-store-sqlite; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
--pg URL / --pg-env ENV enables postgres persistence for admin config mutations (virtual_keys + router) plus audit/budget/cost ledgers (/admin/audit*, /admin/budgets*, /admin/costs*; costs require gateway-costing; requires --features gateway-store-postgres; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
--mysql URL / --mysql-env ENV enables mysql persistence for admin config mutations (virtual_keys + router) plus audit/budget/cost ledgers (/admin/audit*, /admin/budgets*, /admin/costs*; costs require gateway-costing; requires --features gateway-store-mysql; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
--redis URL enables redis persistence for admin config mutations (virtual_keys + router; requires --features gateway-store-redis). Virtual-key tokens are written as one-way sha256: hashes.
After a restart from any persisted sha256: state/store, Ditto can still authenticate presented virtual-key tokens, but include_tokens=true exports can no longer return the original secret material.
--redis-env ENV loads the redis URL from env (works with --dotenv; requires --features gateway-store-redis).
--redis-prefix PREFIX sets the redis key prefix (requires --features gateway-store-redis and --redis/--redis-env).
--audit-retention-secs SECS sets audit retention for sqlite/pg/mysql/redis stores (0 disables retention; default is 30 days when any persistent store is configured).
--db-doctor runs store schema checks and exits (startup also performs schema self-check and fails fast on mismatch).
--json-logs emits JSON log records to stderr.
--proxy-max-in-flight N limits concurrent in-flight proxy requests (rejects with 429 when exceeded). If omitted, the default is 256.
--proxy-cache enables a best-effort cache for non-streaming OpenAI-compatible responses (requires --features gateway-proxy-cache). When combined with --redis, responses are also cached in Redis (shared across instances).
--proxy-cache-ttl SECS sets the proxy cache TTL (implies --proxy-cache).
--proxy-cache-max-entries N sets the in-memory proxy cache capacity (implies --proxy-cache).
--proxy-cache-max-body-bytes N sets the maximum cached body size per entry (implies --proxy-cache).
--proxy-cache-max-total-body-bytes N sets the in-memory total cached body budget (implies --proxy-cache).
--proxy-retry enables retry on retryable statuses (requires --features gateway-routing-advanced).
--proxy-retry-status-codes CODES overrides retry status codes (comma-separated; implies --proxy-retry).
--proxy-fallback-status-codes CODES falls back to the next backend when a response status matches (comma-separated; works even when retry is disabled).
--proxy-network-error-action ACTION controls what to do on transport failures (none, fallback, retry; default: fallback).
--proxy-timeout-error-action ACTION controls what to do on backend timeouts (none, fallback, retry; default: fallback).
For non-safe methods (POST/PUT/PATCH/DELETE and similar), cross-backend retry/fallback is guarded unless the client provided x-request-id; when blocked, Ditto emits proxy.request_safety_guard in JSON/devtools logs.
--proxy-retry-max-attempts N sets max retry attempts (implies --proxy-retry).
--proxy-circuit-breaker enables a simple circuit breaker (requires --features gateway-routing-advanced).
--proxy-cb-failure-threshold N sets circuit breaker failure threshold (implies --proxy-circuit-breaker).
--proxy-cb-cooldown-secs SECS sets circuit breaker cooldown seconds (implies --proxy-circuit-breaker).
--proxy-cb-failure-status-codes CODES adds extra status codes that should count toward the circuit breaker (for example 408,429).
--proxy-cb-no-network-errors, --proxy-cb-no-timeout-errors, --proxy-cb-no-server-errors disable individual circuit-breaker failure buckets.
--proxy-health-checks enables active health checks (requires --features gateway-routing-advanced).
--proxy-health-check-path PATH overrides the health check request path (implies --proxy-health-checks; default: /v1/models).
--proxy-health-check-interval-secs SECS sets health check interval seconds (implies --proxy-health-checks).
--proxy-health-check-timeout-secs SECS sets health check timeout seconds (implies --proxy-health-checks).
--pricing-litellm PATH loads LiteLLM-style pricing JSON for cost budgets (requires --features gateway-costing).
--prometheus-metrics enables a Prometheus metrics endpoint (requires --features gateway-metrics-prometheus).
--prometheus-max-key-series N limits per-key series cardinality (implies --prometheus-metrics).
--prometheus-max-model-series N limits per-model series cardinality (implies --prometheus-metrics).
--prometheus-max-backend-series N limits per-backend series cardinality (implies --prometheus-metrics).
--prometheus-max-path-series N limits per-path series cardinality (implies --prometheus-metrics).
--devtools PATH enables JSONL request/response logging (requires --features gateway-devtools).
--otel enables OpenTelemetry tracing export via OTLP (requires --features gateway-otel).
--otel-endpoint URL overrides the OTLP endpoint (implies --otel).
--otel-json enables JSON formatted tracing logs (implies --otel).
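
For readers unfamiliar with the pattern behind --proxy-circuit-breaker, --proxy-cb-failure-threshold, and --proxy-cb-cooldown-secs, here is a minimal sketch of a threshold-plus-cooldown breaker. It is a generic illustration, not Ditto's implementation: the CircuitBreaker type, counting consecutive failures, and fully closing the breaker after the cooldown are all assumptions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker: opens after `failure_threshold` consecutive failures
/// and stays open for `cooldown`.
struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    /// Returns true when requests may be sent to the backend at time `now`.
    fn allows(&mut self, now: Instant) -> bool {
        match self.opened_at {
            Some(opened) if now.duration_since(opened) < self.cooldown => false,
            Some(_) => {
                // Cooldown elapsed: close the breaker and start counting again.
                self.opened_at = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }
}

fn main() {
    let start = Instant::now();
    let mut breaker = CircuitBreaker::new(2, Duration::from_secs(30));
    breaker.record_failure(start);
    breaker.record_failure(start);
    println!("open after 1s: {}", !breaker.allows(start + Duration::from_secs(1)));
    println!("closed after 31s: {}", breaker.allows(start + Duration::from_secs(31)));
    breaker.record_success();
}
```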

Response headers:

x-ditto-backend: which backend handled the request
x-ditto-request-id: request id (uses incoming x-request-id or generates one)
x-ditto-cache: hit when served from the optional proxy cache
x-ditto-cache-key: cache key for the optional proxy cache (when enabled and cacheable)
x-ditto-cache-source: memory or redis when x-ditto-cache=hit
x-ditto-shim: present when POST /v1/responses is shimmed via POST /v1/chat/completions
x-ditto-translation: present when a translation backend handled the request

Stream Collection

If you want to consume a streaming response but still produce a final unified GenerateResponse, use collect_stream:

use ditto_core::contracts::GenerateRequest;
use ditto_core::llm_core::model::LanguageModel;
use ditto_core::llm_core::stream::collect_stream;

let stream = llm.stream(GenerateRequest::from(messages)).await?;
let collected = collect_stream(stream).await?;
println!("{}", collected.response.text());

Text (generateText / streamText)

Single-step text helpers (no tool execution loop):

use ditto_core::capabilities::text::LanguageModelTextExt;
use ditto_core::contracts::GenerateRequest;

let out = llm.generate_text(GenerateRequest::from(messages)).await?;
println!("{}", out.text);

Streaming:

use futures_util::StreamExt;
use ditto_core::capabilities::text::LanguageModelTextExt;
use ditto_core::contracts::GenerateRequest;

let (handle, mut text_stream) = llm
    .stream_text(GenerateRequest::from(messages))
    .await?
    .into_text_stream();
while let Some(delta) = text_stream.next().await {
    print!("{}", delta?);
}
let final_text = handle.final_text()?.unwrap();
println!("\nfinal={final_text}");

Structured Output (generateObject / streamObject)

Use LanguageModelObjectExt to request structured output (AI SDK-style generateObject / streamObject).

Defaults (ObjectOptions::default()):

strategy = Auto:
- openai → JSON Schema via response_format (native)
- other providers (incl. openai-compatible) → tool-call enforced JSON (wraps output under {"value": ...})
- always falls back to extracting JSON from text if needed
output = Object (top-level object)

use ditto_core::capabilities::object::LanguageModelObjectExt;
use ditto_core::contracts::{GenerateRequest, Message};
use ditto_core::provider_options::JsonSchemaFormat;
use serde_json::json;

let schema = JsonSchemaFormat {
    name: "recipe".to_string(),
    schema: json!({ "type": "object" }),
    strict: None,
};

let out = llm
    .generate_object_json(GenerateRequest::from(vec![Message::user("hi")]), schema)
    .await?;

println!("{}", out.object);

Streaming (partial objects):

use futures_util::StreamExt;

let (handle, mut partial_object_stream) = llm
    .stream_object(GenerateRequest::from(messages), schema)
    .await?
    .into_partial_stream();
while let Some(partial) = partial_object_stream.next().await {
    println!("{:?}", partial?);
}
let final_obj = handle.final_json()?.unwrap();
println!("{final_obj}");

Streaming arrays (AI SDK elementStream):

use ditto_core::capabilities::object::{ObjectOptions, ObjectOutput};
use futures_util::StreamExt;

let mut result = llm
    .stream_object_with(
        GenerateRequest::from(messages),
        schema, // schema for a single element; ditto wraps it as {type:"array", items: ...}
        ObjectOptions {
            output: ObjectOutput::Array,
            ..ObjectOptions::default()
        },
    )
    .await?;

while let Some(element) = result.element_stream.next().await {
    println!("element = {}", element?);
}

Streaming Cancellation

If you need an explicit abort handle (instead of relying on drop semantics), wrap the stream:

use ditto_core::contracts::GenerateRequest;
use ditto_core::llm_core::model::LanguageModel;
use ditto_core::llm_core::stream::abortable_stream;

let stream = llm.stream(GenerateRequest::from(messages)).await?;
let abortable = abortable_stream(stream);
abortable.handle.abort();

Embeddings

EmbeddingModelExt provides AI SDK-style aliases:

use ditto_core::capabilities::EmbeddingModelExt;

let vectors = embeddings.embed_many(vec!["hello".to_string(), "world".to_string()]).await?;
let one = embeddings.embed_one("hi".to_string()).await?;

Custom HTTP Client

Providers accept a custom reqwest::Client so you can configure timeouts, proxies, and default headers (e.g. enterprise gateways):

let http = reqwest::Client::builder().build()?;
let llm = ditto_core::providers::OpenAI::new(api_key).with_http_client(http);

When building providers from config, you can also set per-node default headers via ProviderConfig.http_headers.

Provider Auth (Custom Headers / Query Params)

Providers apply their standard auth headers by default (OpenAI/OpenAI-compatible: bearer token; Anthropic: x-api-key; Google: x-goog-api-key).

If you need a non-standard auth header (e.g. Azure / enterprise gateways), use:

auth = { type = "http_header_env", header = "api-key", keys = ["AZURE_OPENAI_API_KEY"] }

If your gateway expects auth in a query param (e.g. ...?api_key=...), use:

auth = { type = "query_param_env", param = "api_key", keys = ["GATEWAY_API_KEY"] }

If you need to fetch a token dynamically (e.g. gcloud auth print-access-token, aws-vault, Vault CLI), use:

auth = { type = "command", command = ["gcloud", "auth", "print-access-token"] }

The command stdout may be a plain token, a JSON string ("sk-..."), or a JSON object with api_key/token/access_token. Ditto enforces a 15s timeout (configurable via DITTO_AUTH_COMMAND_TIMEOUT_MS/SECS) and a 64KiB stdout/stderr cap.

Provider Node Query Params (Optional)

If your provider requires additional fixed query params on every request (e.g. Azure OpenAI api-version), set ProviderConfig.http_query_params:

base_url = "https://{resource}.openai.azure.com/openai/deployments/{deployment}"
http_query_params = { "api-version" = "2024-02-01" }
auth = { type = "http_header_env", header = "api-key", keys = ["AZURE_OPENAI_API_KEY"] }

Provider Options (Per Provider)

Requests that support provider_options accept either:

Legacy (flat): a single JSON object applied to the current provider.
Bucketed: a JSON object keyed by provider id (optionally with a "*" default bucket).

Bucketed example:

{
  "provider_options": {
    "*": { "parallel_tool_calls": false },
    "openai": { "reasoning_effort": "high" },
    "openai-compatible": { "response_format": { "type": "json_schema", "json_schema": { "name": "answer", "schema": { "type": "object" } } } }
  }
}

Precedence is "*" (base) → provider bucket (override). Provider ids are: openai, openai-compatible (also accepts openai_compatible as an alias key), anthropic, google, cohere, bedrock, vertex.
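
The precedence rule above can be sketched as a two-step merge: start from the "*" bucket, then apply the provider bucket on top. The example uses string values in place of JSON and a hypothetical resolve_options function; a shallow, key-by-key override is an assumption about how nested objects are handled.

```rust
use std::collections::BTreeMap;

/// Simplified option bag: real provider_options hold JSON values, not strings.
type Options = BTreeMap<String, String>;

/// Resolve options for `provider`: "*" is the base, the provider bucket overrides it.
fn resolve_options(buckets: &BTreeMap<String, Options>, provider: &str) -> Options {
    let mut resolved = buckets.get("*").cloned().unwrap_or_default();
    if let Some(overrides) = buckets.get(provider) {
        for (key, value) in overrides {
            resolved.insert(key.clone(), value.clone());
        }
    }
    resolved
}

fn main() {
    let mut base = Options::new();
    base.insert("parallel_tool_calls".to_string(), "false".to_string());
    base.insert("reasoning_effort".to_string(), "low".to_string());

    let mut openai = Options::new();
    openai.insert("reasoning_effort".to_string(), "high".to_string());

    let mut buckets = BTreeMap::new();
    buckets.insert("*".to_string(), base);
    buckets.insert("openai".to_string(), openai);

    println!("{:?}", resolve_options(&buckets, "openai"));
    println!("{:?}", resolve_options(&buckets, "anthropic"));
}
```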

File Upload (Optional)

If you want to send PDFs via file_id (instead of inlining base64), OpenAI and OpenAI-compatible providers expose a small upload helper:

let file_id = llm.upload_file("doc.pdf", pdf_bytes).await?;

Development

Enable repo-local git hooks:

git config core.hooksPath githooks

This enforces Conventional Commits and requires each commit to include a change to CHANGELOG.md.

Structure Gates

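The default structure gate follows the Rust mainline. The goal is to keep the default core, all-features, no-default-features, and provider feature matrix configurations continuously buildable and lint-clean. The corresponding minimal local command set: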

cargo fmt --all -- --check
cargo run -p ditto-core --bin ditto-llms-txt -- --check
cargo check --workspace
cargo test --workspace --all-targets
cargo check -p ditto-core --examples
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace --all-targets --all-features
cargo check -p ditto-core --no-default-features
cargo clippy -p ditto-core --no-default-features -- -D warnings
cargo check -p ditto-server --no-default-features
cargo clippy -p ditto-server --no-default-features -- -D warnings

By default, the Node checks only validate packages/*:

pnpm run typecheck
pnpm run build

The optional Admin UI asset is validated separately:

pnpm run typecheck:admin-ui
pnpm run build:admin-ui

Integration Tests (Optional)

Enable the integration feature and set real credentials:

OpenAI Responses: OPENAI_API_KEY + OPENAI_MODEL
OpenAI-compatible: OPENAI_COMPAT_BASE_URL + OPENAI_COMPAT_MODEL (+ OPENAI_COMPAT_API_KEY optional)

Then run:

cargo test --all-features
