omne42/ditto-llm

ditto-llm

Ditto-LLM is a small Rust SDK that provides a unified interface for calling multiple LLM providers.

Goal: become a superset of LiteLLM Proxy + Vercel AI SDK Core via layering + Cargo feature gating. See COMPARED_TO_LITELLM_AI_SDK.md and TODO.md for the parity notes and roadmap.

Layered product plan (L0/L1/L2):

  • L0 (this repo): model adapters + protocol/shape conversion + direct SDK usage.
  • L1 (this repo): gateway/proxy platform (API surface, routing, budgets, observability, admin).
  • L2 (separate repo): enterprise closed-loop platform (prompt/eval/agent eval/org governance).
  • Boundary: L2 depends on L1 contracts; L1 remains independently deployable for SMB/mid-market.
  • Frozen L1 contract artifacts: contracts/gateway-contract-v0.1.openapi.yaml + crates/ditto-gateway-contract-types.

Current scope:

  • Default build: generic OpenAI-compatible LLM core (provider-openai-compatible + cap-llm). This is the stable base and only default capability promise.
  • Unified LLM types + traits: LanguageModel, Message/ContentPart, Tool, StreamChunk, Warning.
  • Text helpers: generate_text / stream_text (AI SDK-style generateText / streamText).
  • Structured outputs: generate_object_json / stream_object (AI SDK-style generateObject / streamObject).
  • Multi-modal inputs at the request shape level: images + PDF documents via ContentPart::Image / ContentPart::File (provider support varies; unsupported parts emit Warning).
  • Parameter hygiene: temperature/top_p are clamped to provider ranges; non-finite values are dropped (with warnings).
  • Default provider path: OpenAI-compatible Chat Completions (LiteLLM / DeepSeek / Qwen / OpenRouter / local gateways / etc.) with generate + SSE streaming + tools.
  • Optional provider packs and capability packs add official OpenAI Responses, embeddings, images, audio, moderations, Google GenAI, Anthropic Messages, Cohere, Bedrock, Vertex, batches, rerank, and gateway translation surfaces.
  • Provider profile config and model discovery (ProviderConfig / GET /models) remain available for routing use-cases, but the default examples and docs now assume a generic OpenAI-compatible upstream.
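The parameter-hygiene rule in the list above (clamp to the provider range, drop non-finite values, warn in both cases) can be sketched as follows. This is a minimal illustration, not Ditto's implementation: `Sanitized`, `sanitize_param`, the ranges, and the warning text are all hypothetical names chosen for the example.

```rust
/// Outcome of sanitizing one sampling parameter (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Sanitized {
    /// The value to send upstream, or `None` if it was dropped.
    pub value: Option<f32>,
    /// Human-readable warning, set whenever the input had to be changed.
    pub warning: Option<String>,
}

/// Clamp `value` into `[min, max]`; drop NaN and infinities.
/// The range is supplied by the caller; real provider ranges are not assumed here.
pub fn sanitize_param(name: &str, value: f32, min: f32, max: f32) -> Sanitized {
    if !value.is_finite() {
        return Sanitized {
            value: None,
            warning: Some(format!("{} is not finite; dropped", name)),
        };
    }
    let clamped = value.clamp(min, max);
    let warning = if clamped != value {
        Some(format!("{}={} clamped to {}", name, value, clamped))
    } else {
        None
    };
    Sanitized { value: Some(clamped), warning }
}
```

For example, `sanitize_param("temperature", 3.5, 0.0, 2.0)` yields the value `2.0` plus a warning, while a NaN input yields no value at all.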

Optional feature-gated modules:

  • Agent tool loop: ToolLoopAgent + ToolExecutor (feature agent).
  • Auth adapters: SigV4 signer + OAuth client-credentials flow (feature auth).
  • Providers: Bedrock (SigV4) and Vertex (OAuth) adapters with generate + SSE streaming + tools (features provider-bedrock, provider-vertex).
  • SDK utilities: stream protocol v1, HTTP adapters (SSE/NDJSON), telemetry sink, devtools JSONL logger, MCP tool adapter, cache middleware with streaming replay (feature sdk).
  • SDK HTTP helpers: optional axum response builders for stream adapters (feature sdk-axum).
  • Gateway control-plane: virtual keys, limits, cache, budget, routing, guardrails, passthrough, plus a ditto-gateway HTTP server (feature gateway). Includes LiteLLM-like conveniences such as /key/* endpoints, /a2a/* agent proxy, and /mcp* MCP tool gateway.
  • Gateway token counting: tiktoken-based input token estimation for proxy budgets/guardrails/costing (feature gateway-tokenizer).
  • Gateway translation proxy: OpenAI-compatible GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos* (create/list/retrieve/delete/content/remix), POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches backed by Ditto providers (feature gateway-translation).
  • Gateway proxy caching: in-memory cache for non-streaming OpenAI-compatible responses (feature gateway-proxy-cache).
  • Gateway OpenTelemetry: OTLP tracing exporter + structured logs for gateway HTTP requests (feature gateway-otel).

Non-goals (for now):

  • The default build is not an API gateway/proxy; the gateway feature adds a lightweight control-plane + HTTP service. The gateway-translation feature adds translation for GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos*, POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches. Translation of the remaining OpenAI API surface is tracked in TODO.md.
  • Core helpers are single-step and return tool calls to the caller; the agent feature offers an opt-in tool loop, but it is not enabled by default.
  • It is not a full UI SDK (no frontend hooks or middleware ecosystem); the sdk feature only provides protocol/telemetry/devtools/MCP utilities.
  • Bedrock support targets Anthropic Messages-on-Bedrock; other Bedrock model families and Vertex service-account JWT flows are not covered yet.

See PROVIDERS.md for a pragmatic provider/capability matrix (native adapters + OpenAI-compatible gateway coverage).

Docs

This repo includes an mdBook under docs/. For the stable docs entrypoints, start with docs/README.md and docs/docs-system-map.md. Use ./scripts/check-docs-system.sh to verify the repository-level docs skeleton.

cargo install mdbook
mdbook serve docs

If you don’t want to install mdBook, you can still read the Markdown directly in docs/src.

Provider Packs and Capability Packs

Ditto now documents provider integration around three separate axes:

  • Default core: provider-openai-compatible + cap-llm is the only out-of-the-box contract.
  • Provider packs: provider-openai, provider-anthropic, provider-google, provider-cohere, provider-bedrock, provider-vertex, plus provider-specific packs such as provider-deepseek, provider-kimi, and provider-openrouter.
  • Capability packs: cap-llm, cap-embedding, cap-image-generation, cap-image-edit, cap-audio-transcription, cap-audio-speech, cap-moderation, cap-rerank, cap-batch, cap-realtime.

The intended boundary is:

  • provider selects the runtime adapter/provider pack.
  • ProviderConfig configures one concrete upstream node for that runtime.
  • GenerateRequest.provider_options stays request-scoped.

See PROVIDERS.md for the provider × capability × feature × status table.

Tool Schemas

For Google function calling, Ditto-LLM converts tool parameter JSON Schema into an OpenAPI-style schema.

Contract:

  • Conversion is best-effort and lossy: unsupported keywords are ignored (dropped), not errors.
  • Unsupported keywords may emit Warning::Compatibility(tool.parameters.unsupported_keywords) to avoid silent data loss.
  • $ref is best-effort: local refs (#/...) are resolved; unresolvable refs are ignored and a Warning::Compatibility(tool.parameters.$ref) is emitted.
  • Root empty-object schemas (no properties + additionalProperties missing/false) are treated as "no parameters" and omitted.
  • Boolean schemas (true/false) are treated as unconstrained schemas; at the root they are omitted.
  • Nullable unions:
    • type: ["string", "null"] becomes anyOf: [{ "type": "string" }] + nullable: true
    • anyOf: [{...}, {"type":"null"}] becomes the same shape (single branch is flattened)
  • const becomes enum: [<const>].
  • additionalProperties supports boolean and nested schemas.

Supported keywords (subset): type, title, description, properties, required, items, additionalProperties, enum, const, format, allOf, anyOf, oneOf, default, minLength/maxLength/pattern, minItems/maxItems/uniqueItems, minProperties/maxProperties, minimum/maximum/multipleOf, and exclusiveMinimum/exclusiveMaximum (number form → minimum/maximum + exclusive* = true).
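Two of the conversions in the contract above are mechanical enough to sketch without a JSON library: splitting a nullable `type` list, and turning a numeric `exclusiveMinimum` into `minimum` plus a boolean flag. The function and type names below are hypothetical, and the sketch works on plain Rust values rather than on Ditto's actual schema types.

```rust
/// Split a JSON Schema `type` list into its non-null members and a nullable flag.
/// `["string", "null"]` becomes (`["string"]`, `true`).
pub fn split_nullable(types: &[&str]) -> (Vec<String>, bool) {
    let nullable = types.iter().any(|t| *t == "null");
    let rest = types
        .iter()
        .filter(|t| **t != "null")
        .map(|t| t.to_string())
        .collect();
    (rest, nullable)
}

/// OpenAPI-style lower bound: a limit plus an "exclusive" flag (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct LowerBound {
    pub minimum: f64,
    pub exclusive_minimum: bool,
}

/// Convert the number form of `exclusiveMinimum` into `minimum` + `exclusiveMinimum = true`.
pub fn exclusive_lower_bound(exclusive_minimum: f64) -> LowerBound {
    LowerBound {
        minimum: exclusive_minimum,
        exclusive_minimum: true,
    }
}
```

The upper bound (`exclusiveMaximum`) would follow the same pattern.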

Examples

Default-core examples expect a generic OpenAI-compatible upstream:

export OPENAI_COMPAT_BASE_URL="https://your-openai-compatible-endpoint/v1"
export OPENAI_COMPAT_MODEL="your-chat-model"
export OPENAI_COMPAT_API_KEY="sk-..."   # optional for local gateways that do not require auth

cargo run --example basic
cargo run --example streaming
cargo run --example tool_calling
cargo run --example openai_compatible

Additional provider/capability examples stay opt-in:

cargo run --example openai_compatible_embeddings --features cap-embedding
cargo run --example embeddings --features "provider-openai cap-embedding"
cargo run --example multimodal --features "provider-openai cap-llm base64" -- ./image.png ./doc.pdf
cargo run --example batches --features "provider-openai-compatible cap-batch" -- ./requests.jsonl

Gateway (optional)

Run the HTTP gateway (feature gateway):

cargo run -p ditto-server --features gateway --bin ditto-gateway -- ./gateway.json --listen 0.0.0.0:8080

YAML config is optional (feature gateway-config-yaml):

cargo run --features gateway-config-yaml --bin ditto-gateway -- ./gateway.yaml --listen 0.0.0.0:8080

Optional admin UI asset (React; outside the default core build/CI path):

pnpm install
pnpm run dev:admin-ui

Minimal multi-language gateway clients:

  • Node (SSE streaming): examples/clients/node/stream_chat_completions.mjs
  • Python: examples/clients/python/chat_completions.py
  • Go: examples/clients/go/chat_completions.go

Backends are configured in gateway.json (OpenAI-compatible upstreams + injected headers/query params, e.g. Authorization and Azure-style api-version):

{
  "backends": [
    {
      "name": "primary",
      "base_url": "https://api.openai.com/v1",
      "max_in_flight": 64,
      "timeout_seconds": 60,
      "headers": { "authorization": "Bearer ${OPENAI_API_KEY}" },
      "query_params": {}
    }
  ],
  "virtual_keys": [
    {
      "id": "local-dev",
      "token": "${DITTO_VIRTUAL_KEY}",
      "enabled": true,
      "limits": {},
      "budget": {},
      "cache": {},
      "guardrails": {},
      "passthrough": {},
      "route": null
    }
  ],
  "router": { "default_backends": [{ "backend": "primary", "weight": 1.0 }], "rules": [] }
}

backends[].max_in_flight optionally caps concurrent in-flight proxy requests per backend (rejects with HTTP 429 + OpenAI-style error code inflight_limit_backend). backends[].timeout_seconds optionally overrides the backend request timeout in seconds (default: 300s).
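To make the `max_in_flight` behaviour concrete, here is one common way to build such a cap: an atomic counter with a permit that releases its slot on drop. This is a sketch under assumptions, not the gateway's code; `InFlightLimiter` and `Permit` are hypothetical names, and the caller is assumed to answer HTTP 429 when no permit is available.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps the number of concurrently held permits (hypothetical type).
#[derive(Clone)]
pub struct InFlightLimiter {
    max: usize,
    current: Arc<AtomicUsize>,
}

/// Releases its slot when dropped, i.e. when the proxied request finishes.
pub struct Permit {
    current: Arc<AtomicUsize>,
}

impl InFlightLimiter {
    pub fn new(max: usize) -> Self {
        InFlightLimiter { max, current: Arc::new(AtomicUsize::new(0)) }
    }

    /// Returns a permit, or `None` when the cap is reached
    /// (the caller would reject the request in that case).
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut seen = self.current.load(Ordering::Acquire);
        loop {
            if seen >= self.max {
                return None;
            }
            // Only claim a slot if nobody else changed the counter in between.
            match self.current.compare_exchange_weak(
                seen,
                seen + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { current: Arc::clone(&self.current) }),
                Err(actual) => seen = actual,
            }
        }
    }

    /// Number of permits currently held.
    pub fn in_flight(&self) -> usize {
        self.current.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.current.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the release to `Drop` means a slot is returned even when request handling exits early with an error.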

Gateway config supports ${ENV_VAR} interpolation in backend base_url/headers/query_params, backend provider_config node fields (for example base_url, default_model, http_headers, http_query_params, auth, upstream_api, normalize_to, normalize_endpoint), virtual_keys[].token, a2a_agents[] (agent url/headers/query), and mcp_servers[] (server url/headers/query) (expanded at startup via the process env or --dotenv).
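The `${ENV_VAR}` expansion described above can be illustrated with a small std-only function. How Ditto treats a missing variable or an unterminated placeholder is not stated here, so both behaviours below (error on a missing name, keep an unterminated placeholder verbatim) are assumptions, and `interpolate` is a hypothetical name.

```rust
/// Expand every `${NAME}` in `input` using `lookup`.
/// On failure, returns the name of the first variable `lookup` cannot resolve.
pub fn interpolate<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                let value = lookup(name).ok_or_else(|| name.to_string())?;
                out.push_str(&value);
                rest = &after[end + 1..];
            }
            None => {
                // Assumption: an unterminated placeholder is kept verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Passing `|name| std::env::var(name).ok()` as `lookup` would read from the process environment; a dotenv-backed map could be plugged in the same way.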

Translation backends (feature gateway-translation) can be configured with provider + provider_config (same shape as ProviderConfig):

{
  "backends": [
    {
      "name": "anthropic",
      "provider": "anthropic",
      "provider_config": {
        "auth": { "type": "api_key_env", "keys": ["ANTHROPIC_API_KEY"] },
        "default_model": "claude-3-5-sonnet-20241022"
      }
    }
  ],
  "virtual_keys": [
    {
      "id": "local-dev",
      "token": "${DITTO_VIRTUAL_KEY}",
      "enabled": true,
      "limits": {},
      "budget": {},
      "cache": {},
      "guardrails": {},
      "passthrough": {},
      "route": null
    }
  ],
  "router": { "default_backends": [{ "backend": "anthropic", "weight": 1.0 }], "rules": [] }
}

provider selects the runtime adapter; provider_config only provides the concrete upstream node settings for that adapter.

For OpenAI-compatible upstreams, provider can be openai-compatible/openai_compatible or a LiteLLM-style alias (e.g. groq, mistral, deepseek, qwen, together, fireworks, xai, perplexity, openrouter, ollama, azure).

Routing (optional):

  • router.default_backends: weighted primary selection (seeded by x-request-id when proxying)
  • router.rules[].backends: per-model-prefix weighted backends (falls back to router.default_backends when empty)
  • If multiple backends are selected, the OpenAI-compatible proxy will fall back to the next backend on network errors.
  • With --features gateway-routing-advanced, proxying can also use typed retry/fallback policies for status/network/timeout failures, circuit breaker controls, and active health checks (--proxy-retry* / --proxy-fallback-status-codes / --proxy-network-error-action / --proxy-timeout-error-action / --proxy-circuit-breaker* / --proxy-cb-failure-status-codes / --proxy-health-check*).
  • For non-safe HTTP methods, Ditto only continues to the next backend when the client explicitly supplies x-request-id; otherwise a safety guard stops cross-backend retry/fallback to reduce duplicate side effects.
  • That guard is not a distributed dedup store. If you need true end-to-end idempotency, enforce it in the upstream application or add request-result dedup persistence at the gateway boundary.
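The routing bullets above combine two ideas: a weighted choice that is seeded by the request id, and a guard that limits cross-backend fallback for non-safe methods. The sketch below shows one way to express both. It is illustrative only: the hashing scheme, the names `WeightedBackend`, `pick_backend`, and `may_try_next_backend`, and the exact set of "safe" methods (taken from the usual HTTP definition) are assumptions rather than Ditto's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One weighted routing target, mirroring `{ "backend": ..., "weight": ... }`.
pub struct WeightedBackend {
    pub backend: String,
    pub weight: f64,
}

/// Deterministically pick a backend: the same `seed` maps to the same choice
/// for a given list. Entries with a non-positive weight are never selected.
pub fn pick_backend<'a>(backends: &'a [WeightedBackend], seed: &str) -> Option<&'a str> {
    let total: f64 = backends.iter().map(|b| b.weight.max(0.0)).sum();
    if total <= 0.0 || !total.is_finite() {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    // Map the 64-bit hash onto the range covered by the weights.
    let point = (hasher.finish() as f64 / (u64::MAX as f64 + 1.0)) * total;
    let mut acc = 0.0;
    let mut last = None;
    for b in backends.iter().filter(|b| b.weight > 0.0) {
        acc += b.weight;
        last = Some(b.backend.as_str());
        if point < acc {
            return last;
        }
    }
    last // covers floating-point rounding at the upper edge
}

/// Guard: non-safe methods may only move to another backend
/// when the client supplied `x-request-id`.
pub fn may_try_next_backend(method: &str, client_sent_request_id: bool) -> bool {
    let safe = matches!(method, "GET" | "HEAD" | "OPTIONS" | "TRACE");
    safe || client_sent_request_id
}
```

Seeding by request id keeps a retried request on the same primary backend, which is what makes the choice reproducible in logs.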

Endpoints:

  • OpenAI-compatible proxy (passthrough): ANY /v1/* (e.g. POST /v1/responses, POST /v1/chat/completions, GET /v1/models).
    • LiteLLM-style aliases without a /v1 prefix are accepted (e.g. /chat/completions, /embeddings, /moderations, /files/*, /batches/*, /models/*, /responses/*).
    • OpenAI-compatible /v1/*, MCP /mcp*, and A2A /a2a/* surfaces are fail-closed: requests must include a configured virtual key via Authorization: Bearer <virtual_key> (or x-ditto-virtual-key / x-api-key).
    • The client Authorization header is treated as a virtual key and is not forwarded upstream; the backend headers are applied instead.
    • An empty virtual_keys set means there are no valid client credentials yet, so those surfaces will return 401 until keys are provisioned.
    • If the upstream does not implement POST /v1/responses (returns 404/405/501), Ditto will fall back to POST /v1/chat/completions and return a best-effort Responses-like response/stream (adds x-ditto-shim: responses_via_chat_completions).
  • OpenAI-compatible translation (feature gateway-translation): GET /v1/models, GET /v1/models/*, POST /v1/chat/completions, POST /v1/completions, POST /v1/responses, POST /v1/responses/compact, POST /v1/responses/input_tokens, GET /v1/responses/*, GET /v1/responses/*/input_items, DELETE /v1/responses/*, POST /v1/embeddings, POST /v1/moderations, POST /v1/images/generations, /v1/videos* (create/list/retrieve/delete/content/remix), POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech, /v1/files*, POST /v1/rerank, and /v1/batches can be served by a backend with provider configured (adds x-ditto-translation: <backend>; GET /v1/models only lists translation models routable for the current virtual key/router path; translated /v1/responses/* retrieve/delete are best-effort, require gateway-scoped ids created by the same running gateway instance, and currently live in a bounded in-memory LRU store).
  • Control-plane demo endpoint: POST /v1/gateway (JSON GatewayRequest; accepts Authorization: Bearer <virtual_key>).
  • GET /health
  • GET /ready
  • GET /metrics
  • GET /admin/keys (admin token via Authorization or x-admin-token if configured). Defaults to redacted tokens; ?include_tokens=true requires a write or tenant-write admin token and is rejected after keys have been reloaded from one-way hashed persistence.
  • GET /admin/config/version, GET /admin/config/versions, and GET /admin/config/versions/:version_id (current/process-local history/detail for control-plane virtual-key config versions; restart rebuilds history from the loaded config as a new bootstrap snapshot; detail supports ?include_tokens=true for secret-managing admins only while original secrets are still in memory).
  • GET /admin/config/diff (read-only or write admin token; compares two config versions via from_version_id + to_version_id; include_tokens requires secret-managing admin access and is rejected once only hashed tokens remain).
  • GET /admin/config/export (read-only or write admin token; exports current config by default, or a specific version via version_id; include_tokens requires secret-managing admin access and is rejected once only hashed tokens remain).
  • POST /admin/config/validate (read-only or write admin token; validates virtual_keys plus optional router payloads with optional expected hashes, without mutating runtime state).
  • PUT /admin/config/router (write admin token required; updates router config with backend-reference validation and creates a new config version; supports dry_run).
  • MCP tool gateway: ANY /mcp* (JSON-RPC tools/list / tools/call + convenience endpoints), and MCP tool integration for POST /v1/chat/completions and POST /v1/responses via tools: [{"type":"mcp", ...}] (requires a valid virtual key).
  • A2A agent gateway: GET /a2a/:agent_id/.well-known/agent-card.json and POST /a2a/* JSON-RPC proxying (requires a2a_agents configured and a valid virtual key).
  • POST /admin/keys and PUT|DELETE /admin/keys/:id (requires the write admin token).
  • POST /admin/config/rollback (requires the write admin token; restores virtual keys and router to a previous config version; supports dry_run).
  • LiteLLM-style key management (requires admin auth): /key/generate, /key/update, /key/regenerate (or /key/:key/regenerate), /key/delete, /key/info, /key/list.
    • /key/list returns key aliases by default; include_tokens=true requires a write or tenant-write admin token and is rejected after keys have been reloaded from one-way hashed persistence.
    • /key/info accepts ?key=... (admin query) or defaults to the Authorization: Bearer <virtual_key> token when ?key is omitted (self lookup).
  • POST /admin/proxy_cache/purge (requires the write admin token and --proxy-cache; body can be { "cache_key": "..." } or { "all": true }).
  • GET /admin/backends and POST /admin/backends/:name/reset (reset requires the write admin token and --features gateway-routing-advanced).
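The endpoint list above says a virtual key may arrive as `Authorization: Bearer <virtual_key>`, `x-ditto-virtual-key`, or `x-api-key`. A minimal sketch of that lookup follows. The precedence order (Authorization first, then the two custom headers) and the exact `Bearer ` prefix match are assumptions made for the example; `extract_virtual_key` is a hypothetical name.

```rust
/// Find a header value by case-insensitive name in a list of (name, value) pairs.
fn header<'a>(headers: &[(&str, &'a str)], wanted: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|pair| pair.0.eq_ignore_ascii_case(wanted))
        .map(|pair| pair.1.trim())
}

/// Pull the client's virtual key out of the request headers.
/// Returns `None` when no usable key is present (the gateway would answer 401).
pub fn extract_virtual_key<'a>(headers: &[(&str, &'a str)]) -> Option<&'a str> {
    // Assumed precedence: Authorization bearer token first.
    if let Some(value) = header(headers, "authorization") {
        if let Some(token) = value.strip_prefix("Bearer ") {
            let token = token.trim();
            if !token.is_empty() {
                return Some(token);
            }
        }
    }
    // Then the dedicated headers, in the order the README lists them.
    for name in &["x-ditto-virtual-key", "x-api-key"] {
        if let Some(value) = header(headers, name) {
            if !value.is_empty() {
                return Some(value);
            }
        }
    }
    None
}
```

Whatever is extracted here is only a virtual key; as noted above, it is not forwarded upstream, and the backend's configured headers are applied instead.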

CLI options:

  • --listen HOST:PORT (or --addr HOST:PORT) sets the bind address (default: 127.0.0.1:8080).
  • --dotenv PATH loads a dotenv file (KEY=VALUE) for ${ENV_VAR} interpolation and provider auth env lookups.
  • --admin-token TOKEN enables /admin/* endpoints (write admin token).
  • --admin-token-env ENV loads the write admin token from env (works with --dotenv).
  • --admin-read-token TOKEN enables /admin/* read-only endpoints.
  • --admin-read-token-env ENV loads the read-only admin token from env (works with --dotenv).
  • --backend name=url adds/overrides a backend for POST /v1/gateway (the backend is a URL that accepts GatewayRequest JSON and returns GatewayResponse JSON).
  • --upstream name=base_url adds/overrides an OpenAI-compatible upstream backend (in addition to gateway.json).
  • --state PATH enables persistence for admin config mutations (virtual_keys + router in GatewayStateFile; loaded on startup; created from gateway.json when missing). Virtual-key tokens are written as one-way sha256: hashes.
  • --sqlite PATH enables sqlite persistence for admin config mutations (virtual_keys + router; requires --features gateway-store-sqlite; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
  • --pg URL / --pg-env ENV enables postgres persistence for admin config mutations (virtual_keys + router) plus audit/budget/cost ledgers (/admin/audit*, /admin/budgets*, /admin/costs*; costs require gateway-costing; requires --features gateway-store-postgres; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
  • --mysql URL / --mysql-env ENV enables mysql persistence for admin config mutations (virtual_keys + router) plus audit/budget/cost ledgers (/admin/audit*, /admin/budgets*, /admin/costs*; costs require gateway-costing; requires --features gateway-store-mysql; loaded on startup). Virtual-key tokens are written as one-way sha256: hashes.
  • --redis URL enables redis persistence for admin config mutations (virtual_keys + router; requires --features gateway-store-redis). Virtual-key tokens are written as one-way sha256: hashes.
  • --redis-env ENV loads the redis URL from env (works with --dotenv; requires --features gateway-store-redis).
  • --redis-prefix PREFIX sets the redis key prefix (requires --features gateway-store-redis and --redis/--redis-env).
  • After a restart from any persisted sha256: state/store, Ditto can still authenticate presented virtual-key tokens, but include_tokens=true exports can no longer return the original secret material.
  • --audit-retention-secs SECS sets audit retention for sqlite/pg/mysql/redis stores (0 disables retention; default is 30 days when any persistent store is configured).
  • --db-doctor runs store schema checks and exits (startup also performs schema self-check and fails fast on mismatch).
  • --json-logs emits JSON log records to stderr.
  • --proxy-max-in-flight N limits concurrent in-flight proxy requests (rejects with 429 when exceeded). If omitted, default is 256.
  • --proxy-cache enables a best-effort cache for non-streaming OpenAI-compatible responses (requires --features gateway-proxy-cache). When combined with --redis, responses are also cached in Redis (shared across instances).
  • --proxy-cache-ttl SECS sets the proxy cache TTL (implies --proxy-cache).
  • --proxy-cache-max-entries N sets the in-memory proxy cache capacity (implies --proxy-cache).
  • --proxy-cache-max-body-bytes N sets the maximum cached body size per entry (implies --proxy-cache).
  • --proxy-cache-max-total-body-bytes N sets the in-memory total cached body budget (implies --proxy-cache).
  • --proxy-retry enables retry on retryable statuses (requires --features gateway-routing-advanced).
  • --proxy-retry-status-codes CODES overrides retry status codes (comma-separated; implies --proxy-retry).
  • --proxy-fallback-status-codes CODES falls back to the next backend when a response status matches (comma-separated; works even when retry is disabled).
  • --proxy-network-error-action ACTION controls what to do on transport failures (none, fallback, retry; default: fallback).
  • --proxy-timeout-error-action ACTION controls what to do on backend timeouts (none, fallback, retry; default: fallback).
  • For non-safe methods (POST/PUT/PATCH/DELETE and similar), cross-backend retry/fallback is guarded unless the client provided x-request-id; when blocked, Ditto emits proxy.request_safety_guard in JSON/devtools logs.
  • --proxy-retry-max-attempts N sets max retry attempts (implies --proxy-retry).
  • --proxy-circuit-breaker enables a simple circuit breaker (requires --features gateway-routing-advanced).
  • --proxy-cb-failure-threshold N sets circuit breaker failure threshold (implies --proxy-circuit-breaker).
  • --proxy-cb-cooldown-secs SECS sets circuit breaker cooldown seconds (implies --proxy-circuit-breaker).
  • --proxy-cb-failure-status-codes CODES adds extra status codes that should count toward the circuit breaker (for example 408,429).
  • --proxy-cb-no-network-errors, --proxy-cb-no-timeout-errors, --proxy-cb-no-server-errors disable individual circuit-breaker failure buckets.
  • --proxy-health-checks enables active health checks (requires --features gateway-routing-advanced).
  • --proxy-health-check-path PATH overrides the health check request path (implies --proxy-health-checks; default: /v1/models).
  • --proxy-health-check-interval-secs SECS sets health check interval seconds (implies --proxy-health-checks).
  • --proxy-health-check-timeout-secs SECS sets health check timeout seconds (implies --proxy-health-checks).
  • --pricing-litellm PATH loads LiteLLM-style pricing JSON for cost budgets (requires --features gateway-costing).
  • --prometheus-metrics enables a Prometheus metrics endpoint (requires --features gateway-metrics-prometheus).
  • --prometheus-max-key-series N limits per-key series cardinality (implies --prometheus-metrics).
  • --prometheus-max-model-series N limits per-model series cardinality (implies --prometheus-metrics).
  • --prometheus-max-backend-series N limits per-backend series cardinality (implies --prometheus-metrics).
  • --prometheus-max-path-series N limits per-path series cardinality (implies --prometheus-metrics).
  • --devtools PATH enables JSONL request/response logging (requires --features gateway-devtools).
  • --otel enables OpenTelemetry tracing export via OTLP (requires --features gateway-otel).
  • --otel-endpoint URL overrides the OTLP endpoint (implies --otel).
  • --otel-json enables JSON formatted tracing logs (implies --otel).
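The circuit-breaker flags above (`--proxy-cb-failure-threshold`, `--proxy-cb-cooldown-secs`) describe a threshold-plus-cooldown design. The sketch below shows the general technique with a consecutive-failure counter. Whether Ditto counts consecutive failures or uses a window is not stated here, so treat this as an assumption; `CircuitBreaker` and its methods are hypothetical names. Time is passed in explicitly to keep the logic easy to test.

```rust
use std::time::{Duration, Instant};

/// A simple breaker: opens after N consecutive failures, closes after a cooldown.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Whether a request may be sent to the backend at time `now`.
    pub fn allows(&mut self, now: Instant) -> bool {
        let opened_at = self.opened_at;
        match opened_at {
            Some(opened) if now.saturating_duration_since(opened) < self.cooldown => false,
            Some(_) => {
                // Cooldown elapsed: close the breaker and start counting afresh.
                self.opened_at = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    /// A successful response resets the failure streak.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// A counted failure (network error, timeout, or a configured status code).
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

The `--proxy-cb-no-*` flags would correspond to simply not calling `record_failure` for the disabled failure bucket.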

Response headers:

  • x-ditto-backend: which backend handled the request
  • x-ditto-request-id: request id (uses incoming x-request-id or generates one)
  • x-ditto-cache: hit when served from the optional proxy cache
  • x-ditto-cache-key: cache key for the optional proxy cache (when enabled and cacheable)
  • x-ditto-cache-source: memory or redis when x-ditto-cache=hit
  • x-ditto-shim: present when POST /v1/responses is shimmed via POST /v1/chat/completions
  • x-ditto-translation: present when a translation backend handled the request

Stream Collection

If you want to consume a streaming response but still produce a final unified GenerateResponse, use collect_stream:

use ditto_core::contracts::GenerateRequest;
use ditto_core::llm_core::model::LanguageModel;
use ditto_core::llm_core::stream::collect_stream;

let stream = llm.stream(GenerateRequest::from(messages)).await?;
let collected = collect_stream(stream).await?;
println!("{}", collected.response.text());

Text (generateText / streamText)

Single-step text helpers (no tool execution loop):

use ditto_core::capabilities::text::LanguageModelTextExt;
use ditto_core::contracts::GenerateRequest;

let out = llm.generate_text(GenerateRequest::from(messages)).await?;
println!("{}", out.text);

Streaming:

use futures_util::StreamExt;
use ditto_core::capabilities::text::LanguageModelTextExt;
use ditto_core::contracts::GenerateRequest;

let (handle, mut text_stream) = llm
    .stream_text(GenerateRequest::from(messages))
    .await?
    .into_text_stream();
while let Some(delta) = text_stream.next().await {
    print!("{}", delta?);
}
let final_text = handle.final_text()?.unwrap();
println!("\nfinal={final_text}");

Structured Output (generateObject / streamObject)

Use LanguageModelObjectExt to request structured output (AI SDK-style generateObject / streamObject).

Defaults (ObjectOptions::default()):

  • strategy = Auto:
    • openai → JSON Schema via response_format (native)
    • other providers (incl. openai-compatible) → tool-call enforced JSON (wraps output under {"value": ...})
    • always falls back to extracting JSON from text if needed
  • output = Object (top-level object)

use ditto_core::capabilities::object::LanguageModelObjectExt;
use ditto_core::contracts::{GenerateRequest, Message};
use ditto_core::provider_options::JsonSchemaFormat;
use serde_json::json;

let schema = JsonSchemaFormat {
    name: "recipe".to_string(),
    schema: json!({ "type": "object" }),
    strict: None,
};

let out = llm
    .generate_object_json(GenerateRequest::from(vec![Message::user("hi")]), schema)
    .await?;

println!("{}", out.object);
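The Auto strategy's last resort, "extracting JSON from text", can be pictured as scanning the model's reply for the first balanced object. The function below is a sketch of that general technique, not Ditto's extraction logic: `extract_json_object` is a hypothetical name, and the function only locates a candidate span without validating that it is well-formed JSON.

```rust
/// Return the first balanced `{ ... }` span in `text`,
/// ignoring braces that appear inside JSON string literals.
pub fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced: no complete object found
}
```

A caller would hand the returned slice to a JSON parser and, where applicable, check it against the requested schema.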

Streaming (partial objects):

use futures_util::StreamExt;

let (handle, mut partial_object_stream) = llm
    .stream_object(GenerateRequest::from(messages), schema)
    .await?
    .into_partial_stream();
while let Some(partial) = partial_object_stream.next().await {
    println!("{:?}", partial?);
}
let final_obj = handle.final_json()?.unwrap();
println!("{final_obj}");

Streaming arrays (AI SDK elementStream):

use ditto_core::capabilities::object::{ObjectOptions, ObjectOutput};
use futures_util::StreamExt;

let mut result = llm
    .stream_object_with(
        GenerateRequest::from(messages),
        schema, // schema for a single element; ditto wraps it as {type:"array", items: ...}
        ObjectOptions {
            output: ObjectOutput::Array,
            ..ObjectOptions::default()
        },
    )
    .await?;

while let Some(element) = result.element_stream.next().await {
    println!("element = {}", element?);
}

Streaming Cancellation

If you need an explicit abort handle (instead of relying on drop semantics), wrap the stream:

use ditto_core::contracts::GenerateRequest;
use ditto_core::llm_core::model::LanguageModel;
use ditto_core::llm_core::stream::abortable_stream;

let stream = llm.stream(GenerateRequest::from(messages)).await?;
let abortable = abortable_stream(stream);
abortable.handle.abort();

Embeddings

EmbeddingModelExt provides AI SDK-style aliases:

use ditto_core::capabilities::EmbeddingModelExt;

let vectors = embeddings.embed_many(vec!["hello".to_string(), "world".to_string()]).await?;
let one = embeddings.embed_one("hi".to_string()).await?;

Custom HTTP Client

Providers accept a custom reqwest::Client so you can configure timeouts, proxies, and default headers (e.g. enterprise gateways):

let http = reqwest::Client::builder().build()?;
let llm = ditto_core::providers::OpenAI::new(api_key).with_http_client(http);

When building providers from config, you can also set per-node default headers via ProviderConfig.http_headers.

Provider Auth (Custom Headers / Query Params)

Providers apply their standard auth headers by default (OpenAI/OpenAI-compatible: bearer token; Anthropic: x-api-key; Google: x-goog-api-key).

If you need a non-standard auth header (e.g. Azure / enterprise gateways), use:

auth = { type = "http_header_env", header = "api-key", keys = ["AZURE_OPENAI_API_KEY"] }

If your gateway expects auth in a query param (e.g. ...?api_key=...), use:

auth = { type = "query_param_env", param = "api_key", keys = ["GATEWAY_API_KEY"] }

If you need to fetch a token dynamically (e.g. gcloud auth print-access-token, aws-vault, Vault CLI), use:

auth = { type = "command", command = ["gcloud", "auth", "print-access-token"] }

The command stdout may be a plain token, a JSON string ("sk-..."), or a JSON object with api_key/token/access_token. Ditto enforces a 15s timeout (configurable via DITTO_AUTH_COMMAND_TIMEOUT_MS/SECS) and a 64KiB stdout/stderr cap.

Provider Node Query Params (Optional)

If your provider requires additional fixed query params on every request (e.g. Azure OpenAI api-version), set ProviderConfig.http_query_params:

base_url = "https://{resource}.openai.azure.com/openai/deployments/{deployment}"
http_query_params = { "api-version" = "2024-02-01" }
auth = { type = "http_header_env", header = "api-key", keys = ["AZURE_OPENAI_API_KEY"] }

Provider Options (Per Provider)

Requests that support provider_options accept either:

  • Legacy (flat): a single JSON object applied to the current provider.
  • Bucketed: a JSON object keyed by provider id (optionally with a "*" default bucket).

Bucketed example:

{
  "provider_options": {
    "*": { "parallel_tool_calls": false },
    "openai": { "reasoning_effort": "high" },
    "openai-compatible": { "response_format": { "type": "json_schema", "json_schema": { "name": "answer", "schema": { "type": "object" } } } }
  }
}

Precedence is "*" (base) → provider bucket (override). Provider ids are: openai, openai-compatible (also accepts openai_compatible as an alias key), anthropic, google, cohere, bedrock, vertex.
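The precedence rule can be sketched as a two-step merge: start from the "*" bucket, then let the provider's own bucket override it key by key. The example keeps option values as raw JSON text to stay dependency-free, and it assumes a shallow (top-level) merge; whether Ditto merges nested objects more deeply is not stated here. `Options` and `resolve_options` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Option name mapped to its value as raw JSON text (hypothetical representation).
pub type Options = BTreeMap<String, String>;

/// Resolve the effective options for `provider`:
/// "*" is the base, the provider bucket overrides it.
pub fn resolve_options(buckets: &BTreeMap<String, Options>, provider: &str) -> Options {
    let mut merged = buckets.get("*").cloned().unwrap_or_default();
    if let Some(specific) = buckets.get(provider) {
        for (key, value) in specific {
            // Assumption: shallow merge, so the whole value is replaced.
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```

A provider with no bucket of its own simply receives the "*" defaults, and an alias key such as openai_compatible would need to be normalized before the lookup.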

File Upload (Optional)

If you want to send PDFs via file_id (instead of inlining base64), OpenAI and OpenAI-compatible providers expose a small upload helper:

let file_id = llm.upload_file("doc.pdf", pdf_bytes).await?;

Development

Enable repo-local git hooks:

git config core.hooksPath githooks

This enforces Conventional Commits and requires each commit to include CHANGELOG.md.

Structure Gates

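The default structure gate follows the Rust mainline. The goal is to keep the default core, all-features, no-default-features, and the provider feature matrix continuously buildable and lint-clean. The minimal local command set is: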

cargo fmt --all -- --check
cargo run -p ditto-core --bin ditto-llms-txt -- --check
cargo check --workspace
cargo test --workspace --all-targets
cargo check -p ditto-core --examples
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace --all-targets --all-features
cargo check -p ditto-core --no-default-features
cargo clippy -p ditto-core --no-default-features -- -D warnings
cargo check -p ditto-server --no-default-features
cargo clippy -p ditto-server --no-default-features -- -D warnings

By default, Node only validates packages/*:

pnpm run typecheck
pnpm run build

The optional Admin UI asset is validated separately:

pnpm run typecheck:admin-ui
pnpm run build:admin-ui

Integration Tests (Optional)

Enable the integration feature and set real credentials:

  • OpenAI Responses: OPENAI_API_KEY + OPENAI_MODEL
  • OpenAI-compatible: OPENAI_COMPAT_BASE_URL + OPENAI_COMPAT_MODEL (+ OPENAI_COMPAT_API_KEY optional)

Then run:

cargo test --all-features
