diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..d1b98c3
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,101 @@
+# AGENTS.md
+
+Guidance for agentic coding tools working in this repository.
+Scope: entire repo.
+
+`CLAUDE.md` is a symlink to this file. Always edit `AGENTS.md` directly; never modify `CLAUDE.md`.
+
+## Project Snapshot
+- Project: `otari` (PyPI package name `otari`), the **Python client SDK** for the otari
+  gateway / platform. `OtariClient` (sync) and `AsyncOtariClient` (async) talk to a running
+  gateway over HTTP.
+- Language/runtime: Python 3.11+ (CI matrix: 3.11, 3.12, 3.13).
+- Package manager + task runner: `uv`.
+- Source root: `src/otari` (imported as `otari`).
+- Tests: `tests/unit` (mocked, offline) and `tests/integration` (real gateway).
+
+## Architecture (Big Picture)
+This SDK is a **thin, hand-written shell over an OpenAPI-generated typed core**. Read these
+together before changing request behavior.
+
+- **Generated core** (`src/otari/_client/`): produced by OpenAPI Generator from the gateway's
+  OpenAPI spec. It is **generated, not hand-edited.** Regeneration happens upstream in the
+  gateway repo (`.github/workflows/gateway-sdk-codegen.yml`), which opens a
+  `sdk-codegen/client-core` PR here. The core is **excluded from ruff and mypy**
+  (`pyproject.toml`: `extend-exclude` / mypy `exclude`). Do not edit it to fix a lint error;
+  fix the shell or the upstream spec/generator instead.
+- **Hand-written shell** (everything else under `src/otari/`):
+  - `client.py` / `async_client.py`: ergonomic `OtariClient` / `AsyncOtariClient` with
+    `completion`, `response`, `message`, `embedding`, `moderation`, `rerank`, `list_models`,
+    batch operations, and a `control_plane` accessor.
+  - `_base.py`: shared logic: auth-mode resolution, default headers, URL normalization, and
+    the single seam where generated `ApiException` is caught and mapped to typed errors.
+  - `_streaming.py`: hand-written SSE shim. The generated core buffers and **cannot stream**,
+    so streaming endpoints use raw `httpx` + a line/event parser. Chat streaming yields typed
+    `ChatCompletionChunk`; responses/messages streaming yields raw event `dict`s (no chunk
+    model exists for those).
+  - `errors.py`: typed error hierarchy (`OtariError` base + subclasses).
+  - `types.py`: re-exports of generated models plus hand-written TypedDicts (batch/options).
+  - `control_plane.py`: wrapper over the management endpoints (keys/users/budgets/pricing/usage).
+
+### Two auth modes (must both keep working)
+Resolved in `_base.py` from constructor args, then environment:
+- **Platform** (`OTARI_AI_TOKEN` / `platform_token`): `Authorization: Bearer `, base URL
+  defaults to `https://api.otari.ai`.
+- **Self-hosted** (`api_key` + `api_base`, env `GATEWAY_API_KEY` / `GATEWAY_API_BASE`):
+  `Otari-Key` header; `api_base` is **required** in this mode.
+Error mapping applies in both modes; do not regress one when changing the other.
+
+### Endpoint-coverage drift gate
+`tests/unit/test_endpoint_coverage.py` fetches the canonical gateway spec
+(`https://raw.githubusercontent.com/mozilla-ai/otari/main/docs/public/openapi.json`) and
+asserts every gateway endpoint is accounted for in `sdk-endpoints.txt` (`[covered]` or
+`[excluded]` with a reason). A new gateway endpoint with no wrapper and no explicit exclusion
+fails CI. When you add or intentionally skip an endpoint, update `sdk-endpoints.txt`.
+
+## Setup Commands
+- Install (dev): `uv sync --extra dev`
+
+## Test Commands
+- Full suite: `uv run pytest`
+- Unit only: `uv run pytest tests/unit`
+- Single test: `uv run pytest tests/unit/test_client.py::TestOtariClient::test_completion -v`
+- Drift gate (needs network): `uv run pytest tests/unit/test_endpoint_coverage.py -v`
+- Integration tests under `tests/integration/` spawn / require a real gateway and are skipped
+  when one is not available.
+
+## Lint / Typecheck / Build Commands
+- Lint: `uv run ruff check .`
+- Typecheck (mypy strict): `uv run mypy src/`
+- Build: `uv build`
+
+## Repository Conventions
+- `from __future__ import annotations` at the top of modules; `TYPE_CHECKING` for type-only
+  imports; import `Callable`/`Iterator` from `collections.abc`.
+- mypy is `strict`; the generated `otari._client` is excluded. New/changed shell code must be
+  fully typed. Use `@overload` for streaming polymorphism (`stream=True` vs not), as `client.py`
+  already does.
+- Public API is exported from `src/otari/__init__.py` (clients, errors, types); don't remove or
+  rename exports without auditing callers.
+- Unit tests mock at the transport seam (the generated core's REST client) and use `respx` for
+  the raw-`httpx` streaming path. Test classes are `Test`.
+
+## Change Validation Checklist
+- Touched request handling, auth, or errors → run `tests/unit` and confirm both auth modes still
+  map errors correctly.
+- Touched streaming → run the streaming tests; verify chat yields `ChatCompletionChunk` and
+  responses/messages yield raw dicts.
+- Added/removed an endpoint wrapper → update `sdk-endpoints.txt` and run the drift gate.
+- Always run `uv run ruff check .` and `uv run mypy src/` before opening a PR.
+
+## Writing style
+
+- Avoid em dashes and double hyphens (`--`) used as separators in prose
+  (README, docs, doc comments, commit messages, PR descriptions). Use commas,
+  semicolons, colons, parentheses, or periods, or rephrase. This does not apply
+  to code (for example CLI flags like `--all`) or en-dash numeric ranges like `3–4`.
+
+## Notes for Agents
+- Never hand-edit `src/otari/_client/`; it is regenerated from the gateway spec.
+- Prefer minimal, targeted edits; match existing typing and import style in touched files.
+- Preserve security-relevant behavior (header/auth handling, error-detail boundaries).
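Reviewer note on the "Two auth modes" section of the new `AGENTS.md`: the resolution order it describes (constructor args first, then environment) can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `_base.py`. The names `resolve_auth`, `ResolvedAuth`, and `HOSTED_BASE_URL` are hypothetical, the choice to prefer the platform token when both kinds of credential are present is an assumption, and the `Bearer` prefix on the `Otari-Key` value follows the README line this diff removes.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hosted default named in the README; the constant name itself is hypothetical.
HOSTED_BASE_URL = "https://api.otari.ai"


@dataclass(frozen=True)
class ResolvedAuth:
    """Hypothetical result type: which mode was chosen and what to send."""

    mode: str
    base_url: str
    headers: dict[str, str]


def resolve_auth(
    *,
    platform_token: str | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    env: Mapping[str, str] | None = None,
) -> ResolvedAuth:
    """Pick an auth mode from explicit arguments, falling back to the environment."""
    env = os.environ if env is None else env

    # Environment variables are consulted only when no explicit credential is given.
    if platform_token is None and api_key is None:
        platform_token = env.get("OTARI_AI_TOKEN") or env.get("GATEWAY_PLATFORM_TOKEN")
        api_key = env.get("GATEWAY_API_KEY")

    # Assumption: the platform token wins when both kinds of credential exist.
    if platform_token:
        base = (api_base or HOSTED_BASE_URL).rstrip("/")
        return ResolvedAuth("platform", base, {"Authorization": f"Bearer {platform_token}"})

    if api_key:
        base = api_base or env.get("GATEWAY_API_BASE")
        if not base:
            raise ValueError("api_base is required in self-hosted mode")
        # Assumption: header value format taken from the removed README sentence.
        return ResolvedAuth("self-hosted", base.rstrip("/"), {"Otari-Key": f"Bearer {api_key}"})

    raise ValueError("no credentials found in arguments or environment")
```

Passing `env` explicitly keeps the sketch testable offline, which matches the repo's preference for mocked unit tests; the real client reads the process environment.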
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/README.md b/README.md
index 873a29f..d529e56 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
-# otari (Python)
+# Otari Python Client SDK
 ![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)
 [![PyPI](https://img.shields.io/pypi/v/otari)](https://pypi.org/project/otari/)
@@ -12,15 +12,21 @@
 Discord
-**Python client for [otari-gateway](https://github.com/mozilla-ai/otari).**
-Communicate with any LLM provider through the gateway using a single, typed interface.
+**Python client for [otari](https://github.com/mozilla-ai/otari), the open-source core that powers [otari.ai](https://otari.ai).**
+Communicate with any LLM provider through otari using a single, typed interface.
 [TypeScript SDK](https://github.com/mozilla-ai/otari-sdk-ts) | [Documentation](https://mozilla-ai.github.io/otari/) | [Platform (Beta)](https://otari.ai/)
+> New to otari? The [otari repo](https://github.com/mozilla-ai/otari) explains what it is and why you’d use it.
+
 ## Quickstart
+```bash
+pip install otari
+```
+
 Generate an API token at [otari.ai/organization-settings/api-tokens](https://otari.ai/organization-settings/api-tokens), then add a provider key (e.g. OpenAI) at [otari.ai/organization-settings/provider-keys](https://otari.ai/organization-settings/provider-keys) so the gateway can route requests to that provider. Then use the client:
 ```python
@@ -38,33 +44,14 @@ response = client.completion(
 print(response.choices[0].message.content)
 ```
-**That's it!** With no `api_base`, the client defaults to the hosted gateway at `https://api.otari.ai`. Change the model string to switch between LLM providers through the gateway.
-
-Prefer async? Use `AsyncOtariClient`, which exposes the same API with `await` (see [Async usage](#async-usage)).
-
-Prefer to keep secrets out of code? Set `OTARI_AI_TOKEN` in your environment and `OtariClient()` picks up the token automatically.
-
-## Self-hosting the gateway
-
-Prefer to run the gateway yourself instead of using the hosted otari.ai? Follow the setup in the [otari gateway repo](https://github.com/mozilla-ai/otari), then point the SDK at it:
-
-```python
-client = OtariClient(
-    api_base="http://localhost:8000",  # or wherever you host the gateway
-    api_key="your-gateway-api-key",
-)
-```
-
-The SDK sends `api_key` via the custom `Otari-Key: Bearer …` header. Env: `GATEWAY_API_BASE` + `GATEWAY_API_KEY`.
-
-Make sure your gateway has provider keys configured (e.g. OpenAI) so it can route requests upstream — see the [otari gateway repo](https://github.com/mozilla-ai/otari) for setup.
+With no `api_base`, the client defaults to the hosted gateway at `https://api.otari.ai`. Change the model string to switch between LLM providers through the gateway.
 ## Installation
 ### Requirements
 - Python 3.11 or newer
-- A running [otari-gateway](https://mozilla-ai.github.io/otari/gateway/overview/) instance
+- A running [otari](https://mozilla-ai.github.io/otari/gateway/overview/) instance (or the hosted gateway at [otari.ai](https://otari.ai/))
 ### Install
@@ -72,9 +59,9 @@ Make sure your gateway has provider keys configured (e.g. OpenAI) so it can rout
 pip install otari
 ```
-### Setting Up Credentials
+### Setting up credentials
-For the hosted gateway, set your platform token (no `api_base` needed — it defaults to `https://api.otari.ai`):
+For the hosted gateway, set your platform token (no `api_base` needed; it defaults to `https://api.otari.ai`):
 ```bash
 export OTARI_AI_TOKEN="tk_your_api_token"
@@ -89,76 +76,67 @@ export GATEWAY_API_BASE="http://localhost:8000"
 export GATEWAY_API_KEY="your-key-here"
 ```
-Alternatively, pass credentials directly when creating the client (see [Usage](#usage) examples).
-
-## otari-gateway
-
-This Python SDK is a client for [otari-gateway](https://github.com/mozilla-ai/otari), an **optional** FastAPI-based proxy server that adds enterprise-grade features on top of the core library:
-
-- **Budget Management** - Enforce spending limits with automatic daily, weekly, or monthly resets
-- **API Key Management** - Issue, revoke, and monitor virtual API keys without exposing provider credentials
-- **Usage Analytics** - Track every request with full token counts, costs, and metadata
-- **Multi-tenant Support** - Manage access and budgets across users and teams
-
-The gateway sits between your applications and LLM providers, exposing an OpenAI-compatible API that works with any supported provider.
-
-### Quick Start
-
-```bash
-docker run \
-  -e GATEWAY_MASTER_KEY="your-secure-master-key" \
-  -e OPENAI_API_KEY="your-api-key" \
-  -p 8000:8000 \
-  ghcr.io/mozilla-ai/otari/gateway:latest
-```
-
-> **Note:** You can use a specific release version instead of `latest` (e.g., `1.2.0`). See [available versions](https://github.com/orgs/mozilla-ai/packages/container/package/otari%2Fgateway).
-
-### Managed Platform (Beta)
+Alternatively, pass credentials directly when creating the client (see [Authentication](#authentication)).
-Prefer a hosted experience? The [otari platform](https://otari.ai/) provides a managed control plane for keys, usage tracking, and cost visibility across providers, while still building on the same `otari` interfaces.
+## Authentication
-## Usage
-
-> **Migrating from a previous version?** `OtariClient` is now **synchronous** — call its methods directly (no `await`). For asynchronous code, switch to `AsyncOtariClient`, which keeps the previous `await`-based API. See [Async usage](#async-usage).
-
-### Authentication Modes
-
-The client supports two authentication modes, matching the TypeScript SDK:
+The client supports two authentication modes, matching the TypeScript SDK. When no explicit credentials are passed, the client auto-detects the mode from environment variables.
-#### Platform Mode (Recommended)
+**Platform mode (hosted)**
-Uses a Bearer token in the standard Authorization header. On the hosted platform, generate an API token at [otari.ai/organization-settings/api-tokens](https://otari.ai/organization-settings/api-tokens) and add a provider key (e.g. OpenAI) at [otari.ai/organization-settings/provider-keys](https://otari.ai/organization-settings/provider-keys) so the gateway can route requests to that provider. With no `api_base`, the client defaults to the hosted gateway at `https://api.otari.ai`:
+Targets the hosted platform at [otari.ai](https://otari.ai/). The platform token is sent as a Bearer token in the standard `Authorization` header. Generate an API token at [otari.ai/organization-settings/api-tokens](https://otari.ai/organization-settings/api-tokens) and add a provider key (e.g. OpenAI) at [otari.ai/organization-settings/provider-keys](https://otari.ai/organization-settings/provider-keys) so the gateway can route requests to that provider. With no `api_base`, the client defaults to the hosted gateway at `https://api.otari.ai`:
 ```python
+from otari import OtariClient
+
 client = OtariClient(
     platform_token="tk_your_api_token",
 )
 ```
-#### Non-Platform Mode (Self-Hosted)
+Set `OTARI_AI_TOKEN` (or the legacy alias `GATEWAY_PLATFORM_TOKEN`) and `OtariClient()` picks up the token automatically.
+
+**Self-hosted mode**
-Sends the API key via a custom `Otari-Key` header. This targets a self-hosted gateway, so an explicit `api_base` is required:
+Targets a gateway you run yourself. The API key is sent via the custom `Otari-Key` header, and an explicit `api_base` is required. Follow the setup in the [otari repo](https://github.com/mozilla-ai/otari), then point the SDK at your gateway:
 ```python
+from otari import OtariClient
+
 client = OtariClient(
-    api_base="http://localhost:8000",
-    api_key="your-api-key",
+    api_base="http://localhost:8000",  # or wherever you host the gateway
+    api_key="your-gateway-api-key",
 )
 ```
-#### Auto-Detection from Environment Variables
+Set `GATEWAY_API_BASE` and `GATEWAY_API_KEY` and `OtariClient()` picks them up automatically. Make sure your gateway has provider keys configured (e.g. OpenAI) so it can route requests upstream; see the [otari repo](https://github.com/mozilla-ai/otari) for setup.
-When no explicit credentials are provided, the client reads from environment variables:
+**Environment variable quick reference**
+
+| Variable | Mode | Purpose |
+|----------|------|---------|
+| `OTARI_AI_TOKEN` | Platform | Platform token, sent as `Authorization: Bearer …`. |
+| `GATEWAY_PLATFORM_TOKEN` | Platform | Legacy alias for `OTARI_AI_TOKEN` (lower precedence). |
+| `GATEWAY_API_BASE` | Self-hosted | Base URL of the gateway (required in self-hosted mode). |
+| `GATEWAY_API_KEY` | Self-hosted | API key, sent via the `Otari-Key` header. |
+| `GATEWAY_ADMIN_KEY` | Either | Admin/master key for the control-plane endpoints. |
+
+When no explicit credentials are provided, the client reads from these variables:
 ```python
+from otari import OtariClient
+
 # Platform mode: OTARI_AI_TOKEN (or legacy GATEWAY_PLATFORM_TOKEN),
 # defaulting to the hosted gateway.
 # Self-hosted: GATEWAY_API_BASE + GATEWAY_API_KEY.
 client = OtariClient()
 ```
-### Chat Completions
+## Usage
+
+> **Migrating from a previous version?** `OtariClient` is now synchronous: call its methods directly (no `await`). For asynchronous code, switch to `AsyncOtariClient`, which keeps the previous `await`-based API. See [Async usage](#async-usage).
+
+### Chat completions
 ```python
 response = client.completion(
@@ -195,10 +173,9 @@ response = client.response(
 print(response.output_text)
 ```
-### Messages API (Anthropic-shaped)
+### Messages API
-The gateway's `/messages` endpoint (Anthropic message shape) is exposed via
-`message(...)`. Set `stream=True` to iterate raw message-stream event dicts.
+The gateway's `/messages` endpoint (Anthropic message shape) is exposed via `message(...)`. `max_tokens` is required. Set `stream=True` to iterate raw message-stream event dicts.
 ```python
 message = client.message(
@@ -221,7 +198,7 @@ result = client.embedding(
 print(result.data[0].embedding)
 ```
-### Listing Models
+### Listing models
 ```python
 models = client.list_models()
@@ -229,39 +206,67 @@ for model in models:
     print(model.id)
 ```
-### Async usage
+### Moderation
-Every method on `OtariClient` has an asynchronous counterpart on `AsyncOtariClient`. It accepts the same constructor arguments and exposes the same methods, but they are coroutines you `await` (and streams are async iterables):
+```python
+result = client.moderation(
+    model="openai:omni-moderation-latest",
+    input="Some text to classify.",
+)
+
+print(result.results[0].flagged)
+```
+
+### Reranking
 ```python
-import asyncio
+result = client.rerank(
+    model="cohere:rerank-v3.5",
+    query="What is the capital of France?",
+    documents=["Paris is the capital of France.", "Berlin is in Germany."],
+)
-from otari import AsyncOtariClient
+for item in result.results:
+    print(item.index, item.relevance_score)
+```
+### Batch operations
-async def main() -> None:
-    async with AsyncOtariClient(platform_token="tk_your_api_token") as client:
-        response = await client.completion(
-            model="openai:gpt-4o-mini",
-            messages=[{"role": "user", "content": "Hello!"}],
-        )
-        print(response.choices[0].message.content)
+Submit many requests as a single batch job, poll for status, then fetch results once the batch completes. Batch endpoints are scoped to a `provider`.
-        stream = await client.completion(
-            model="openai:gpt-4o-mini",
-            messages=[{"role": "user", "content": "Tell me a story."}],
-            stream=True,
-        )
-        async for chunk in stream:
-            content = chunk.choices[0].delta.content
-            if content:
-                print(content, end="", flush=True)
+```python
+batch = client.create_batch(
+    {
+        "model": "openai:gpt-4o-mini",
+        "requests": [
+            {
+                "custom_id": "req-1",
+                "body": {
+                    "model": "openai:gpt-4o-mini",
+                    "messages": [{"role": "user", "content": "Hello!"}],
+                },
+            },
+        ],
+        "completion_window": "24h",
+    }
+)
+# Poll for status.
+status = client.retrieve_batch(batch.id, provider="openai")
-asyncio.run(main())
+# List batches for a provider.
+batches = client.list_batches("openai", {"limit": 20})
+
+# Fetch results once complete (raises BatchNotCompleteError on HTTP 409).
+results = client.retrieve_batch_results(batch.id, provider="openai")
+for item in results.results:
+    print(item.custom_id, item.result)
+
+# Cancel a running batch.
+client.cancel_batch(batch.id, provider="openai")
 ```
-### Error Handling
+### Error handling
 In platform mode, HTTP errors are mapped to typed exceptions:
@@ -285,13 +290,46 @@ except RateLimitError as e:
 | 401, 403 | `AuthenticationError` | Invalid or missing credentials |
 | 402 | `InsufficientFundsError` | Budget or credits exhausted |
 | 404 | `ModelNotFoundError` | Model not found, or no provider key configured for the requested provider. The exception's `message` carries the gateway's detail. |
+| 409 | `BatchNotCompleteError` | Batch results requested before the batch finished |
 | 429 | `RateLimitError` | Rate limit exceeded (includes `retry_after`) |
 | 502 | `UpstreamProviderError` | Upstream provider unreachable |
 | 504 | `GatewayTimeoutError` | Gateway timed out waiting for provider |
 `UnsupportedCapabilityError` surfaces in both platform and non-platform modes; the other mappings are platform-mode only.
-### Context Manager
+### Async usage
+
+Every method on `OtariClient` has an asynchronous counterpart on `AsyncOtariClient`. It accepts the same constructor arguments and exposes the same methods, but they are coroutines you `await` (and streams are async iterables):
+
+```python
+import asyncio
+
+from otari import AsyncOtariClient
+
+
+async def main() -> None:
+    async with AsyncOtariClient(platform_token="tk_your_api_token") as client:
+        response = await client.completion(
+            model="openai:gpt-4o-mini",
+            messages=[{"role": "user", "content": "Hello!"}],
+        )
+        print(response.choices[0].message.content)
+
+        stream = await client.completion(
+            model="openai:gpt-4o-mini",
+            messages=[{"role": "user", "content": "Tell me a story."}],
+            stream=True,
+        )
+        async for chunk in stream:
+            content = chunk.choices[0].delta.content
+            if content:
+                print(content, end="", flush=True)
+
+
+asyncio.run(main())
+```
+
+### Context manager
 The client supports a context manager for automatic cleanup:
@@ -313,15 +351,6 @@ async with AsyncOtariClient(api_base="http://localhost:8000") as client:
     )
 ```
-## Why choose `otari`?
-
-- **Simple, unified interface** - Single client for all providers through the gateway, switch models with just a string change
-- **Developer friendly** - Full type hints for better IDE support and clear, actionable error messages
-- **Leverages the OpenAI SDK** - Built on the official OpenAI Python SDK for maximum compatibility
-- **Sync and async** - Use the synchronous `OtariClient` or the asynchronous `AsyncOtariClient`, both with the same typed interface
-- **Stays framework-agnostic** so it can be used across different projects and use cases
-- **Battle-tested** - Powers our own production tools ([any-agent](https://github.com/mozilla-ai/any-agent))
-
 ## Development
 ```bash
diff --git a/src/otari/_base.py b/src/otari/_base.py
index 86e7b58..0a56a62 100644
--- a/src/otari/_base.py
+++ b/src/otari/_base.py
@@ -7,7 +7,7 @@
 :class:`~otari.async_client.AsyncOtariClient`) construct their own generated
 ``_client`` and httpx clients and implement the I/O methods on top of this base.
-Option C: the inference path is a thin shell over the OpenAPI-generated core in
+The inference path is a thin shell over the OpenAPI-generated core in
 :mod:`otari._client` (typed models + per-endpoint API classes). The generated
 ``ApiException`` is the single error type all generated calls raise; this module
 maps it to the typed otari exception hierarchy in :mod:`otari.errors`.
diff --git a/src/otari/async_client.py b/src/otari/async_client.py
index 04dd821..6ac0874 100644
--- a/src/otari/async_client.py
+++ b/src/otari/async_client.py
@@ -1,6 +1,6 @@
 """AsyncOtariClient: asynchronous Python client for the otari gateway.
-Option C: a thin async shell over the OpenAPI-generated core in
+A thin async shell over the OpenAPI-generated core in
 :mod:`otari._client`. The generated core is synchronous (urllib3-based), so
 non-streaming calls are dispatched to a worker thread via ``asyncio.to_thread``;
 streaming is natively async over ``httpx.AsyncClient`` and the SSE shim in
diff --git a/src/otari/client.py b/src/otari/client.py
index d64ed1b..5dc31db 100644
--- a/src/otari/client.py
+++ b/src/otari/client.py
@@ -1,6 +1,6 @@
 """OtariClient: synchronous Python client for the otari gateway.
-Option C: a thin, ergonomic shell over the OpenAPI-generated core in
+A thin, ergonomic shell over the OpenAPI-generated core in
 :mod:`otari._client`. Non-streaming calls go through the generated typed API
 classes (returning typed models such as ``ChatCompletion``); streaming calls go
 through the hand-written SSE shim in :mod:`otari._streaming`; generated
diff --git a/src/otari/control_plane.py b/src/otari/control_plane.py
index 27a37ea..54e0d4c 100644
--- a/src/otari/control_plane.py
+++ b/src/otari/control_plane.py
@@ -1,7 +1,7 @@
 """Typed client for the gateway control-plane (management) endpoints.
 Wraps the OpenAPI-generated :mod:`otari._client` core (the same core that backs
-the inference path under Option C). The control-plane endpoints (API keys,
+the inference path). The control-plane endpoints (API keys,
 users, budgets, pricing, usage) authenticate with ``Authorization: Bearer ``,
 which is distinct from the ``Otari-Key`` virtual key used for inference. Obtain
 an instance via
diff --git a/tests/unit/conftest.py b/tests/unit/conftest.py
index c3f028a..2e7f4b0 100644
--- a/tests/unit/conftest.py
+++ b/tests/unit/conftest.py
@@ -1,6 +1,6 @@
 """Shared test helpers for mocking the generated core's transport.
-Option C wires the SDK over the OpenAPI-generated core (:mod:`otari._client`),
+The SDK is a thin shell over the OpenAPI-generated core (:mod:`otari._client`),
 whose non-streaming calls go through ``RESTClientObject.request`` (urllib3) and
 whose streaming path is a hand-written raw httpx request. These helpers mock
 both seams without a live gateway:
diff --git a/tests/unit/test_async_client.py b/tests/unit/test_async_client.py
index 794e4ee..cf5bd96 100644
--- a/tests/unit/test_async_client.py
+++ b/tests/unit/test_async_client.py
@@ -1,4 +1,4 @@
-"""Tests for the asynchronous AsyncOtariClient (Option C: generated-core shell).
+"""Tests for the asynchronous AsyncOtariClient (generated-core shell).
 The async client dispatches the (synchronous) generated calls off-thread via
 ``asyncio.to_thread`` and streams natively over ``httpx.AsyncClient``. Non-
diff --git a/tests/unit/test_client.py b/tests/unit/test_client.py
index dc8d4c8..e1a839e 100644
--- a/tests/unit/test_client.py
+++ b/tests/unit/test_client.py
@@ -1,4 +1,4 @@
-"""Tests for the synchronous OtariClient (Option C: generated-core shell).
+"""Tests for the synchronous OtariClient (generated-core shell).
 Covers constructor / auth-mode wiring, request shaping, typed response parsing,
 generated ``ApiException`` -> typed error mapping, and the hand-written SSE