diff --git a/docs/native-attachments-design.md b/docs/native-attachments-design.md new file mode 100644 index 0000000..c1a637c --- /dev/null +++ b/docs/native-attachments-design.md @@ -0,0 +1,231 @@ +# Design: Native LLM Attachments over the Private (OHTTP) Path + +## Status + +Proposal. Spans three repos: `chat-app` (browser), `chat-api` (relay), `tee-gateway` +(enclave). The bulk of the change lands in `tee-gateway`. + +## Motivation + +Today attachments are handled by **server-side parsing in `chat-api`**: + +- `chat-api/src/core/attachments.py` downloads each attachment and runs PyMuPDF / + python-docx to extract **plain text**, then injects that text into the prompt. +- Images are classified by content-type and passed through as URLs. + +This is the wrong layer to solve the problem: + +1. **It throws away everything the models do natively.** Modern Claude / GPT / + Gemini ingest PDFs and images directly — layout, tables, figures, charts, + handwriting, embedded images. Flattening a PDF to `page.get_text()` loses all + of that and feeds the model a worse input than it could handle itself. +2. **It only works on the non-private path.** The parsing in `attachments.py` is + invoked exclusively from the regular `POST /api/v1/chat` handler. On the + **OHTTP path**, `chat-api` is a dumb relay — it forwards opaque ciphertext to + the enclave and never sees the body — so attachments are simply not processed. + Worse, in the enclave `llm_backend.convert_messages` flattens multimodal + content parts to text only (`"".join(part.get("text", "") ...)`), so any + `image_url` part is **silently dropped** before it reaches the provider. + +Net result: **attachments and privacy are currently mutually exclusive.** +Attachments only work on the route where `chat-api` reads the plaintext, and the +private route drops them. + +## Goal + +Send attachments to the model **natively**, on the **private (OHTTP) path**: + +- No server-side text extraction. 
The file bytes reach the model as a native + image/document content part. +- `chat-api` and Cloudflare never see attachment plaintext (same trust boundary + as the message text already enjoys on OHTTP). +- The enclave converts the inner request's multimodal content into each + provider's native format via LangChain. + +## Trust boundary (what this does and does not hide) + +- **Hidden from:** the browser→relay transport, `chat-api`, the OHTTP relay, + Cloudflare/R2. They see only HPKE ciphertext. +- **Visible to:** the enclave (it decrypts — that's the trust anchor) and the + **upstream LLM provider** (OpenAI/Anthropic/Google/xAI/ByteDance), which + receives the attachment as part of the completion request. This is identical + to how message *text* is already handled: whatever you send the model, the + model provider sees. Fully provider-blind attachments would require the model + to run inside the TEE and are out of scope here. + +## Transport: how the attachment reaches the enclave + +### Phase 1 — inline base64 (recommended starting point) + +The browser embeds the file directly in the message content as a standard +OpenAI-style content part, inside the HPKE-encrypted OHTTP payload: + +```jsonc +{ + "model": "claude-sonnet-4-6", + "messages": [ + { + "role": "user", + "content": [ + { "type": "text", "text": "Summarize this contract." }, + { "type": "image_url", + "image_url": { "url": "data:image/png;base64,iVBORw0K..." } }, + { "type": "file", + "file": { "filename": "contract.pdf", + "file_data": "data:application/pdf;base64,JVBERi0..." } } + ] + } + ] +} +``` + +- Pros: nothing outside the enclave/provider ever sees the bytes; no R2 round + trip; no presigned-URL machinery; no SSRF surface. +- Cons: base64 inflates ~33%; bounded by request/OHTTP size limits; no + persistence (re-sent each turn). Fine for the common case (a few MB of PDF or + an image). Enforce a hard per-request attachment-bytes cap in the enclave. 
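
As a rough illustration of the Phase 1 payload shape and the ~33% inflation listed under "Cons", the Python sketch below wraps raw bytes in a `file` content part and measures the encoded size. The helper names `build_file_part` and `base64_encoded_len` are hypothetical and exist in none of the three repos. In practice the encoding would happen in the browser, so this only models the arithmetic and the JSON shape.

```python
import base64
import math


def build_file_part(filename: str, mime_type: str, raw: bytes) -> dict:
    """Wrap raw file bytes as an OpenAI-style `file` content part (data URI).

    Hypothetical helper for illustration only.
    """
    b64 = base64.b64encode(raw).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:{mime_type};base64,{b64}",
        },
    }


def base64_encoded_len(raw_len: int) -> int:
    """Padded base64 length: 4 output characters per 3 input bytes, rounded up."""
    return 4 * math.ceil(raw_len / 3)


# 3,000,000 bytes of stand-in content with a PDF-like header.
raw = b"%PDF-1.4\n" + b"\x00" * 2_999_991
part = build_file_part("contract.pdf", "application/pdf", raw)

prefix = "data:application/pdf;base64,"
payload_chars = len(part["file"]["file_data"]) - len(prefix)
overhead = payload_chars / len(raw) - 1  # fraction added by base64 encoding
```

Because the overhead is a fixed ratio, a cap expressed in decoded bytes maps to a predictable encoded request size, which is useful when choosing the cap relative to the OHTTP request ceiling.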
+ +### Phase 2 — encrypted blob in R2 (only if large files / persistence needed) + +Browser client-side-encrypts the file (AES-GCM), uploads **ciphertext** to R2 +(Cloudflare sees only ciphertext), and includes inside the OHTTP payload an R2 +reference plus the AES key **wrapped to the TEE attestation/HPKE public key**. +The enclave fetches the ciphertext and decrypts internally. Defer until Phase 1 +limits become a real constraint. + +> Note: do **not** go back to plaintext-in-R2 + presigned URLs. That reintroduces +> the public-bearer-token leak and the SSRF surface in `attachments.py`. + +## Enclave changes (`tee-gateway`) — the core of the work + +### 1. `convert_messages` must preserve multimodal content + +`llm_backend.py:248-255` currently does: + +```python +elif role == "user": + if isinstance(content, list): + content = "".join( + part.get("text", "") if isinstance(part, dict) else str(part) + for part in content + ) + langchain_messages.append(HumanMessage(content=content)) +``` + +Replace the flattening with a converter that maps the inbound OpenAI-style +content parts to **LangChain v1 standard content blocks** (`langchain_core. +messages.content` — `ImageContentBlock`, `FileContentBlock`). Building the +*standard* blocks (rather than raw OpenAI `image_url`/`file` dicts) is important: +each provider package translates them into its own native API, so one code path +covers Anthropic, OpenAI, Gemini, and xAI uniformly. + +- `text` → `{"type": "text", "text": ...}` +- image (base64 data URI or https) → + `{"type": "image", "base64": ..., "mime_type": "image/png"}` (or `"url": ...`) +- document/PDF (base64) → + `{"type": "file", "base64": ..., "mime_type": "application/pdf", + "filename": ""}` + +Keep a `HumanMessage` with a **list** content when parts are present; only +collapse to a plain string when the message is text-only (preserves current +behavior for the no-attachment case). 
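
The mapping and the collapse rule described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the gateway's converter: the function names `to_standard_block` and `to_human_content` are hypothetical, only `text` and `image_url` parts are handled, and `image_url` is assumed to be a dict with a `url` key.

```python
from typing import Any, Optional


def to_standard_block(part: dict) -> Optional[dict]:
    """Map one OpenAI-style part to a standard block (simplified sketch)."""
    kind = part.get("type")
    if kind == "text":
        return {"type": "text", "text": part.get("text", "")}
    if kind == "image_url":
        url = part["image_url"]["url"]  # assumes the dict form of image_url
        if url.startswith("data:") and ";base64," in url:
            # Split "data:<mime>;base64,<payload>" into its two halves.
            header, data = url.split(",", 1)
            mime = header[len("data:"):].split(";", 1)[0]
            return {"type": "image", "base64": data, "mime_type": mime}
        return {"type": "image", "url": url}
    return None  # other part types are omitted in this sketch


def to_human_content(parts: list) -> Any:
    """Return a plain string for text-only input, else a list of blocks."""
    blocks = [b for b in map(to_standard_block, parts) if b is not None]
    if blocks and all(b["type"] == "text" for b in blocks):
        return "".join(b["text"] for b in blocks)
    return blocks


text_only = to_human_content(
    [{"type": "text", "text": "Hello "}, {"type": "text", "text": "world"}]
)
mixed = to_human_content(
    [
        {"type": "text", "text": "What is this?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0K"}},
    ]
)
```

Collapsing only when every block is text keeps the no-attachment case identical to a request that sent a plain string, while any image or file part forces the list form.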
+ +**Verified** against the pinned versions (see "Dependency check" below): a +`HumanMessage` carrying these standard blocks converts correctly outbound — +Anthropic emits `{"type":"document","source":{"type":"base64","media_type": +"application/pdf",...}}`, OpenAI emits `{"type":"file","file":{"file_data": +"data:application/pdf;base64,...","filename":...}}`. **Carry the original +`filename`** on file blocks — OpenAI requires one and otherwise substitutes a +placeholder (`LC_AUTOGENERATED`). + +### 2. No new dependencies (PCR constraint) — confirmed + +Native handoff means the enclave does **not** parse PDFs/DOCX itself — it passes +the bytes to the provider. So we should **not** add PyMuPDF/python-docx to +`tee-gateway`. + +**Dependency check (done).** The currently pinned versions already support +standard image *and* file (PDF) content blocks with base64, across every +provider we route to — so **this change needs no dependency bump and the PCR +measurements stay stable**: + +| Package | Pinned | Native file/image support | +|---|---|---| +| `langchain-core` | 1.2.26 | Defines `ImageContentBlock` / `FileContentBlock` (base64, url, file_id, mime_type) | +| `langchain-anthropic` | 1.4.0 | `file` → `document` (defaults `application/pdf`); image → base64 source | +| `langchain-openai` | 1.1.12 | `file` → `file_data` data-URI / `input_file`; image → `image_url` | +| `langchain-google-genai` | 4.2.1 | document/image blocks supported | +| `langchain-xai` | 1.2.2 | subclass of `BaseChatOpenAI` → inherits OpenAI handling | + +This was verified functionally (not just by reading types) by running the +Anthropic and OpenAI outbound message converters over a multimodal +`HumanMessage`. Per-model *acceptance* of PDFs still depends on the model itself +(see capability gating below). + +### 3. Per-provider capability gating + +Not every model accepts every modality. Extend `model_registry` with capability +flags (e.g. 
`supports_image`, `supports_pdf`) and reject (clear 4xx inside the +inner request) when a request sends a modality the target model can't handle, +rather than silently dropping it as today. + +### 4. Request signing / hashing + +`chat_controller.py` (~645-651) hashes user content via `str(msg.content)`. With +multimodal content that would hash megabytes of base64 and is not canonical. +Define a stable hashing rule, e.g. hash each attachment as +`sha256(mime_type || raw_bytes)` and include those digests (not the base64) in +the canonical request JSON that feeds `keccak256(requestHash ...)`. This keeps +signatures meaningful and bounded while still committing to the exact attachment +content. + +### 5. Limits & validation + +- Hard cap on total attachment bytes per request (post-decode). +- Allowlist of accepted mime types per modality. +- Reject `image_url` values that are remote `https` URLs on the private path if + we want to guarantee the enclave makes no outbound fetch for user content + (Phase 1 = base64 only). Decide explicitly. + +## `chat-api` changes + +- OHTTP path: **no change needed** to the relay itself — attachments ride inside + the encrypted payload it already forwards opaquely. +- Regular `POST /api/v1/chat` path: stop calling `load_documents` / + `is_image_url` and stop injecting extracted text. Either (a) build native + content parts here too, or (b) deprecate attachment support on the non-private + path and route all attachments through OHTTP. Recommend (b) for a single code + path. +- The presigned-URL / `attachments: string[]` machinery and `attachments.py` + become dead code for inference and can be removed once Phase 1 ships (R2 may + still be used for chat-history storage — that is a separate concern and should + be client-side-encrypted if kept). 
+ +## `chat-app` changes + +- Replace "upload to R2 → store presigned URL → send URL in `attachments`" with: + read the file in the browser, base64-encode, and add a native `image_url` / + `file` content part to the outgoing (to-be-encrypted) message. +- Enforce client-side size/type limits matching the enclave caps; surface a clear + error when a file exceeds them. +- Drop the presigned-upload/download hooks from the send path. + +## Rollout + +1. Enclave: `convert_messages` multimodal support + capability flags + hashing + + limits (behind the existing OHTTP path). Ship and verify PCRs. +2. `chat-app`: send native base64 content parts on the OHTTP path. +3. Remove server-side parsing from `chat-api`; retire `attachments.py` and the + presigned-URL attachment flow. +4. (Optional, later) Phase 2 encrypted-R2-blob for large files. + +## Open questions + +- ~~Pinned `langchain-*` versions: do they already support `file` (PDF) content + blocks?~~ **Resolved:** yes, all five providers — no dep bump / PCR change + needed (see Dependency check above). +- Hard size cap value for inline attachments, and the OHTTP request size ceiling. +- Keep or drop attachment support entirely on the non-private path? +- Source of truth for per-model `supports_image` / `supports_pdf` flags — note + `langchain-*` ships `ModelProfile` data (e.g. `langchain_xai/data/_profiles`) + that may already encode some of this. 
diff --git a/tee_gateway/controllers/chat_controller.py b/tee_gateway/controllers/chat_controller.py index 93f6523..609a157 100644 --- a/tee_gateway/controllers/chat_controller.py +++ b/tee_gateway/controllers/chat_controller.py @@ -1,3 +1,4 @@ +import hashlib import json import time import uuid @@ -29,6 +30,9 @@ get_chat_model_cached, convert_messages, extract_usage, + validate_attachments, + AttachmentValidationError, + _convert_content_part, ) from tee_gateway.pricing import compute_session_cost @@ -47,6 +51,13 @@ def create_chat_completion(body): connexion.request.get_json() ) + # Reject attachments the target model can't handle, and enforce the size cap, + # before doing any provider work. + try: + validate_attachments(chat_request.messages, chat_request.model) + except AttachmentValidationError as e: + return {"error": "Invalid attachment", "message": str(e)}, e.status + if chat_request.stream: return _create_streaming_response(chat_request) else: @@ -636,6 +647,40 @@ def generate(): # --------------------------------------------------------------------------- +def _canonical_user_content(content) -> Any: + """Canonicalize user-message content for request hashing. + + Plain-string content is returned unchanged. For multimodal content (a list of + parts), inline attachment bytes are replaced with a ``sha256`` digest so the + signed request commits to the exact attachment content without bloating the + hashed payload with megabytes of base64. URL / file_id references are kept + verbatim. 
+ """ + if isinstance(content, str): + return content + if not isinstance(content, list): + return str(content) + + canonical = [] + for part in content: + block = _convert_content_part(part) + if block is None: + continue + if block["type"] == "text": + canonical.append({"type": "text", "text": block.get("text", "")}) + continue + entry = {"type": block["type"]} + if "base64" in block: + entry["sha256"] = hashlib.sha256( + block["base64"].encode("utf-8") + ).hexdigest() + for key in ("mime_type", "filename", "url", "file_id"): + if block.get(key): + entry[key] = block[key] + canonical.append(entry) + return canonical + + def _chat_request_to_dict(chat_request: CreateChatCompletionRequest) -> dict: """Serialize a CreateChatCompletionRequest to a canonical dict for hashing.""" messages = [] @@ -646,9 +691,7 @@ def _chat_request_to_dict(chat_request: CreateChatCompletionRequest) -> dict: messages.append( { "role": "user", - "content": msg.content - if isinstance(msg.content, str) - else str(msg.content), + "content": _canonical_user_content(msg.content), } ) elif isinstance(msg, ChatCompletionRequestAssistantMessage): diff --git a/tee_gateway/llm_backend.py b/tee_gateway/llm_backend.py index 22eac59..0b71889 100644 --- a/tee_gateway/llm_backend.py +++ b/tee_gateway/llm_backend.py @@ -8,7 +8,7 @@ import json import logging -from typing import List, Dict, Optional, Any +from typing import List, Dict, Optional, Any, Generator from functools import lru_cache import httpx @@ -50,6 +50,11 @@ # BytePlus ModelArk OpenAI-compatible endpoint (ap-southeast) BYTEDANCE_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3" +# Hard cap on total inline (base64) attachment bytes per request, enforced +# regardless of model. Inline base64 rides inside the encrypted payload, so this +# bounds the request size the enclave will accept. +MAX_ATTACHMENT_BYTES = 30 * 1024 * 1024 # 30 MB + # Shared synchronous HTTP clients for each provider. 
# Initialized to None; built by set_provider_config() after key injection. openai_http_client: Optional[httpx.Client] = None @@ -223,6 +228,207 @@ def get_chat_model_cached(model: str, temperature: float, max_tokens: int): raise ValueError(f"Unsupported provider: {provider}") +def _parse_data_uri(uri: str) -> Optional[tuple[str, str]]: + """Parse a ``data:;base64,`` URI into ``(mime_type, base64_data)``. + + Returns ``None`` if the string is not a base64 data URI. + """ + if not isinstance(uri, str) or not uri.startswith("data:"): + return None + try: + header, data = uri.split(",", 1) + except ValueError: + return None + if ";base64" not in header: + return None + mime_type = header[len("data:") :].split(";", 1)[0] + return mime_type, data + + +def _convert_content_part(part: Any) -> Optional[Dict[str, Any]]: + """Convert one OpenAI-format content part into a LangChain v1 standard content + block (``text`` / ``image`` / ``file``). + + The standard blocks (``langchain_core.messages.content``) are translated into + each provider's native API by the respective ``langchain-`` package, + so a single representation works for Anthropic, OpenAI, Gemini and xAI. Returns + ``None`` for empty or unrecognized parts. + """ + if not isinstance(part, dict): + text = str(part) + return {"type": "text", "text": text} if text else None + + ptype = part.get("type") + + if ptype == "text": + text = part.get("text", "") or "" + return {"type": "text", "text": text} if text else None + + if ptype in ("image_url", "image"): + image_url = part.get("image_url", part) + url = image_url.get("url") if isinstance(image_url, dict) else image_url + if not url: + # Already-standard image block carrying base64 directly. 
+ if part.get("base64"): + block: Dict[str, Any] = {"type": "image", "base64": part["base64"]} + if part.get("mime_type"): + block["mime_type"] = part["mime_type"] + return block + return None + parsed = _parse_data_uri(url) + if parsed: + mime_type, data = parsed + return {"type": "image", "base64": data, "mime_type": mime_type} + return {"type": "image", "url": url} + + if ptype in ("file", "input_file"): + file_obj = part.get("file", part) + if not isinstance(file_obj, dict): + file_obj = {} + file_id = file_obj.get("file_id") or part.get("file_id") + if file_id: + return {"type": "file", "file_id": file_id} + + filename = file_obj.get("filename") or part.get("filename") + file_data = ( + file_obj.get("file_data") or file_obj.get("base64") or part.get("base64") + ) + if file_data: + file_mime: Optional[str] + parsed_file = _parse_data_uri(file_data) + if parsed_file: + file_mime, file_b64 = parsed_file + else: + file_mime = part.get("mime_type") or file_obj.get("mime_type") + file_b64 = file_data + block = {"type": "file", "base64": file_b64} + if file_mime: + block["mime_type"] = file_mime + # OpenAI requires a filename for file uploads; carry it through so + # langchain-openai doesn't substitute a placeholder. + if filename: + block["filename"] = filename + return block + + file_url = file_obj.get("file_url") or file_obj.get("url") or part.get("url") + if file_url: + block = {"type": "file", "url": file_url} + if filename: + block["filename"] = filename + return block + return None + + # Unknown part type: best-effort text extraction. + text = part.get("text", "") or "" + return {"type": "text", "text": text} if text else None + + +def _convert_user_content(content: Any) -> Any: + """Convert user-message content into a value accepted by ``HumanMessage``. + + A list of OpenAI content parts becomes a list of LangChain standard content + blocks. 
When every part is text, it collapses back to a plain string so simple + requests stay simple (and to preserve prior behavior). Non-list content is + returned unchanged. + """ + if not isinstance(content, list): + return content + + blocks: List[Dict[str, Any]] = [] + for part in content: + block = _convert_content_part(part) + if block is not None: + blocks.append(block) + + if blocks and all(b["type"] == "text" for b in blocks): + return "".join(b["text"] for b in blocks) + + return blocks + + +class AttachmentValidationError(ValueError): + """Raised when a request's attachments violate model capabilities or size + limits. Carries the HTTP status the caller should return.""" + + def __init__(self, message: str, status: int = 400) -> None: + super().__init__(message) + self.status = status + + +def _decoded_base64_len(b64: str) -> int: + """Length in bytes of base64-encoded data without decoding it.""" + data = b64.split(",", 1)[-1] # tolerate a leftover data: prefix + n = len(data) + padding = data[-2:].count("=") if n >= 2 else 0 + return max((n * 3) // 4 - padding, 0) + + +def get_model_capabilities(model: str) -> Dict[str, Any]: + """Return the LangChain capability profile for a model (``image_inputs``, + ``pdf_inputs``, ...), or ``{}`` when the model has no profile data. + + Reads the public ``.profile`` attribute of the instantiated chat model, which + each ``langchain-<provider>`` package populates from maintained model data.
+ """ + try: + chat = get_chat_model_cached(model, 0.0, 16) + return getattr(chat, "profile", None) or {} + except Exception: + return {} + + +def _iter_content_parts(messages: list) -> Generator[Dict[str, Any], None, None]: + for msg in messages: + content = ( + msg.get("content") + if isinstance(msg, dict) + else getattr(msg, "content", None) + ) + if isinstance(content, list): + for part in content: + if isinstance(part, dict): + yield part + + +def validate_attachments(messages: list, model: str) -> None: + """Enforce per-model modality support and the inline attachment size cap. + + Modality gating fails *open*: a modality is only rejected when the model's + profile explicitly marks it unsupported, so models without profile data are + never wrongly blocked (the provider would still reject a truly unsupported + combination). The size cap is a hard limit. Raises ``AttachmentValidationError``. + """ + caps = get_model_capabilities(model) + image_supported = caps.get("image_inputs") + pdf_supported = caps.get("pdf_inputs") + + total_bytes = 0 + for part in _iter_content_parts(messages): + block = _convert_content_part(part) + if block is None: + continue + if block["type"] == "image": + if image_supported is False: + raise AttachmentValidationError( + f"Model {model!r} does not support image attachments." + ) + if "base64" in block: + total_bytes += _decoded_base64_len(block["base64"]) + elif block["type"] == "file": + if pdf_supported is False: + raise AttachmentValidationError( + f"Model {model!r} does not support document attachments." 
+ ) + if "base64" in block: + total_bytes += _decoded_base64_len(block["base64"]) + + if total_bytes > MAX_ATTACHMENT_BYTES: + raise AttachmentValidationError( + f"Attachments exceed the {MAX_ATTACHMENT_BYTES // (1024 * 1024)} MB limit.", + status=413, + ) + + def convert_messages(messages: list) -> List[Any]: """Convert OpenAI-format message objects or dicts to LangChain message objects.""" langchain_messages: List[BaseMessage] = [] @@ -246,13 +452,11 @@ def convert_messages(messages: list) -> List[Any]: langchain_messages.append(SystemMessage(content=content)) elif role == "user": - # content may be a string or a list of content parts; handle both - if isinstance(content, list): - content = "".join( - part.get("text", "") if isinstance(part, dict) else str(part) - for part in content - ) - langchain_messages.append(HumanMessage(content=content)) + # content may be a string or a list of multimodal content parts + # (text / image / file); convert to native LangChain content blocks. + langchain_messages.append( + HumanMessage(content=_convert_user_content(content)) + ) elif role == "assistant": if tool_calls: diff --git a/tee_gateway/test/test_tee_core.py b/tee_gateway/test/test_tee_core.py index 41aac33..e4795f7 100644 --- a/tee_gateway/test/test_tee_core.py +++ b/tee_gateway/test/test_tee_core.py @@ -11,7 +11,10 @@ """ import base64 +import hashlib +import json import unittest +from unittest import mock from cryptography.hazmat.primitives import hashes from cryptography.hazmat.primitives.asymmetric import padding @@ -19,7 +22,13 @@ from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage from tee_gateway import ohttp -from tee_gateway.llm_backend import convert_messages, extract_usage +from tee_gateway.controllers.chat_controller import _canonical_user_content +from tee_gateway.llm_backend import ( + AttachmentValidationError, + convert_messages, + extract_usage, + validate_attachments, +) from tee_gateway.model_registry import 
get_model_config, get_rate_card from tee_gateway.tee_manager import ( TEEKeyManager, @@ -568,8 +577,8 @@ def test_multi_turn_order_preserved(self): self.assertIsInstance(result[1], HumanMessage) self.assertIsInstance(result[2], AIMessage) - def test_user_content_as_list_of_parts(self): - """Multimodal content parts should be concatenated into a single string.""" + def test_user_content_text_only_parts_collapse_to_string(self): + """A list of text-only parts collapses back to a plain string.""" result = convert_messages( [ { @@ -584,6 +593,130 @@ def test_user_content_as_list_of_parts(self): self.assertIsInstance(result[0], HumanMessage) self.assertEqual(result[0].content, "Hello world") + def test_user_content_with_base64_image(self): + """An image_url data URI becomes a standard image content block, so the + image survives conversion instead of being dropped.""" + result = convert_messages( + [ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this?"}, + { + "type": "image_url", + "image_url": { + "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg==" + }, + }, + ], + } + ] + ) + content = result[0].content + self.assertIsInstance(content, list) + self.assertEqual(content[0], {"type": "text", "text": "What is this?"}) + self.assertEqual( + content[1], + { + "type": "image", + "base64": "iVBORw0KGgoAAAANSUhEUg==", + "mime_type": "image/png", + }, + ) + + def test_user_content_with_base64_pdf(self): + """A file part with a base64 PDF data URI becomes a standard file block, + carrying mime_type and the original filename through to the provider.""" + result = convert_messages( + [ + { + "role": "user", + "content": [ + {"type": "text", "text": "Summarize this."}, + { + "type": "file", + "file": { + "filename": "contract.pdf", + "file_data": "data:application/pdf;base64,JVBERi0xLjQK", + }, + }, + ], + } + ] + ) + content = result[0].content + self.assertIsInstance(content, list) + self.assertEqual( + content[1], + { + "type": "file", + "base64": 
"JVBERi0xLjQK", + "mime_type": "application/pdf", + "filename": "contract.pdf", + }, + ) + + def test_user_content_image_remote_url(self): + """A non-data-URI image URL is passed through as a url image block.""" + result = convert_messages( + [ + { + "role": "user", + "content": [ + { + "type": "image_url", + "image_url": {"url": "https://example.com/cat.png"}, + }, + ], + } + ] + ) + self.assertEqual( + result[0].content, + [{"type": "image", "url": "https://example.com/cat.png"}], + ) + + def test_multimodal_blocks_convert_for_providers(self): + """The standard blocks produced here must be accepted by the provider + message converters — otherwise multimodal requests fail at send time. + This guards the cross-provider contract without needing network access.""" + from langchain_anthropic.chat_models import _format_messages + from langchain_openai.chat_models.base import _convert_message_to_dict + + msg = convert_messages( + [ + { + "role": "user", + "content": [ + {"type": "text", "text": "Read these."}, + { + "type": "image_url", + "image_url": { + "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg==" + }, + }, + { + "type": "file", + "file": { + "filename": "doc.pdf", + "file_data": "data:application/pdf;base64,JVBERi0xLjQK", + }, + }, + ], + } + ] + )[0] + + # Anthropic: file block -> document with application/pdf media type. + _system, anthropic_msgs = _format_messages([msg]) + anthropic_types = {b["type"] for b in anthropic_msgs[0]["content"]} + self.assertEqual(anthropic_types, {"text", "image", "document"}) + + # OpenAI: file block -> file_data data URI. 
+ openai_msg = _convert_message_to_dict(msg) + openai_types = {b["type"] for b in openai_msg["content"]} + self.assertEqual(openai_types, {"text", "image_url", "file"}) + def test_full_tool_call_conversation(self): """End-to-end multi-turn with tool use: user → assistant (tool call) → tool result.""" msgs = [ @@ -613,6 +746,127 @@ def test_full_tool_call_conversation(self): self.assertEqual(result[2].tool_call_id, "call_xyz") +# --------------------------------------------------------------------------- +# llm_backend.validate_attachments +# --------------------------------------------------------------------------- + + +class TestValidateAttachments(unittest.TestCase): + """Attachment gating must reject modalities a model can't handle and enforce + the size cap, while never blocking a model whose capabilities are unknown.""" + + CAPS = "tee_gateway.llm_backend.get_model_capabilities" + + @staticmethod + def _image_msg(b64): + return [ + { + "role": "user", + "content": [ + { + "type": "image_url", + "image_url": {"url": f"data:image/png;base64,{b64}"}, + } + ], + } + ] + + @staticmethod + def _pdf_msg(b64): + return [ + { + "role": "user", + "content": [ + { + "type": "file", + "file": { + "filename": "a.pdf", + "file_data": f"data:application/pdf;base64,{b64}", + }, + } + ], + } + ] + + def test_plain_text_request_passes(self): + # No model instantiation should be needed for a text-only request. 
+ validate_attachments([{"role": "user", "content": "hi"}], "gpt-5") + + def test_image_blocked_when_model_lacks_support(self): + with mock.patch(self.CAPS, return_value={"image_inputs": False}): + with self.assertRaises(AttachmentValidationError) as cm: + validate_attachments(self._image_msg("aGVsbG8="), "grok-4") + self.assertEqual(cm.exception.status, 400) + + def test_image_allowed_when_model_supports(self): + with mock.patch(self.CAPS, return_value={"image_inputs": True}): + validate_attachments(self._image_msg("aGVsbG8="), "gpt-5") + + def test_fails_open_when_profile_unknown(self): + # Empty profile (no capability data) must not block — provider decides. + with mock.patch(self.CAPS, return_value={}): + validate_attachments(self._image_msg("aGVsbG8="), "seed-2.0-lite") + + def test_pdf_blocked_when_model_lacks_support(self): + with mock.patch( + self.CAPS, return_value={"image_inputs": True, "pdf_inputs": False} + ): + with self.assertRaises(AttachmentValidationError): + validate_attachments(self._pdf_msg("JVBERi0="), "grok-4") + + def test_size_cap_enforced(self): + big = "A" * 1000 # ~750 decoded bytes + with ( + mock.patch(self.CAPS, return_value={"image_inputs": True}), + mock.patch("tee_gateway.llm_backend.MAX_ATTACHMENT_BYTES", 100), + ): + with self.assertRaises(AttachmentValidationError) as cm: + validate_attachments(self._image_msg(big), "gpt-5") + self.assertEqual(cm.exception.status, 413) + + +# --------------------------------------------------------------------------- +# chat_controller._canonical_user_content (request-hashing canonicalization) +# --------------------------------------------------------------------------- + + +class TestCanonicalUserContent(unittest.TestCase): + """The signed request commits to attachments via digest, never inlining the + base64 — otherwise the hash payload bloats and signatures become unwieldy.""" + + def test_string_content_passthrough(self): + self.assertEqual(_canonical_user_content("hello"), "hello") + + def 
test_attachment_digested_not_inlined(self): + content = [ + {"type": "text", "text": "summarize"}, + { + "type": "file", + "file": { + "filename": "a.pdf", + "file_data": "data:application/pdf;base64,JVBERi0xLjQK", + }, + }, + ] + out = _canonical_user_content(content) + self.assertEqual(out[0], {"type": "text", "text": "summarize"}) + entry = out[1] + self.assertEqual(entry["type"], "file") + self.assertEqual(entry["mime_type"], "application/pdf") + self.assertEqual(entry["filename"], "a.pdf") + self.assertEqual(entry["sha256"], hashlib.sha256(b"JVBERi0xLjQK").hexdigest()) + # The raw base64 must not appear anywhere in the hashed payload. + self.assertNotIn("JVBERi0xLjQK", json.dumps(out)) + + def test_deterministic(self): + content = [ + {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBOR=="}} + ] + self.assertEqual( + _canonical_user_content(content), _canonical_user_content(content) + ) + + # --------------------------------------------------------------------------- # llm_backend.extract_usage # ---------------------------------------------------------------------------