diff --git a/docs/native-attachments-design.md b/docs/native-attachments-design.md
new file mode 100644
index 0000000..c1a637c
--- /dev/null
+++ b/docs/native-attachments-design.md
@@ -0,0 +1,231 @@
+# Design: Native LLM Attachments over the Private (OHTTP) Path
+
+## Status
+
+Proposal. Spans three repos: `chat-app` (browser), `chat-api` (relay), `tee-gateway`
+(enclave). The bulk of the change lands in `tee-gateway`.
+
+## Motivation
+
+Today attachments are handled by **server-side parsing in `chat-api`**:
+
+- `chat-api/src/core/attachments.py` downloads each attachment and runs PyMuPDF /
+  python-docx to extract **plain text**, then injects that text into the prompt.
+- Images are classified by content-type and passed through as URLs.
+
+This is the wrong layer to solve the problem:
+
+1. **It throws away everything the models do natively.** Modern Claude / GPT /
+   Gemini ingest PDFs and images directly — layout, tables, figures, charts,
+   handwriting, embedded images. Flattening a PDF to `page.get_text()` loses all
+   of that and feeds the model a worse input than it could handle itself.
+2. **It only works on the non-private path.** The parsing in `attachments.py` is
+   invoked exclusively from the regular `POST /api/v1/chat` handler. On the
+   **OHTTP path**, `chat-api` is a dumb relay — it forwards opaque ciphertext to
+   the enclave and never sees the body — so attachments are simply not processed.
+   Worse, in the enclave `llm_backend.convert_messages` flattens multimodal
+   content parts to text only (`"".join(part.get("text", "") ...)`), so any
+   `image_url` part is **silently dropped** before it reaches the provider.
+
+Net result: **attachments and privacy are currently mutually exclusive.**
+Attachments only work on the route where `chat-api` reads the plaintext, and the
+private route drops them.
+
+## Goal
+
+Send attachments to the model **natively**, on the **private (OHTTP) path**:
+
+- No server-side text extraction. The file bytes reach the model as a native
+  image/document content part.
+- `chat-api` and Cloudflare never see attachment plaintext (same trust boundary
+  as the message text already enjoys on OHTTP).
+- The enclave converts the inner request's multimodal content into each
+  provider's native format via LangChain.
+
+## Trust boundary (what this does and does not hide)
+
+- **Hidden from:** the browser→relay transport, `chat-api`, the OHTTP relay,
+  Cloudflare/R2. They see only HPKE ciphertext.
+- **Visible to:** the enclave (it decrypts — that's the trust anchor) and the
+  **upstream LLM provider** (OpenAI/Anthropic/Google/xAI/ByteDance), which
+  receives the attachment as part of the completion request. This is identical
+  to how message *text* is already handled: whatever you send the model, the
+  model provider sees. Fully provider-blind attachments would require the model
+  to run inside the TEE and are out of scope here.
+
+## Transport: how the attachment reaches the enclave
+
+### Phase 1 — inline base64 (recommended starting point)
+
+The browser embeds the file directly in the message content as a standard
+OpenAI-style content part, inside the HPKE-encrypted OHTTP payload:
+
+```jsonc
+{
+  "model": "claude-sonnet-4-6",
+  "messages": [
+    {
+      "role": "user",
+      "content": [
+        { "type": "text", "text": "Summarize this contract." },
+        { "type": "image_url",
+          "image_url": { "url": "data:image/png;base64,iVBORw0K..." } },
+        { "type": "file",
+          "file": { "filename": "contract.pdf",
+                    "file_data": "data:application/pdf;base64,JVBERi0..." } }
+      ]
+    }
+  ]
+}
+```
+
+- Pros: nothing outside the enclave/provider ever sees the bytes; no R2 round
+  trip; no presigned-URL machinery; no SSRF surface.
+- Cons: base64 inflates ~33%; bounded by request/OHTTP size limits; no
+  persistence (re-sent each turn). Fine for the common case (a few MB of PDF or
+  an image). Enforce a hard per-request attachment-bytes cap in the enclave.
+
+### Phase 2 — encrypted blob in R2 (only if large files / persistence needed)
+
+Browser client-side-encrypts the file (AES-GCM), uploads **ciphertext** to R2
+(Cloudflare sees only ciphertext), and includes inside the OHTTP payload an R2
+reference plus the AES key **wrapped to the TEE attestation/HPKE public key**.
+The enclave fetches the ciphertext and decrypts internally. Defer until Phase 1
+limits become a real constraint.
+
+> Note: do **not** go back to plaintext-in-R2 + presigned URLs. That reintroduces
+> the public-bearer-token leak and the SSRF surface in `attachments.py`.
+
+## Enclave changes (`tee-gateway`) — the core of the work
+
+### 1. `convert_messages` must preserve multimodal content
+
+`llm_backend.py:248-255` currently does:
+
+```python
+elif role == "user":
+    if isinstance(content, list):
+        content = "".join(
+            part.get("text", "") if isinstance(part, dict) else str(part)
+            for part in content
+        )
+    langchain_messages.append(HumanMessage(content=content))
+```
+
+Replace the flattening with a converter that maps the inbound OpenAI-style
+content parts to **LangChain v1 standard content blocks**
+(`langchain_core.messages.content`: `ImageContentBlock`, `FileContentBlock`).
+Building the *standard* blocks (rather than raw OpenAI `image_url`/`file` dicts)
+is important: each provider package translates them into its own native API, so
+one code path covers Anthropic, OpenAI, Gemini, and xAI uniformly.
+
+- `text` → `{"type": "text", "text": ...}`
+- image (base64 data URI or https) →
+  `{"type": "image", "base64": ..., "mime_type": "image/png"}` (or `"url": ...`)
+- document/PDF (base64) →
+  `{"type": "file", "base64": ..., "mime_type": "application/pdf",
+    "filename": "<original name>"}`
+
+Keep a `HumanMessage` with a **list** content when parts are present; only
+collapse to a plain string when the message is text-only (preserves current
+behavior for the no-attachment case).
+
+**Verified** against the pinned versions (see "Dependency check" below): a
+`HumanMessage` carrying these standard blocks converts correctly outbound —
+Anthropic emits `{"type":"document","source":{"type":"base64","media_type":
+"application/pdf",...}}`, OpenAI emits `{"type":"file","file":{"file_data":
+"data:application/pdf;base64,...","filename":...}}`. **Carry the original
+`filename`** on file blocks — OpenAI requires one and otherwise substitutes a
+placeholder (`LC_AUTOGENERATED`).
+
+### 2. No new dependencies (PCR constraint) — confirmed
+
+Native handoff means the enclave does **not** parse PDFs/DOCX itself — it passes
+the bytes to the provider. So we should **not** add PyMuPDF/python-docx to
+`tee-gateway`.
+
+**Dependency check (done).** The currently pinned versions already support
+standard image *and* file (PDF) content blocks with base64, across every
+provider we route to — so **this change needs no dependency bump and the PCR
+measurements stay stable**:
+
+| Package | Pinned | Native file/image support |
+|---|---|---|
+| `langchain-core` | 1.2.26 | Defines `ImageContentBlock` / `FileContentBlock` (base64, url, file_id, mime_type) |
+| `langchain-anthropic` | 1.4.0 | `file` → `document` (defaults `application/pdf`); image → base64 source |
+| `langchain-openai` | 1.1.12 | `file` → `file_data` data-URI / `input_file`; image → `image_url` |
+| `langchain-google-genai` | 4.2.1 | document/image blocks supported |
+| `langchain-xai` | 1.2.2 | subclass of `BaseChatOpenAI` → inherits OpenAI handling |
+
+This was verified functionally (not just by reading types) by running the
+Anthropic and OpenAI outbound message converters over a multimodal
+`HumanMessage`. Per-model *acceptance* of PDFs still depends on the model itself
+(see capability gating below).
+
+### 3. Per-provider capability gating
+
+Not every model accepts every modality. Extend `model_registry` with capability
+flags (e.g. `supports_image`, `supports_pdf`) and reject with a clear 4xx in the
+inner (encapsulated) response when a request sends a modality the target model
+can't handle, rather than silently dropping it as today.
+
+### 4. Request signing / hashing
+
+`chat_controller.py` (~645-651) hashes user content via `str(msg.content)`. With
+multimodal content that would hash megabytes of base64 and is not canonical.
+Define a stable hashing rule, e.g. hash each attachment as
+`sha256(mime_type || raw_bytes)` and include those digests (not the base64) in
+the canonical request JSON that feeds `keccak256(requestHash ...)`. This keeps
+signatures meaningful and bounded while still committing to the exact attachment
+content.
+
+### 5. Limits & validation
+
+- Hard cap on total attachment bytes per request (post-decode).
+- Allowlist of accepted mime types per modality.
+- Reject `image_url` values that are remote `https` URLs on the private path if
+  we want to guarantee the enclave makes no outbound fetch for user content
+  (Phase 1 = base64 only). Decide explicitly.
+
+## `chat-api` changes
+
+- OHTTP path: **no change needed** to the relay itself — attachments ride inside
+  the encrypted payload it already forwards opaquely.
+- Regular `POST /api/v1/chat` path: stop calling `load_documents` /
+  `is_image_url` and stop injecting extracted text. Either (a) build native
+  content parts here too, or (b) deprecate attachment support on the non-private
+  path and route all attachments through OHTTP. Recommend (b) for a single code
+  path.
+- The presigned-URL / `attachments: string[]` machinery and `attachments.py`
+  become dead code for inference and can be removed once Phase 1 ships (R2 may
+  still be used for chat-history storage — that is a separate concern and should
+  be client-side-encrypted if kept).
+
+## `chat-app` changes
+
+- Replace "upload to R2 → store presigned URL → send URL in `attachments`" with:
+  read the file in the browser, base64-encode, and add a native `image_url` /
+  `file` content part to the outgoing (to-be-encrypted) message.
+- Enforce client-side size/type limits matching the enclave caps; surface a clear
+  error when a file exceeds them.
+- Drop the presigned-upload/download hooks from the send path.
+
+## Rollout
+
+1. Enclave: `convert_messages` multimodal support + capability flags + hashing +
+   limits (behind the existing OHTTP path). Ship and verify PCRs.
+2. `chat-app`: send native base64 content parts on the OHTTP path.
+3. Remove server-side parsing from `chat-api`; retire `attachments.py` and the
+   presigned-URL attachment flow.
+4. (Optional, later) Phase 2 encrypted-R2-blob for large files.
+
+## Open questions
+
+- ~~Pinned `langchain-*` versions: do they already support `file` (PDF) content
+  blocks?~~ **Resolved:** yes, for all five pinned packages; no dep bump / PCR
+  change needed (see Dependency check above).
+- Hard size cap value for inline attachments, and the OHTTP request size ceiling.
+- Keep or drop attachment support entirely on the non-private path?
+- Source of truth for per-model `supports_image` / `supports_pdf` flags — note
+  `langchain-*` ships `ModelProfile` data (e.g. `langchain_xai/data/_profiles`)
+  that may already encode some of this.
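Note on the Phase 1 transport described in the design doc: the following is a minimal sketch, in Python rather than browser code, of how an inline `file` content part could be assembled and how the "~33%" base64 overhead arises. `build_file_part` and `base64_len` are hypothetical helpers for illustration, and the payload is stand-in bytes rather than a real PDF.

```python
import base64


def build_file_part(filename: str, mime_type: str, raw: bytes) -> dict:
    """Hypothetical helper: wrap raw bytes as an OpenAI-style `file` content part."""
    b64 = base64.b64encode(raw).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:{mime_type};base64,{b64}",
        },
    }


def base64_len(raw_len: int) -> int:
    """Length of padded base64 text for raw_len bytes: 4 * ceil(raw_len / 3)."""
    return 4 * ((raw_len + 2) // 3)


# Stand-in payload (assumption): a PDF-like header followed by filler bytes.
raw = b"%PDF-1.4\n" + b"\x00" * 3_000_000
part = build_file_part("contract.pdf", "application/pdf", raw)

# Everything after the first comma of the data URI is the base64 text.
b64_text = part["file"]["file_data"].split(",", 1)[1]
overhead = len(b64_text) / len(raw) - 1  # fraction added by base64 encoding
```

Because every 3 raw bytes become 4 base64 characters, a client-side limit expressed in raw bytes corresponds to roughly 4/3 as many characters inside the encrypted payload.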
diff --git a/tee_gateway/controllers/chat_controller.py b/tee_gateway/controllers/chat_controller.py
index 93f6523..609a157 100644
--- a/tee_gateway/controllers/chat_controller.py
+++ b/tee_gateway/controllers/chat_controller.py
@@ -1,3 +1,4 @@
+import hashlib
 import json
 import time
 import uuid
@@ -29,6 +30,9 @@
     get_chat_model_cached,
     convert_messages,
     extract_usage,
+    validate_attachments,
+    AttachmentValidationError,
+    _convert_content_part,
 )
 from tee_gateway.pricing import compute_session_cost
 
@@ -47,6 +51,13 @@ def create_chat_completion(body):
         connexion.request.get_json()
     )
 
+    # Reject attachments the target model can't handle, and enforce the size cap,
+    # before doing any provider work.
+    try:
+        validate_attachments(chat_request.messages, chat_request.model)
+    except AttachmentValidationError as e:
+        return {"error": "Invalid attachment", "message": str(e)}, e.status
+
     if chat_request.stream:
         return _create_streaming_response(chat_request)
     else:
@@ -636,6 +647,40 @@ def generate():
 # ---------------------------------------------------------------------------
 
 
+def _canonical_user_content(content) -> Any:
+    """Canonicalize user-message content for request hashing.
+
+    Plain-string content is returned unchanged. For multimodal content (a list of
+    parts), each inline attachment is replaced with the ``sha256`` hex digest of
+    its base64 text (not the decoded bytes), so the signed request commits to the
+    attachment as sent without bloating the hashed payload with megabytes of
+    base64. URL and file_id references are kept verbatim.
+    """
+    if isinstance(content, str):
+        return content
+    if not isinstance(content, list):
+        return str(content)
+
+    canonical = []
+    for part in content:
+        block = _convert_content_part(part)
+        if block is None:
+            continue
+        if block["type"] == "text":
+            canonical.append({"type": "text", "text": block.get("text", "")})
+            continue
+        entry = {"type": block["type"]}
+        if "base64" in block:
+            entry["sha256"] = hashlib.sha256(
+                block["base64"].encode("utf-8")
+            ).hexdigest()
+        for key in ("mime_type", "filename", "url", "file_id"):
+            if block.get(key):
+                entry[key] = block[key]
+        canonical.append(entry)
+    return canonical
+
+
 def _chat_request_to_dict(chat_request: CreateChatCompletionRequest) -> dict:
     """Serialize a CreateChatCompletionRequest to a canonical dict for hashing."""
     messages = []
@@ -646,9 +691,7 @@ def _chat_request_to_dict(chat_request: CreateChatCompletionRequest) -> dict:
             messages.append(
                 {
                     "role": "user",
-                    "content": msg.content
-                    if isinstance(msg.content, str)
-                    else str(msg.content),
+                    "content": _canonical_user_content(msg.content),
                 }
             )
         elif isinstance(msg, ChatCompletionRequestAssistantMessage):
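Note on the hashing rule: the design doc suggests `sha256(mime_type || raw_bytes)`, whereas `_canonical_user_content` in this patch digests the base64 text and keeps `mime_type` as a separate field. For comparison, here is a minimal sketch of the raw-bytes variant. `attachment_digest` is a hypothetical helper, not part of the patch, and it uses plain concatenation exactly as the doc writes it (a length prefix or separator would be needed if ambiguity between the two fields mattered).

```python
import base64
import hashlib


def attachment_digest(mime_type: str, b64_data: str) -> str:
    """Hypothetical helper: hex sha256 over mime_type || raw_bytes."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(b64_data, validate=True)
    h = hashlib.sha256()
    h.update(mime_type.encode("utf-8"))
    h.update(raw)
    return h.hexdigest()


sample_b64 = "JVBERi0xLjQK"  # base64 of b"%PDF-1.4\n"
raw_digest = attachment_digest("application/pdf", sample_b64)

# What the patch computes instead: a digest of the base64 text itself.
text_digest = hashlib.sha256(sample_b64.encode("utf-8")).hexdigest()
```

The two digests differ for the same attachment, so whichever rule is chosen has to be applied identically by anyone verifying the signed request.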
diff --git a/tee_gateway/llm_backend.py b/tee_gateway/llm_backend.py
index 22eac59..0b71889 100644
--- a/tee_gateway/llm_backend.py
+++ b/tee_gateway/llm_backend.py
@@ -8,7 +8,7 @@
 
 import json
 import logging
-from typing import List, Dict, Optional, Any
+from typing import List, Dict, Optional, Any, Generator
 from functools import lru_cache
 
 import httpx
@@ -50,6 +50,11 @@
 # BytePlus ModelArk OpenAI-compatible endpoint (ap-southeast)
 BYTEDANCE_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"
 
+# Hard cap on total inline attachment bytes per request, measured after base64
+# decoding and enforced regardless of model. The base64 text that rides inside
+# the encrypted payload is about 4/3 of this size.
+MAX_ATTACHMENT_BYTES = 30 * 1024 * 1024  # 30 MB
+
 # Shared synchronous HTTP clients for each provider.
 # Initialized to None; built by set_provider_config() after key injection.
 openai_http_client: Optional[httpx.Client] = None
@@ -223,6 +228,207 @@ def get_chat_model_cached(model: str, temperature: float, max_tokens: int):
         raise ValueError(f"Unsupported provider: {provider}")
 
 
+def _parse_data_uri(uri: str) -> Optional[tuple[str, str]]:
+    """Parse a ``data:<mime>;base64,<data>`` URI into ``(mime_type, base64_data)``.
+
+    Returns ``None`` if the string is not a base64 data URI.
+    """
+    if not isinstance(uri, str) or not uri.startswith("data:"):
+        return None
+    try:
+        header, data = uri.split(",", 1)
+    except ValueError:
+        return None
+    if ";base64" not in header:
+        return None
+    mime_type = header[len("data:") :].split(";", 1)[0]
+    return mime_type, data
+
+
+def _convert_content_part(part: Any) -> Optional[Dict[str, Any]]:
+    """Convert one OpenAI-format content part into a LangChain v1 standard content
+    block (``text`` / ``image`` / ``file``).
+
+    The standard blocks (``langchain_core.messages.content``) are translated into
+    each provider's native API by the respective ``langchain-<provider>`` package,
+    so a single representation works for Anthropic, OpenAI, Gemini and xAI. Returns
+    ``None`` for empty or unrecognized parts.
+    """
+    if not isinstance(part, dict):
+        text = str(part)
+        return {"type": "text", "text": text} if text else None
+
+    ptype = part.get("type")
+
+    if ptype == "text":
+        text = part.get("text", "") or ""
+        return {"type": "text", "text": text} if text else None
+
+    if ptype in ("image_url", "image"):
+        image_url = part.get("image_url", part)
+        url = image_url.get("url") if isinstance(image_url, dict) else image_url
+        if not url:
+            # Already-standard image block carrying base64 directly.
+            if part.get("base64"):
+                block: Dict[str, Any] = {"type": "image", "base64": part["base64"]}
+                if part.get("mime_type"):
+                    block["mime_type"] = part["mime_type"]
+                return block
+            return None
+        parsed = _parse_data_uri(url)
+        if parsed:
+            mime_type, data = parsed
+            return {"type": "image", "base64": data, "mime_type": mime_type}
+        return {"type": "image", "url": url}
+
+    if ptype in ("file", "input_file"):
+        file_obj = part.get("file", part)
+        if not isinstance(file_obj, dict):
+            file_obj = {}
+        file_id = file_obj.get("file_id") or part.get("file_id")
+        if file_id:
+            return {"type": "file", "file_id": file_id}
+
+        filename = file_obj.get("filename") or part.get("filename")
+        file_data = (
+            file_obj.get("file_data") or file_obj.get("base64") or part.get("base64")
+        )
+        if file_data:
+            file_mime: Optional[str]
+            parsed_file = _parse_data_uri(file_data)
+            if parsed_file:
+                file_mime, file_b64 = parsed_file
+            else:
+                file_mime = part.get("mime_type") or file_obj.get("mime_type")
+                file_b64 = file_data
+            block = {"type": "file", "base64": file_b64}
+            if file_mime:
+                block["mime_type"] = file_mime
+            # OpenAI requires a filename for file uploads; carry it through so
+            # langchain-openai doesn't substitute a placeholder.
+            if filename:
+                block["filename"] = filename
+            return block
+
+        file_url = file_obj.get("file_url") or file_obj.get("url") or part.get("url")
+        if file_url:
+            block = {"type": "file", "url": file_url}
+            if filename:
+                block["filename"] = filename
+            return block
+        return None
+
+    # Unknown part type: best-effort text extraction.
+    text = part.get("text", "") or ""
+    return {"type": "text", "text": text} if text else None
+
+
+def _convert_user_content(content: Any) -> Any:
+    """Convert user-message content into a value accepted by ``HumanMessage``.
+
+    A list of OpenAI content parts becomes a list of LangChain standard content
+    blocks. When every part is text, it collapses back to a plain string so simple
+    requests stay simple (and to preserve prior behavior). Non-list content is
+    returned unchanged.
+    """
+    if not isinstance(content, list):
+        return content
+
+    blocks: List[Dict[str, Any]] = []
+    for part in content:
+        block = _convert_content_part(part)
+        if block is not None:
+            blocks.append(block)
+
+    if blocks and all(b["type"] == "text" for b in blocks):
+        return "".join(b["text"] for b in blocks)
+
+    return blocks
+
+
+class AttachmentValidationError(ValueError):
+    """Raised when a request's attachments violate model capabilities or size
+    limits. Carries the HTTP status the caller should return."""
+
+    def __init__(self, message: str, status: int = 400) -> None:
+        super().__init__(message)
+        self.status = status
+
+
+def _decoded_base64_len(b64: str) -> int:
+    """Decoded length in bytes of base64 data, computed without decoding it."""
+    data = b64.split(",", 1)[-1]  # tolerate a leftover data: prefix
+    n = len(data)
+    padding = data[-2:].count("=") if n >= 2 else 0
+    return max((n * 3) // 4 - padding, 0)
+
+
+def get_model_capabilities(model: str) -> Dict[str, Any]:
+    """Return the LangChain capability profile for a model (``image_inputs``,
+    ``pdf_inputs``, ...), or ``{}`` when the model has no profile data.
+
+    Reads the public ``.profile`` attribute of the instantiated chat model, which
+    each ``langchain-<provider>`` package populates from maintained model data.
+    """
+    try:
+        chat = get_chat_model_cached(model, 0.0, 16)
+        return getattr(chat, "profile", None) or {}
+    except Exception:
+        return {}
+
+
+def _iter_content_parts(messages: list) -> Generator[Dict[str, Any], None, None]:
+    for msg in messages:
+        content = (
+            msg.get("content")
+            if isinstance(msg, dict)
+            else getattr(msg, "content", None)
+        )
+        if isinstance(content, list):
+            for part in content:
+                if isinstance(part, dict):
+                    yield part
+
+
+def validate_attachments(messages: list, model: str) -> None:
+    """Enforce per-model modality support and the inline attachment size cap.
+
+    Modality gating fails *open*: a modality is only rejected when the model's
+    profile explicitly marks it unsupported, so models without profile data are
+    never wrongly blocked (the provider would still reject a truly unsupported
+    combination). The size cap is a hard limit. Raises ``AttachmentValidationError``.
+    """
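+    blocks = (_convert_content_part(p) for p in _iter_content_parts(messages))
+    attachments = [b for b in blocks if b and b["type"] in ("image", "file")]
+    # Only consult the model profile when there is something to gate.
+    caps = get_model_capabilities(model) if attachments else {}
+    image_supported = caps.get("image_inputs")
+    pdf_supported = caps.get("pdf_inputs")
+
+    total_bytes = 0
+    for block in attachments: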
+        if block["type"] == "image":
+            if image_supported is False:
+                raise AttachmentValidationError(
+                    f"Model {model!r} does not support image attachments."
+                )
+            if "base64" in block:
+                total_bytes += _decoded_base64_len(block["base64"])
+        elif block["type"] == "file":
+            if pdf_supported is False:
+                raise AttachmentValidationError(
+                    f"Model {model!r} does not support document attachments."
+                )
+            if "base64" in block:
+                total_bytes += _decoded_base64_len(block["base64"])
+
+    if total_bytes > MAX_ATTACHMENT_BYTES:
+        raise AttachmentValidationError(
+            f"Attachments exceed the {MAX_ATTACHMENT_BYTES // (1024 * 1024)} MB limit.",
+            status=413,
+        )
+
+
 def convert_messages(messages: list) -> List[Any]:
     """Convert OpenAI-format message objects or dicts to LangChain message objects."""
     langchain_messages: List[BaseMessage] = []
@@ -246,13 +452,11 @@ def convert_messages(messages: list) -> List[Any]:
             langchain_messages.append(SystemMessage(content=content))
 
         elif role == "user":
-            # content may be a string or a list of content parts; handle both
-            if isinstance(content, list):
-                content = "".join(
-                    part.get("text", "") if isinstance(part, dict) else str(part)
-                    for part in content
-                )
-            langchain_messages.append(HumanMessage(content=content))
+            # content may be a string or a list of multimodal content parts
+            # (text / image / file); convert to native LangChain content blocks.
+            langchain_messages.append(
+                HumanMessage(content=_convert_user_content(content))
+            )
 
         elif role == "assistant":
             if tool_calls:
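Note on the size cap: `_decoded_base64_len` estimates the decoded size arithmetically (3 bytes per 4 characters, minus one byte per trailing `=`). The sketch below copies that arithmetic into a standalone function, here named `decoded_base64_len`, and checks it against an actual decode for a few inputs. It assumes canonical padded base64 with no embedded whitespace; extra characters such as line breaks would be counted as data, so the estimate would err on the high side.

```python
import base64


def decoded_base64_len(b64: str) -> int:
    """Standalone copy of the patch's arithmetic, for illustration only."""
    data = b64.split(",", 1)[-1]  # tolerate a leftover data: prefix
    n = len(data)
    padding = data[-2:].count("=") if n >= 2 else 0
    return max((n * 3) // 4 - padding, 0)


samples = [b"", b"a", b"ab", b"abc", b"hello", bytes(range(256))]
checks = []
for raw in samples:
    encoded = base64.b64encode(raw).decode("ascii")
    # Compare the arithmetic estimate with the real decoded length.
    checks.append((decoded_base64_len(encoded), len(base64.b64decode(encoded))))

# A full data URI is accepted as well: the prefix before the comma is ignored.
uri_len = decoded_base64_len("data:image/png;base64,aGVsbG8=")
```

Computing the length this way lets the cap be enforced without allocating a decoded copy of each attachment.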
diff --git a/tee_gateway/test/test_tee_core.py b/tee_gateway/test/test_tee_core.py
index 41aac33..e4795f7 100644
--- a/tee_gateway/test/test_tee_core.py
+++ b/tee_gateway/test/test_tee_core.py
@@ -11,7 +11,10 @@
 """
 
 import base64
+import hashlib
+import json
 import unittest
+from unittest import mock
 
 from cryptography.hazmat.primitives import hashes
 from cryptography.hazmat.primitives.asymmetric import padding
@@ -19,7 +22,13 @@
 from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
 
 from tee_gateway import ohttp
-from tee_gateway.llm_backend import convert_messages, extract_usage
+from tee_gateway.controllers.chat_controller import _canonical_user_content
+from tee_gateway.llm_backend import (
+    AttachmentValidationError,
+    convert_messages,
+    extract_usage,
+    validate_attachments,
+)
 from tee_gateway.model_registry import get_model_config, get_rate_card
 from tee_gateway.tee_manager import (
     TEEKeyManager,
@@ -568,8 +577,8 @@ def test_multi_turn_order_preserved(self):
         self.assertIsInstance(result[1], HumanMessage)
         self.assertIsInstance(result[2], AIMessage)
 
-    def test_user_content_as_list_of_parts(self):
-        """Multimodal content parts should be concatenated into a single string."""
+    def test_user_content_text_only_parts_collapse_to_string(self):
+        """A list of text-only parts collapses back to a plain string."""
         result = convert_messages(
             [
                 {
@@ -584,6 +593,130 @@ def test_user_content_as_list_of_parts(self):
         self.assertIsInstance(result[0], HumanMessage)
         self.assertEqual(result[0].content, "Hello world")
 
+    def test_user_content_with_base64_image(self):
+        """An image_url data URI becomes a standard image content block, so the
+        image survives conversion instead of being dropped."""
+        result = convert_messages(
+            [
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": "What is this?"},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg=="
+                            },
+                        },
+                    ],
+                }
+            ]
+        )
+        content = result[0].content
+        self.assertIsInstance(content, list)
+        self.assertEqual(content[0], {"type": "text", "text": "What is this?"})
+        self.assertEqual(
+            content[1],
+            {
+                "type": "image",
+                "base64": "iVBORw0KGgoAAAANSUhEUg==",
+                "mime_type": "image/png",
+            },
+        )
+
+    def test_user_content_with_base64_pdf(self):
+        """A file part with a base64 PDF data URI becomes a standard file block,
+        carrying mime_type and the original filename through to the provider."""
+        result = convert_messages(
+            [
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": "Summarize this."},
+                        {
+                            "type": "file",
+                            "file": {
+                                "filename": "contract.pdf",
+                                "file_data": "data:application/pdf;base64,JVBERi0xLjQK",
+                            },
+                        },
+                    ],
+                }
+            ]
+        )
+        content = result[0].content
+        self.assertIsInstance(content, list)
+        self.assertEqual(
+            content[1],
+            {
+                "type": "file",
+                "base64": "JVBERi0xLjQK",
+                "mime_type": "application/pdf",
+                "filename": "contract.pdf",
+            },
+        )
+
+    def test_user_content_image_remote_url(self):
+        """A non-data-URI image URL is passed through as a url image block."""
+        result = convert_messages(
+            [
+                {
+                    "role": "user",
+                    "content": [
+                        {
+                            "type": "image_url",
+                            "image_url": {"url": "https://example.com/cat.png"},
+                        },
+                    ],
+                }
+            ]
+        )
+        self.assertEqual(
+            result[0].content,
+            [{"type": "image", "url": "https://example.com/cat.png"}],
+        )
+
+    def test_multimodal_blocks_convert_for_providers(self):
+        """The standard blocks produced here must be accepted by the provider
+        message converters — otherwise multimodal requests fail at send time.
+        This guards the cross-provider contract without needing network access."""
+        from langchain_anthropic.chat_models import _format_messages
+        from langchain_openai.chat_models.base import _convert_message_to_dict
+
+        msg = convert_messages(
+            [
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": "Read these."},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg=="
+                            },
+                        },
+                        {
+                            "type": "file",
+                            "file": {
+                                "filename": "doc.pdf",
+                                "file_data": "data:application/pdf;base64,JVBERi0xLjQK",
+                            },
+                        },
+                    ],
+                }
+            ]
+        )[0]
+
+        # Anthropic: file block -> document with application/pdf media type.
+        _system, anthropic_msgs = _format_messages([msg])
+        anthropic_types = {b["type"] for b in anthropic_msgs[0]["content"]}
+        self.assertEqual(anthropic_types, {"text", "image", "document"})
+
+        # OpenAI: file block -> file_data data URI.
+        openai_msg = _convert_message_to_dict(msg)
+        openai_types = {b["type"] for b in openai_msg["content"]}
+        self.assertEqual(openai_types, {"text", "image_url", "file"})
+
     def test_full_tool_call_conversation(self):
         """End-to-end multi-turn with tool use: user → assistant (tool call) → tool result."""
         msgs = [
@@ -613,6 +746,127 @@ def test_full_tool_call_conversation(self):
         self.assertEqual(result[2].tool_call_id, "call_xyz")
 
 
+# ---------------------------------------------------------------------------
+# llm_backend.validate_attachments
+# ---------------------------------------------------------------------------
+
+
+class TestValidateAttachments(unittest.TestCase):
+    """Attachment gating must reject modalities a model can't handle and enforce
+    the size cap, while never blocking a model whose capabilities are unknown."""
+
+    CAPS = "tee_gateway.llm_backend.get_model_capabilities"
+
+    @staticmethod
+    def _image_msg(b64):
+        return [
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "image_url",
+                        "image_url": {"url": f"data:image/png;base64,{b64}"},
+                    }
+                ],
+            }
+        ]
+
+    @staticmethod
+    def _pdf_msg(b64):
+        return [
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "file",
+                        "file": {
+                            "filename": "a.pdf",
+                            "file_data": f"data:application/pdf;base64,{b64}",
+                        },
+                    }
+                ],
+            }
+        ]
+
+    def test_plain_text_request_passes(self):
+        # No model instantiation should be needed for a text-only request.
+        validate_attachments([{"role": "user", "content": "hi"}], "gpt-5")
+
+    def test_image_blocked_when_model_lacks_support(self):
+        with mock.patch(self.CAPS, return_value={"image_inputs": False}):
+            with self.assertRaises(AttachmentValidationError) as cm:
+                validate_attachments(self._image_msg("aGVsbG8="), "grok-4")
+        self.assertEqual(cm.exception.status, 400)
+
+    def test_image_allowed_when_model_supports(self):
+        with mock.patch(self.CAPS, return_value={"image_inputs": True}):
+            validate_attachments(self._image_msg("aGVsbG8="), "gpt-5")
+
+    def test_fails_open_when_profile_unknown(self):
+        # Empty profile (no capability data) must not block — provider decides.
+        with mock.patch(self.CAPS, return_value={}):
+            validate_attachments(self._image_msg("aGVsbG8="), "seed-2.0-lite")
+
+    def test_pdf_blocked_when_model_lacks_support(self):
+        with mock.patch(
+            self.CAPS, return_value={"image_inputs": True, "pdf_inputs": False}
+        ):
+            with self.assertRaises(AttachmentValidationError):
+                validate_attachments(self._pdf_msg("JVBERi0="), "grok-4")
+
+    def test_size_cap_enforced(self):
+        big = "A" * 1000  # decodes to 750 bytes, well over the patched 100-byte cap
+        with (
+            mock.patch(self.CAPS, return_value={"image_inputs": True}),
+            mock.patch("tee_gateway.llm_backend.MAX_ATTACHMENT_BYTES", 100),
+        ):
+            with self.assertRaises(AttachmentValidationError) as cm:
+                validate_attachments(self._image_msg(big), "gpt-5")
+        self.assertEqual(cm.exception.status, 413)
+
+
+# ---------------------------------------------------------------------------
+# chat_controller._canonical_user_content (request-hashing canonicalization)
+# ---------------------------------------------------------------------------
+
+
+class TestCanonicalUserContent(unittest.TestCase):
+    """The signed request commits to attachments via digest, never inlining the
+    base64 — otherwise the hash payload bloats and signatures become unwieldy."""
+
+    def test_string_content_passthrough(self):
+        self.assertEqual(_canonical_user_content("hello"), "hello")
+
+    def test_attachment_digested_not_inlined(self):
+        content = [
+            {"type": "text", "text": "summarize"},
+            {
+                "type": "file",
+                "file": {
+                    "filename": "a.pdf",
+                    "file_data": "data:application/pdf;base64,JVBERi0xLjQK",
+                },
+            },
+        ]
+        out = _canonical_user_content(content)
+        self.assertEqual(out[0], {"type": "text", "text": "summarize"})
+        entry = out[1]
+        self.assertEqual(entry["type"], "file")
+        self.assertEqual(entry["mime_type"], "application/pdf")
+        self.assertEqual(entry["filename"], "a.pdf")
+        self.assertEqual(entry["sha256"], hashlib.sha256(b"JVBERi0xLjQK").hexdigest())
+        # The raw base64 must not appear anywhere in the hashed payload.
+        self.assertNotIn("JVBERi0xLjQK", json.dumps(out))
+
+    def test_deterministic(self):
+        content = [
+            {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBO"}}
+        ]
+        self.assertEqual(
+            _canonical_user_content(content), _canonical_user_content(content)
+        )
+
+
 # ---------------------------------------------------------------------------
 # llm_backend.extract_usage
 # ---------------------------------------------------------------------------