
[Enhancement]: Add OpenClaw LLM execution middleware for provider calls #78

@mnajafian-nv

Description


Affected area

Plugins, Observability or exporters, Third-party integration patches

Problem or opportunity

TL;DR

This issue requests that OpenClaw support a single invocation-scoped LLM execution middleware around each provider/model call. OpenClaw already has most of the needed data internally; the ask is to expose it through a stable public provider-call boundary so NeMo Flow can build authoritative Phoenix/OpenInference spans for security, optimization, and observability without patching OpenClaw internals or guessing from separate hooks.

The middleware should be additive, provider-neutral, privacy-aware, and backward-compatible with existing plugin hooks. Observability should be fail-open by default; blocking, rewriting, routing, or annotation should be explicit policy-controlled behavior.

Related

Problem

The NeMo Flow OpenClaw plugin can produce useful Phoenix/OpenInference traces from current public hooks, but those traces are not always authoritative optimization evidence.

Current hooks expose related signals separately:

  • model_call_started / model_call_ended: callId, provider/model/API/transport, duration, TTFB, byte counts, and error category
  • llm_input: prompt, system prompt, history snapshot, and image count
  • llm_output: assistant text and accumulated usage
  • message/tool hooks: assistant/tool side effects after provider output has been mapped into session messages
  • trajectory metadata: run-level session, agent, provider, model API, config, plugins, redaction policy, final usage, and prompt-cache artifacts

Those are useful pieces, but they do not form a single provider-call contract. In multi-step loops such as LLM -> tool -> LLM -> tool -> LLM, one run can contain several provider calls and tool calls. Pairing request, response, usage, timing, tool-call metadata, and final output by ordering alone is ambiguous in the presence of streaming, retries, fallbacks, compaction retries, or concurrent tool activity.

PR #67 is intentionally conservative: when timing cannot be paired safely, the plugin emits diagnostic marks instead of inventing latency. That is correct for a hook-based integration, but it means token, cache, cost, latency, TTFB, retry/fallback state, and model-emitted tool-call metadata do not always stay attached to the provider invocation that produced them.

Proposed enhancement

OpenClaw already has most of the required data internally:

  • model-call diagnostics create callId and record timing, TTFB, byte counts, provider/model/API/transport, and failure category
  • provider transports build normalized requests and parse response events, finish reasons, response ids, tool calls, usage, cache counters, and cost
  • prompt-cache observability and trajectory metadata capture request-shape, run/session/agent/config/plugin, and redaction context
  • run attempt metadata and failover logging represent retries, profile rotation, fallback decisions, and error/status details
  • PR #67 (feat: add OpenClaw observability plugin) adds bounded correlation, placeholder replay, ambiguity/unpaired timing marks, fail-open replay handling, and session-end draining

The NeMo Flow eval patches show why provider-call fidelity matters:

  • Provider cache evidence is API-surface specific. OpenAI-compatible routes expose cache reuse through usage.prompt_tokens_details.cached_tokens or usage.input_tokens_details.cached_tokens; Anthropic Messages routes expose usage.cache_read_input_tokens and usage.cache_creation_input_tokens.
  • Cache mode follows the provider API surface, not only the model family string. A routed Anthropic model on an OpenAI-compatible endpoint needs OpenAI-style cache evidence, while a native Anthropic Messages route needs Anthropic-style cache evidence.
  • The patched codec path had to emit provider-native usage for openai_chat, openai_responses, and anthropic_messages; otherwise Phoenix/OpenInference output could not prove provider cache behavior.
  • ACG and tool-policy optimization need stable request-shape identifiers, effective tool-schema evidence, and a defined telemetry completion point. Volatile task text is not a reliable key and may be redacted.
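The API-surface split above can be sketched as a small normalizer. This is an illustrative sketch, not OpenClaw code: the function name and output shape are invented here; only the usage field paths come from the issue text.

```typescript
// Normalized cache evidence, regardless of which API surface produced it.
type NormalizedCache = { cacheReadTokens: number; cacheWriteTokens: number };

// Select cache fields by API surface, not by model family string:
// a routed Anthropic model on an OpenAI-compatible endpoint takes the
// OpenAI-style branch, a native Anthropic Messages route takes the other.
function extractCacheUsage(
  apiSurface: string,
  usage: Record<string, any> | undefined,
): NormalizedCache {
  switch (apiSurface) {
    case "openai_chat":
      return {
        cacheReadTokens: usage?.prompt_tokens_details?.cached_tokens ?? 0,
        cacheWriteTokens: 0, // no cache-write counter on this surface
      };
    case "openai_responses":
      return {
        cacheReadTokens: usage?.input_tokens_details?.cached_tokens ?? 0,
        cacheWriteTokens: 0,
      };
    case "anthropic_messages":
      return {
        cacheReadTokens: usage?.cache_read_input_tokens ?? 0,
        cacheWriteTokens: usage?.cache_creation_input_tokens ?? 0,
      };
    default:
      return { cacheReadTokens: 0, cacheWriteTokens: 0 };
  }
}
```

Keeping the raw usage object alongside this normalized view (rather than replacing it) is what lets Phoenix/OpenInference output still prove provider-native cache behavior.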

The concrete request is a public LLM execution middleware in OpenClaw that wraps each provider/model invocation.

The middleware should be invocation-scoped, not only post-hoc. It should allow a plugin to observe the request before execution and the response or failure after execution. Where OpenClaw policy allows, the same shape should support blocking, rewriting, routing, or annotating the call.

The proposed middleware should compose OpenClaw's existing internal provider-call data into one invocation-scoped public record. It should not expose trajectory metadata wholesale as a plugin API; trajectory metadata is broader and run-scoped.

Runtime contract and binding impact

Control semantics should be explicit:

  • before: called after provider/model/API/transport resolution and normalized request construction, before dispatch
  • chunk: optional streaming callback with sanitized provider/native chunk information or normalized chunk metadata
  • after: called once for a successful invocation with final response, usage, cost, timing, finish reason, and model-emitted tool-call metadata
  • error: called once for a failed invocation with error/status metadata, elapsed timing, retry/fallback metadata, and any known partial usage/cost

The same callId must be present across all phases, and each dispatched invocation should emit exactly one terminal phase: after or error. If routing changes provider/model/API, OpenClaw should rebuild the provider request before dispatch.
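The phase contract above can be sketched as a small dispatch driver. All names here are hypothetical, not OpenClaw's actual API; the point is the invariant: before runs once, chunks are optional, and exactly one terminal phase (after or error) fires per dispatched invocation, all with the same callId. A streaming/async variant would be shaped the same way.

```typescript
// Hypothetical middleware surface; every phase receives the same callId.
interface LlmMiddleware {
  before?(callId: string): void;
  chunk?(callId: string, chunk: string): void;
  after?(callId: string, finalText: string): void;
  error?(callId: string, err: Error): void;
}

// Wraps one provider invocation, enforcing exactly one terminal phase.
function dispatchWithMiddleware(
  callId: string,
  mw: LlmMiddleware,
  invoke: (onChunk: (c: string) => void) => string,
): string {
  mw.before?.(callId); // after provider/model/API resolution, before dispatch
  try {
    const finalText = invoke((c) => mw.chunk?.(callId, c));
    mw.after?.(callId, finalText); // terminal phase: after
    return finalText;
  } catch (err) {
    mw.error?.(callId, err as Error); // terminal phase: error (never both)
    throw err;
  }
}
```

A fail-open observability binding would additionally catch and swallow exceptions thrown by the middleware callbacks themselves; a policy-enabled blocking plugin would be allowed to let its before-phase exception abort the dispatch.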

For NeMo Flow to treat plugin traces as authoritative, the middleware context needs:

  • stable invocation id / callId
  • optional logical call id when retries/fallbacks belong to one higher-level agent request
  • retry/fallback attempt metadata
  • run/session/agent context
  • provider, model, codec/API/transport, and request-surface metadata
  • effective tool schema/inventory metadata or a stable fingerprint
  • normalized provider request before execution
  • final normalized LLM response envelope after execution, including streaming chunks or an accumulated final response
  • provider-native usage before normalization, including OpenAI-compatible and Anthropic Messages cache fields
  • normalized input/output/total tokens, cache read/write counters, and cost when available
  • start/end timing, latency, and TTFB when available
  • finish reason and model-emitted tool-call metadata
  • failure/error metadata for provider exceptions
  • sanitized raw payloads where allowed by OpenClaw privacy policy

Design constraints:

  • Keep the public API provider-neutral while preserving provider-native usage under a structured field.
  • Expose stable request-shape metadata for optimization without requiring volatile prompt text.
  • Treat raw request/response payloads as policy-gated diagnostic data.
  • Keep existing hooks backward-compatible.
  • Run at the transport/provider boundary, not final assistant-message replay.
  • Preserve fail-open behavior for observability plugins unless a plugin is explicitly configured for blocking/security behavior.
  • Define a telemetry completion point so short-lived runs can export final provider-call evidence before shutdown.

Binding impact:

  • No required change to existing plugin hooks if this is added as a new middleware capability.
  • Existing plugins can ignore this middleware and continue using current hooks.
  • NeMo Flow would bind to the middleware and map each provider call directly to one Phoenix/OpenInference LLM span.
  • This should reduce or remove the current best-effort correlation logic in the NeMo Flow OpenClaw plugin.
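The direct mapping can be sketched as one function from a completed provider call to LLM span attributes. The attribute keys follow OpenInference semantic conventions as commonly documented; the input record shape is hypothetical, and a real binding would also attach cache counters, cost, finish reason, and tool-call metadata.

```typescript
// One provider call -> one set of Phoenix/OpenInference LLM span attributes.
// No message-order or timing-candidate heuristics are needed: the middleware
// already delivered everything scoped to a single callId.
function toLlmSpanAttributes(call: {
  model: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
}): Record<string, string | number> {
  return {
    "openinference.span.kind": "LLM",
    "llm.model_name": call.model,
    "llm.provider": call.provider,
    "llm.token_count.prompt": call.inputTokens,
    "llm.token_count.completion": call.outputTokens,
    "llm.token_count.total": call.inputTokens + call.outputTokens,
  };
}
```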

Alternatives considered

Acceptance criteria

  • OpenClaw exposes a public LLM execution middleware around provider/model invocations.
  • Each dispatched invocation has a stable callId; retries/fallbacks are distinguishable and can share an optional logical call id.
  • Middleware can observe normalized provider requests before execution and final normalized response envelopes after execution for streaming and non-streaming calls.
  • The middleware provides one provider-call boundary that is sufficient for security, optimization, and observability use cases without requiring separate provider-call lifecycle hooks.
  • Completion data preserves provider-native usage before normalization, including OpenAI-compatible cached-token fields and Anthropic Messages cache read/write fields.
  • Completion data exposes normalized tokens, cache counters, cost when available, finish reason, latency, TTFB, and model-emitted tool-call metadata.
  • Failure data exposes error type/status, elapsed timing, retry/fallback metadata, and known usage/cost for failed attempts.
  • Payloads follow OpenClaw privacy/redaction policy and do not expose secrets.
  • Observation failures are isolated from model execution by default.
  • Existing plugin hooks remain backward-compatible.
  • A NeMo Flow plugin can map each provider call directly to one Phoenix/OpenInference LLM span without message-order or timing-candidate heuristics.
  • A multi-step agent loop can produce an accurate LLM -> tool -> LLM -> tool -> LLM trace with correct token/cache/cost attribution per LLM span.

Metadata

Labels

Improvement (improvement to existing functionality)
