Affected area
Plugins, Observability or exporters, Third-party integration patches
Problem or opportunity
TL;DR
Request OpenClaw support for a single invocation-scoped LLM execution middleware around each provider/model call. OpenClaw already has most of the needed data internally; this issue asks to expose it through a stable public provider-call boundary so NeMo Flow can build authoritative Phoenix/OpenInference spans for security, optimization, and observability without patching OpenClaw internals or guessing from separate hooks.
The middleware should be additive, provider-neutral, privacy-aware, and backward-compatible with existing plugin hooks. Observability should be fail-open by default; blocking, rewriting, routing, or annotation should be explicit policy-controlled behavior.
Problem
The NeMo Flow OpenClaw plugin can produce useful Phoenix/OpenInference traces from current public hooks, but those traces are not always authoritative optimization evidence.
Current hooks expose related signals separately:
model_call_started/model_call_ended: callId, provider/model/API/transport, duration, TTFB, byte counts, and error category
llm_input: prompt, system prompt, history snapshot, and image count
llm_output: assistant text and accumulated usage
message/tool hooks: assistant/tool side effects after provider output has been mapped into session messages
trajectory metadata: run-level session, agent, provider, model API, config, plugins, redaction policy, final usage, and prompt-cache artifacts
Those are useful pieces, but they are not one provider-call contract. In multi-step loops such as LLM -> tool -> LLM -> tool -> LLM, one run can contain several provider calls and tool calls. Pairing request, response, usage, timing, tool-call metadata, and final output by ordering is ambiguous with streaming, retries, fallbacks, compaction retries, or concurrent tool activity.
PR #67 is intentionally conservative: when timing cannot be paired safely, the plugin emits diagnostic marks instead of inventing latency. That is correct for a hook-based integration, but it means token, cache, cost, latency, TTFB, retry/fallback state, and model-emitted tool-call metadata do not always stay attached to the provider invocation that produced them.
Proposed enhancement
OpenClaw already has most of the required data internally:
model-call diagnostics create a callId and record timing, TTFB, byte counts, provider/model/API/transport, and failure category
provider transports build normalized requests and parse response events, finish reasons, response ids, tool calls, usage, cache counters, and cost
prompt-cache observability and trajectory metadata capture request-shape, run/session/agent/config/plugin, and redaction context
run attempt metadata and failover logging represent retries, profile rotation, fallback decisions, and error/status details
The NeMo Flow eval patches show why provider-call fidelity matters:
Provider cache evidence is API-surface specific. OpenAI-compatible routes expose cache reuse through usage.prompt_tokens_details.cached_tokens or usage.input_tokens_details.cached_tokens; Anthropic Messages routes expose usage.cache_read_input_tokens and usage.cache_creation_input_tokens.
Cache mode follows the provider API surface, not only the model family string. A routed Anthropic model on an OpenAI-compatible endpoint needs OpenAI-style cache evidence, while a native Anthropic Messages route needs Anthropic-style cache evidence.
The patched codec path had to emit provider-native usage for openai_chat, openai_responses, and anthropic_messages; otherwise Phoenix/OpenInference output could not prove provider cache behavior.
ACG and tool-policy optimization need stable request-shape identifiers, effective tool-schema evidence, and a defined telemetry completion point. Volatile task text is not a reliable key and may be redacted.
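The API-surface distinction above can be sketched as a small usage normalizer. Only the usage field names come from the public OpenAI and Anthropic APIs; the surrounding types and the `extractCacheCounters` function are illustrative assumptions, not OpenClaw's actual code:

```typescript
type ApiSurface = "openai_chat" | "openai_responses" | "anthropic_messages";

interface CacheCounters {
  cacheReadTokens: number;  // tokens served from the provider's prompt cache
  cacheWriteTokens: number; // tokens written into the cache (Anthropic only)
}

// Normalize provider-native cache usage keyed by API surface, not model family.
function extractCacheCounters(surface: ApiSurface, usage: any): CacheCounters {
  switch (surface) {
    case "openai_chat":
      // Chat Completions: usage.prompt_tokens_details.cached_tokens
      return {
        cacheReadTokens: usage?.prompt_tokens_details?.cached_tokens ?? 0,
        cacheWriteTokens: 0,
      };
    case "openai_responses":
      // Responses API: usage.input_tokens_details.cached_tokens
      return {
        cacheReadTokens: usage?.input_tokens_details?.cached_tokens ?? 0,
        cacheWriteTokens: 0,
      };
    case "anthropic_messages":
      // Anthropic Messages: explicit cache read/creation counters
      return {
        cacheReadTokens: usage?.cache_read_input_tokens ?? 0,
        cacheWriteTokens: usage?.cache_creation_input_tokens ?? 0,
      };
  }
}
```

Note the switch keys off the API surface, which is exactly why a routed Anthropic model on an OpenAI-compatible endpoint needs OpenAI-style evidence.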
Request OpenClaw support for a public LLM execution middleware that wraps each provider/model invocation.
The middleware should be invocation-scoped, not only post-hoc. It should allow a plugin to observe the request before execution and the response or failure after execution. Where OpenClaw policy allows, the same shape should support blocking, rewriting, routing, or annotating the call.
The proposed middleware should compose OpenClaw's existing internal provider-call data into one invocation-scoped public record. It should not expose trajectory metadata wholesale as a plugin API; trajectory metadata is broader and run-scoped.
Runtime contract and binding impact
Control semantics should be explicit:
before: called after provider/model/API/transport resolution and normalized request construction, before dispatch
chunk: optional streaming callback with sanitized provider/native chunk information or normalized chunk metadata
after: called once for a successful invocation with final response, usage, cost, timing, finish reason, and model-emitted tool-call metadata
error: called once for a failed invocation with error/status metadata, elapsed timing, retry/fallback metadata, and any known partial usage/cost
The same callId must be present across all phases, and each dispatched invocation should emit exactly one terminal phase: after or error. If routing changes provider/model/API, OpenClaw should rebuild the provider request before dispatch.
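The phase contract above could take roughly the following shape. All names here are assumptions for illustration; the minimal dispatcher only demonstrates the "exactly one terminal phase per dispatched invocation" invariant:

```typescript
interface InvocationContext {
  callId: string;         // stable per dispatched invocation, same across phases
  logicalCallId?: string; // shared across retries/fallbacks of one agent request
  provider: string;
  model: string;
}

interface LlmCallMiddleware {
  before?(ctx: InvocationContext, request: unknown): void;
  chunk?(ctx: InvocationContext, chunk: unknown): void;
  after?(ctx: InvocationContext, response: unknown): void;
  error?(ctx: InvocationContext, err: unknown): void;
}

// Minimal dispatcher: one `before`, then exactly one of `after` or `error`.
async function dispatchWithMiddleware(
  mw: LlmCallMiddleware,
  ctx: InvocationContext,
  exec: () => Promise<unknown>,
): Promise<unknown> {
  mw.before?.(ctx, {}); // normalized request would be passed here
  try {
    const response = await exec();
    mw.after?.(ctx, response); // terminal phase: success
    return response;
  } catch (err) {
    mw.error?.(ctx, err); // terminal phase: failure
    throw err;
  }
}
```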
For NeMo Flow to treat plugin traces as authoritative, the middleware context needs:
stable invocation id / callId
optional logical call id when retries/fallbacks belong to one higher-level agent request
retry/fallback attempt metadata
run/session/agent context
provider, model, codec/API/transport, and request-surface metadata
effective tool schema/inventory metadata or a stable fingerprint
normalized provider request before execution
final normalized LLM response envelope after execution, including streaming chunks or an accumulated final response
provider-native usage before normalization, including OpenAI-compatible and Anthropic Messages cache fields
normalized input/output/total tokens, cache read/write counters, and cost when available
start/end timing, latency, and TTFB when available
finish reason and model-emitted tool-call metadata
failure/error metadata for provider exceptions
sanitized raw payloads where allowed by OpenClaw privacy policy
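The field list above could be composed into one invocation-scoped record along these lines. Every name in this sketch is an illustrative assumption, not a proposed final schema:

```typescript
interface ProviderCallRecord {
  callId: string;
  logicalCallId?: string;          // groups retries/fallbacks of one request
  attempt: { index: number; isFallback: boolean };
  run: { runId: string; sessionId: string; agentId: string };
  route: { provider: string; model: string; api: string; transport: string };
  toolSchemaFingerprint?: string;  // stable key instead of volatile prompt text
  request: unknown;                // normalized provider request (policy-gated)
  response?: unknown;              // final normalized response envelope
  usageNative?: unknown;           // provider-native usage, pre-normalization
  usage?: {
    input: number; output: number; total: number;
    cacheRead: number; cacheWrite: number; costUsd?: number;
  };
  timing: { startMs: number; endMs?: number; ttfbMs?: number };
  finishReason?: string;
  toolCalls?: Array<{ id: string; name: string }>;
  error?: { type: string; status?: number };
}

// A record is terminal once it carries either a response or an error.
function isTerminal(rec: ProviderCallRecord): boolean {
  return rec.response !== undefined || rec.error !== undefined;
}
```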
Design constraints:
Keep the public API provider-neutral while preserving provider-native usage under a structured field.
Expose stable request-shape metadata for optimization without requiring volatile prompt text.
Treat raw request/response payloads as policy-gated diagnostic data.
Keep existing hooks backward-compatible.
Run at the transport/provider boundary, not final assistant-message replay.
Preserve fail-open behavior for observability plugins unless a plugin is explicitly configured for blocking/security behavior.
Define a telemetry completion point so short-lived runs can export final provider-call evidence before shutdown.
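The fail-open constraint can be made concrete with a small wrapper: an exception inside an observer callback is caught and logged so it can never break the model call. The function name and shape are assumptions:

```typescript
// Wrap an observer phase callback so its failures are isolated from execution.
function failOpen<T extends unknown[]>(
  phase: string,
  fn: ((...args: T) => void) | undefined,
): (...args: T) => void {
  return (...args: T) => {
    try {
      fn?.(...args);
    } catch (err) {
      // Observability plugins must not take down the provider call;
      // record the failure and continue.
      console.warn(`observer ${phase} callback failed:`, err);
    }
  };
}
```

A plugin explicitly configured for blocking/security behavior would bypass this wrapper so its rejections propagate.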
Binding impact:
No required change to existing plugin hooks if this is added as a new middleware capability.
Existing plugins can ignore this middleware and continue using current hooks.
NeMo Flow would bind to the middleware and map each provider call directly to one Phoenix/OpenInference LLM span.
This should reduce or remove the current best-effort correlation logic in the NeMo Flow OpenClaw plugin.
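The direct call-to-span mapping could look roughly like this. The attribute keys follow OpenInference semantic conventions; the input record shape and function name are assumptions for illustration:

```typescript
interface CallSummary {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Map one provider-call record to OpenInference LLM span attributes,
// with no message-order or timing-candidate heuristics involved.
function toLlmSpanAttributes(call: CallSummary): Record<string, string | number> {
  return {
    "openinference.span.kind": "LLM",
    "llm.model_name": call.model,
    "llm.token_count.prompt": call.inputTokens,
    "llm.token_count.completion": call.outputTokens,
    "llm.token_count.total": call.inputTokens + call.outputTokens,
  };
}
```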
Alternatives considered
Expose trajectory metadata directly: useful run-level context, but too broad and not invocation-scoped.
Continue patching OpenClaw internals: accurate for prototypes, but unstable across OpenClaw releases.
Acceptance criteria
OpenClaw exposes a public LLM execution middleware around provider/model invocations.
Each dispatched invocation has a stable callId; retries/fallbacks are distinguishable and can share an optional logical call id.
Middleware can observe normalized provider requests before execution and final normalized response envelopes after execution for streaming and non-streaming calls.
The middleware provides one provider-call boundary that is sufficient for security, optimization, and observability use cases without requiring separate provider-call lifecycle hooks.
Completion data preserves provider-native usage before normalization, including OpenAI-compatible cached-token fields and Anthropic Messages cache read/write fields.
Completion data exposes normalized tokens, cache counters, cost when available, finish reason, latency, TTFB, and model-emitted tool-call metadata.
Failure data exposes error type/status, elapsed timing, retry/fallback metadata, and known usage/cost for failed attempts.
Payloads follow OpenClaw privacy/redaction policy and do not expose secrets.
Observation failures are isolated from model execution by default.
Existing plugin hooks remain backward-compatible.
A NeMo Flow plugin can map each provider call directly to one Phoenix/OpenInference LLM span without message-order or timing-candidate heuristics.
A multi-step agent loop can produce an accurate LLM -> tool -> LLM -> tool -> LLM trace with correct token/cache/cost attribution per LLM span.