feat | instrumentation engine, attestation, and adapter integrations#88
garrettallen14 wants to merge 38 commits into
…ing (#83) * feat: context propagation and upload circuit breaker * feat: updates + new adapters * feat: unify context model, per-client uploads, and adapter hardening * fix: update crewai
* feat: context propagation and upload circuit breaker * feat: updates + new adapters * feat: unify context model, per-client uploads, and adapter hardening * fix: update crewai * feat: new adapters
feat | agentforce, agno, autogen, bedrock adapters
… into development
Brings in the SDK samples overhaul (70+ samples), the auto-release
workflow, CHANGELOG, the custom-model update/delete API, and the
restored test files we'd lost on this branch (test_samples.py,
test_samples_e2e.py, test_mcp_server.py, etc.).
pyproject.toml conflicted in the tests/** per-file-ignores section -- I
kept main's broader set (T201, T203, ARG, B007) so the restored
test_samples_e2e.py keeps linting, and dropped my orphan
examples/instrument_{openai,langchain}.py files in favour of main's
renamed samples/integrations/{openai,langchain}_instrumented.py.
tests/conftest.py auto-merged main's "live" pytest marker.
Mirrors the _FRAMEWORK_PACKAGES pattern from ateam without dragging in the singleton machinery. discover_installed() uses importlib.util find_spec so detection is cheap and has no import side effects; auto(client) instantiates and connects whichever frameworks are importable in the current env. Providers stay explicit since they need the user's SDK client, which we don't have at auto() time. Same for agentforce and langfuse, which need credentials at connect. Both helpers are re-exported from layerlens.instrument. Drift-guard tests pin the three lookup tables to stay consistent.
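A minimal sketch of the find_spec-based detection described above. The table contents and the function signature are illustrative assumptions, not the actual lookup tables in this branch.

```python
import importlib.util

# Hypothetical table: adapter key -> top-level module that backs it.
_FRAMEWORK_MODULES = {
    "langgraph": "langgraph",
    "crewai": "crewai",
    "semantic_kernel": "semantic_kernel",
}


def discover_installed(table=None):
    """Return adapter keys whose module can be located without importing it."""
    table = _FRAMEWORK_MODULES if table is None else table
    found = []
    for key, module_name in table.items():
        try:
            # find_spec only consults the import system's finders for a
            # top-level name, so the framework itself is never executed.
            spec = importlib.util.find_spec(module_name)
        except (ImportError, ValueError):
            spec = None
        if spec is not None:
            found.append(key)
    return sorted(found)
```

Only top-level module names are used here on purpose: find_spec on a dotted name imports the parent package, which would reintroduce import side effects.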
After every node exit we hash the output state and emit the digest as agent.state.change, so the dashboard can diff state across nodes without needing the raw payloads. Uses the same compute_hash as the attestation chain so the format matches.
Constructor knobs:
- emit_state_hash=False to turn it off entirely
- state_include_keys / state_exclude_keys to scope the hash to a subset of the state dict
Non-serialisable state falls back to a repr-based hash so we still emit something stable. agent.state.change is in _ALWAYS_ENABLED, so no layer gating needed.
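Roughly, the scoped hash with a repr fallback could look like the sketch below. It does not reproduce compute_hash; SHA-256 over canonical JSON and the "sha256:" prefix are assumptions made for illustration.

```python
import hashlib
import json


def hash_state(state, include_keys=None, exclude_keys=None):
    """Hash a state dict, optionally scoped to a subset of its keys."""
    scoped = {
        key: value
        for key, value in state.items()
        if (include_keys is None or key in include_keys)
        and (exclude_keys is None or key not in exclude_keys)
    }
    try:
        # Canonical JSON so key order does not change the digest.
        blob = json.dumps(scoped, sort_keys=True, separators=(",", ":"))
    except (TypeError, ValueError):
        # Fallback for non-serialisable values. This is only stable when the
        # objects involved have a deterministic repr.
        blob = repr(sorted(scoped.items(), key=lambda kv: str(kv[0])))
    return "sha256:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()
```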
Whenever the active langgraph_node transitions between distinct named agents we emit agent.handoff. That puts LangGraph in line with the OpenAI Agents and Google ADK adapters, which already detect handoffs natively. HandoffDetector is intentionally framework-agnostic -- I'll reuse it for the CrewAI delegation work next. Same-node revisits and the first node observed don't emit, so the noise stays low. Context gets scrubbed through the same allow-list ateam uses (task, messages, objective, etc.) with long strings truncated and long lists collapsed to placeholders, then hashed so dashboards can correlate handoffs without seeing the raw state.
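The transition rule (no event for the first node, none for same-node revisits) can be sketched as below. The class body and the returned payload shape are assumptions; the real HandoffDetector also scrubs and hashes context, which is omitted here.

```python
class HandoffDetector:
    """Track the active agent and report transitions between distinct agents."""

    def __init__(self):
        self._current = None
        self._sequence = 0

    def observe(self, agent_name):
        """Return a handoff payload on a real transition, otherwise None."""
        previous, self._current = self._current, agent_name
        if previous is None or previous == agent_name:
            # First node observed, or the same node revisited: stay quiet.
            return None
        self._sequence += 1
        return {
            "from_agent": previous,
            "to_agent": agent_name,
            "sequence": self._sequence,
        }
```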
Two related pieces. inject_headers / extract_headers let user code stitch our traces into a wider distributed-tracing system. If OpenTelemetry is installed we delegate to its propagator; otherwise we build traceparent by hand from the active TraceCollector and current span. Our 16-hex trace ids get zero-padded to 32 hex on the wire and shortened back on extract. gen_ai_attributes() returns a dict of OTel GenAI semconv attributes (gen_ai.system, gen_ai.operation.name, request params, response model / id / finish_reasons, usage tokens). The provider emit helper now embeds this under otel_gen_ai on every model.invoke, so OTel-aware tooling can read the standard names without having to re-map our internal field names.
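For the no-OpenTelemetry path, a hand-built traceparent might look like the sketch below. The header layout follows W3C Trace Context (version, 32-hex trace id, 16-hex span id, flags); the function names and the choice to pad on the left are assumptions, since the commit only says ids are zero-padded and shortened back.

```python
def build_traceparent(trace_id, span_id, sampled=True):
    """Format a W3C traceparent header from a 16-hex trace id and span id."""
    flags = "01" if sampled else "00"
    # Assumption: pad on the left so the original id is the low 64 bits.
    return f"00-{trace_id.rjust(32, '0')}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, span_id), shortening ids that were zero-padded."""
    parts = header.strip().split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    trace_id, span_id = parts[1], parts[2]
    if trace_id.startswith("0" * 16):
        # Looks like one of our padded ids: drop the padding again.
        trace_id = trace_id[16:]
    return trace_id, span_id
```

A 32-hex id that arrives from another system without the zero prefix is returned unchanged, so foreign trace ids survive the round trip.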
Hierarchical crews delegate work through the built-in "Delegate work to coworker" and "Ask question to coworker" tools, but older crewai versions don't fire AgentDelegationStartedEvent for them. That left the handoff invisible in our traces. The tool-call path now matches those tool names case-insensitively and synthesises agent.handoff with from_agent (current agent role), to_agent (coworker arg), tool_name, a sequence number, and a sha256 hash over the scrubbed task+context. The typed-event handler bumps the same sequence so newer crewai versions emit identical payloads. tool_args is parsed robustly -- crewai sometimes passes it as a dict, sometimes as a JSON string. Context scrubbing reuses _handoff.scrub_context for parity with the LangGraph handoff format.
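A sketch of the two tolerant pieces mentioned above: case-insensitive tool-name matching and tool_args arriving as either a dict or a JSON string. The helper names are hypothetical; only the two tool names come from the commit message.

```python
import json

_DELEGATION_TOOLS = {"delegate work to coworker", "ask question to coworker"}


def is_delegation_tool(tool_name):
    """Match the built-in delegation tools regardless of case or padding."""
    return tool_name.strip().lower() in _DELEGATION_TOOLS


def parse_tool_args(tool_args):
    """Normalise tool_args to a dict; anything unparseable becomes {}."""
    if isinstance(tool_args, dict):
        return tool_args
    if isinstance(tool_args, str):
        try:
            parsed = json.loads(tool_args)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}
```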
Wraps semantic-kernel's AgentChat / AgentGroupChat invoke (an async generator that yields ChatMessageContent) and processes each yielded message for:
- tool calls / results from FunctionCall items
- model.invoke + cost.record derived from message.metadata
- agent.handoff on agent_name turn transitions, via the shared HandoffDetector
A one-shot environment.config event fires per chat instance on its first invocation, capturing the chat type, agents, plugins, and selection / termination strategy class names.
Provider detection covers the usual suspects (gpt/o1/o3 -> openai, claude -> anthropic, gemini -> google, etc.) and falls back to azure_openai, since that's what MS Agent Framework fronts most of the time.
Registered in the auto-detection tables, so layerlens.instrument.auto() picks it up when semantic-kernel is installed. Coexists fine with the existing SemanticKernelAdapter -- they instrument different surfaces (filters vs AgentChat wrapping).
EmbeddingAdapter wraps OpenAI / Cohere / sentence-transformers and emits embedding.create with provider, model, batch size, vector dimensions, token usage, and latency. Pass-through when no collector is active so it adds no overhead outside a trace. VectorStoreAdapter does the same for Pinecone, Chroma, and Weaviate (near_vector / near_text). retrieval.query events carry the query shape, result count, and a min/max/mean over scores or distances. BenchmarkImporter lives under layerlens.benchmarks rather than the adapters tree -- it's a data-conversion utility, not an instrumentation tracer (ateam's own docstring flagged the naming inconsistency in their version). Reads HuggingFace Datasets, HELM result JSON, CSV, JSON arrays, and JSONL. Optional schema_mapping renames source fields to layerlens canonical names.
TracedMemory is a transparent proxy around any LangChain memory object. save_context and clear are intercepted; before the call we hash the memory's loaded variables, after the call we hash again, and if the hash changes we emit agent.state.change. Everything else passes through. For workflows where save_context happens outside our control (e.g. inside a third-party agent), MemoryMutationTracker is a context manager that frames a logical operation and emits one event per logical operation rather than one per save_context call. Hashing uses the same compute_hash as the attestation chain, so before/after digests are comparable across the LangChain and LangGraph adapters. Non-serialisable memory contents fall back to a repr-based hash so we still get a stable identifier. Exported as wrap_memory / TracedMemory / MemoryMutationTracker from layerlens.instrument.adapters.frameworks.langchain.
The existing layerlens.replay subpackage already drives full replay via ReplayController; this fills in the missing persistence piece so a TraceCollector can round-trip through disk.
TraceCollector.to_replay_dict() returns the same payload that flush() uploads (trace_id, events, capture_config, attestation), but without sealing the hash chain -- the collector stays usable for further emits. _build_trace_payload now takes a seal flag; flush() still seals, to_replay_dict doesn't.
New layerlens.replay.snapshot module:
- dump / dump_collector / load_snapshot for the file-IO side
- replay_events to re-emit captured events into a fresh collector
- serialize_adapter mirrors the per-adapter serialize_for_replay pattern from ateam, bundling AdapterInfo + current trace into one dict
GA check for protocol adapter classes. Verifies the class extends BaseProtocolAdapter, sets non-empty PROTOCOL and PROTOCOL_VERSION, implements connect / disconnect / adapter_info, returns the right types from adapter_info() (AdapterInfo with adapter_type="protocol") and probe_health() (ProtocolHealth), and that negotiate_version picks an exact match when offered. Result types are JSON-serialisable dataclasses; failures are partitioned by severity, so "couldn't instantiate the class to check runtime shape" surfaces as a warning while contract violations are errors. Runs against the three shipped shim adapters (a2ui, ap2, ucp) in a parametrised test, so a regression in any of them surfaces on the next run. Also has to defensively ensure an asyncio event loop exists before instantiation -- BaseProtocolAdapter creates an asyncio.Semaphore in __init__ and the suite would otherwise break when run after asyncio-heavy tests that closed their loop.
Two coupled changes that have to land together because either alone breaks rye run lint.
~265 files reformatted by ruff-format. These accumulated as the pre-commit hook's format pass touched the broader repo after main's 70+ new samples and the restored test files came in. Format-only -- no semantic edits.
The formatter also wrapped three suppressions onto the wrong line, which I had to put back:
- probe_health's # noqa: ARG002 ended up on the return-type line instead of the line declaring the unused `endpoint` arg.
- langchain_core's BaseCallbackHandler import got wrapped onto multiple lines, leaving the # pyright: ignore on the closing paren where pyright doesn't honour it. Pinned to a one-liner inside # fmt: off/on.
- ProtocolCertificationSuite._safe_instantiate took a parameter named `cls`, which pyright reserves for classmethods. Renamed to target_cls.
Also added .venv* to .gitignore so locally-created Python alt envs don't show up in status.
Newer crewai inspects each handler's parameter count and passes a third `state` positional when there are 3 params. We were using `def _handler(source, event, _m=method)` — i.e. a default-arg closure to capture the bound method — which crewai then clobbered by passing state as the third arg, leaving _m = state (not callable) and the handler raising 'NoneType' object is not callable. Switched to a factory closure (`_make_handler(target)`) so the visible signature is exactly (source, event) — crewai takes the 2-arg path and the bound method is captured in the closure properly. Surfaced by the new tests/e2e CrewAI delegation tests under Python 3.11 with crewai 1.14, where the real event bus dispatches handlers through a ThreadPoolExecutor. The existing unit tests didn't catch it because they invoke the adapter's _on_* methods directly rather than going through the event bus.
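The failure mode reproduces without crewai. In the sketch below, dispatch is a stand-in for an event bus that counts handler parameters (an assumption about the shape of crewai's dispatch, not its actual code), and the two handler styles show why the factory closure is safe.

```python
import inspect


def dispatch(handler, source, event, state):
    """Stand-in bus: pass `state` positionally to handlers with 3+ params."""
    if len(inspect.signature(handler).parameters) >= 3:
        return handler(source, event, state)
    return handler(source, event)


def on_event(source, event):
    return f"{source}:{event}"


# Fragile: the default argument counts as a third parameter, so the bus
# overwrites _m with `state` and the call fails.
def fragile_handler(source, event, _m=on_event):
    return _m(source, event)


# Robust: the target lives in the closure, so the visible signature is
# exactly (source, event) and the bus takes the 2-arg path.
def make_handler(target):
    def _handler(source, event):
        return target(source, event)

    return _handler
```

Dispatching fragile_handler with state=None raises TypeError ('NoneType' object is not callable), while make_handler(on_event) returns the expected value.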
stepdi left a comment
Summary
Substantial implementation across instrumentation, attestation, replay, and adapter integrations (~43k LOC, 30 commits). Cross-cutting code quality is unusually clean — zero hallucinated imports across 359 from layerlens.* references (verified — every target module exists in the repo), zero TODO/FIXME/HACK/XXX markers in src/, 1.19× test-to-source LOC ratio (33,433 / 28,174), all framework/provider/protocol extras properly tiered in pyproject.toml. The _base_provider.py / _base_framework.py hierarchies are well-designed and reduce per-adapter duplication.
Before merge I'd want six items addressed below — three blockers, three scope/naming decisions.
Blockers
1. CI red on Python 3.10 / 3.11 / 3.12 (3.9 passes)
Four distinct failures, all targeted:
- `test_openai_agents.py`: 17+ failures with `TypeError: SpanImpl.__init__() got an unexpected keyword argument 'tracing_api_key'`. Upstream `openai-agents` SDK API change. Pin a compatible version or update the test fixture.
- `test_e2e_crewai_delegation.py::test_chain_of_delegations_keeps_sequence`: `assert len(handoffs) == 3 → 2`. Off-by-one in chain-delegation detection.
- SemanticKernel plugin detection: `assert "MathPlugin" in plugin_names → 'MathPlugin' in set()`. Likely version-pin sensitive.
- Test isolation: `assert _current_collector.get() is None → <TraceCollector object ...>`. `ContextVar` not cleaned between tests.
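For the test-isolation failure, one possible guard is a helper that clears the variable on both sides of each test. This is a sketch under the assumption that tests run in one context (plain synchronous pytest); the helper name is hypothetical and the ContextVar is passed in rather than imported.

```python
import contextlib
import contextvars


@contextlib.contextmanager
def clean_collector(var):
    """Clear the collector ContextVar before and after the wrapped block."""
    var.set(None)  # drop anything leaked by an earlier test
    try:
        yield
    finally:
        var.set(None)  # and anything this test leaves behind
```

In conftest.py this could be wrapped in an autouse fixture that enters clean_collector(_current_collector) around every test.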
2. ms_agent_framework.py imports semantic_kernel, not agent-framework
- ms_agent_framework.py:33 does import semantic_kernel.
- :53 sets package = "semantic-kernel".
- Class docstring (:41): "Microsoft Agent Framework (semantic-kernel agents)."
- _registry.py:35-38 acknowledges: "MS Agent Framework ships as part of semantic-kernel; we share the detection key. Both adapters can coexist — they instrument different surface areas (filters vs AgentChat wrapping)."
Two different PyPI packages exist:
- semantic-kernel: already instrumented separately by semantic_kernel.py.
- agent-framework: v1.4.0 on PyPI ("Microsoft Agent Framework for building AI Agents with Python"). Not currently instrumented.
Two options: rename module/class to semantic_kernel_agents to honestly describe what it instruments, or replace semantic_kernel imports with agent_framework.
3. Bedrock streaming + tool-call extraction incomplete
bedrock.py declares streaming support by wrapping methods (:63-70) but emits a placeholder event extra={"streaming": True, "method": method} (:187) and never aggregates chunks. The module docstring (lines 7-9) is explicit about the StreamingBody single-read constraint, but the practical effect: customers running Bedrock streaming get traces with no content, no usage, no cost.
Separately, _extract_invoke_output (:241-263) and _extract_converse_output (:284-292) both filter to text-only blocks ("text" in block), dropping tool_use blocks entirely. Direct Anthropic adapter handles tool_use (anthropic.py:48, 51, 112, 116, 285, 312); the parsing could be lifted for Bedrock-Anthropic invoke and Converse toolUse content blocks.
Also: bedrock.py:41 inherits from BaseAdapter (not MonkeyPatchProvider like other providers), re-implements emission, and reaches a private helper via from ._emit_helpers import _emit_cost # type: ignore[attr-defined] (bedrock.py:26; helper at _emit_helpers.py:164). Either promote _emit_cost to public or refactor to inherit from the base.
Concerns
4. A2UI and UCP shipped without upstream protocol references
a2ui.py (110 LOC) and ucp.py (163 LOC) sit alongside A2A, MCP, AG-UI, and AP2 protocol adapters but differ:
- No pyproject.toml extras for either (vs a2a-sdk, mcp, etc.).
- Zero upstream imports in either file.
- No spec URL in module docstrings.
- a2ui: no PyPI package by that name.
- ucp: a ucp package exists on PyPI but it's an unrelated SMS protocol wrapper ("Python EMI UCP protocol wrapper"). A third-party universal-commerce-protocol v0.0.1 also exists from upsonic/universal-commerce-protocol on GitHub, but LayerLens's ucp.py doesn't import or interop with it.
a2ui.py defines commerce.ui.* event vocabulary with method names on_surface_created, on_user_action (lines 36-37). ucp.py defines discover_suppliers, browse_catalog, start_checkout, complete_checkout, issue_refund (lines 37-41).
Both pass ProtocolCertificationSuite.certify() with the same stamp as A2A and MCP, because the suite is a structural conformance checker (verifies issubclass, PROTOCOL_VERSION non-empty, methods callable) — not a real protocol handshake.
If A2UI/UCP are internal LayerLens proposals: prefix with layerlens_ and document them as internal observability schemas. If they're stubs for protocols that don't exist yet: drop until there's an upstream spec.
5. Haystack adapter wired into auto-detection
haystack.py _on_connect mutates _hs_tracing.tracer.actual_tracer = self._tracer globally; _registry.py:88-90 wires HaystackAdapter into the auto-detection list. If Haystack was meant to be a separate product-surface decision, this commits us to it implicitly. Worth confirming with product before merge.
6. Protocol version strings drift from upstream
| Adapter | PROTOCOL_VERSION in PR | Upstream spec | Upstream Python SDK on PyPI |
|---|---|---|---|
| A2A | "0.3.0" (a2a/adapter.py:38) | v1.0.0 (released 2026-03-12; v0.3.0 was current 2025-07-30) | a2a-sdk v1.0.3 |
| MCP | "1.0.0" (mcp/adapter.py:43) | date-format: LATEST_PROTOCOL_VERSION = "2025-11-25" in upstream python-sdk; SUPPORTED list: "2024-11-05", "2025-03-26", "2025-06-18", "2025-11-25" | mcp v1.27.1 |
| AG-UI | "0.1.0" (agui/adapter.py:27) | (didn't verify spec version directly) | ag-ui-protocol v0.1.18 |
| AP2 | "0.1.0" (ap2.py:41) | v0.2.0 (released 2026-04-28); full name is Agent Payments Protocol, not "Agent Protocol 2" | ap2 v0.1.1 |
Two issues:
- MCP "1.0.0" is neither a valid protocol-spec version (which is date-formatted) nor the SDK version (semver 1.27.x). Either pull from mcp.types.LATEST_PROTOCOL_VERSION at runtime, or set to a sentinel until negotiation is implemented.
- A2A "0.3.0", AP2 "0.1.0" lag current upstream. These show up in trace events and certification output.
Also: AP2 stands for Agent Payments Protocol, not "Agent Protocol 2." If any docstrings/README/marketing reference the latter, they need updating.
Minor / nice-to-have
7. LiteLLM adapter doesn't wire new base hooks
MonkeyPatchProvider defines extract_tool_calls (_base_provider.py:38) and aggregate_stream (:44). litellm.py delegates extract_output and extract_meta to OpenAIProvider but doesn't delegate the two new hooks — so tool calls are dropped and streaming aggregation returns the no-op default. Two-line fix:
    extract_tool_calls = staticmethod(OpenAIProvider.extract_tool_calls)
    aggregate_stream = staticmethod(OpenAIProvider.aggregate_stream)

8. Ollama cost_per_second parameter is unused
OllamaProvider.__init__ accepts cost_per_second: float | None = None (ollama.py:34); stored on the instance at :36. The module docstring (line 4) advertises "an optional cost_per_second lets callers account for compute time" — but the parameter is never referenced in cost computation anywhere in the file. Either remove or apply it via duration when calculating cost.
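If the parameter is kept, applying it via duration could look like the sketch below. It assumes the response is a dict carrying Ollama's eval_duration and prompt_eval_duration fields in nanoseconds; the function name is hypothetical.

```python
def infra_cost_usd(response, cost_per_second):
    """Estimate compute cost from the durations Ollama reports (nanoseconds)."""
    if cost_per_second is None:
        # No rate configured: report "unknown" rather than a fake 0.0.
        return None
    nanos = (response.get("eval_duration") or 0) + (
        response.get("prompt_eval_duration") or 0
    )
    return (nanos / 1e9) * cost_per_second
```

Returning None when no rate is set keeps this in line with how unpriced models are already handled elsewhere in the PR.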
9. CrewAI memory integration
Delegation/handoff coverage is in place via crewai_event_bus subscriptions (crewai.py lines 156-178) and agent.handoff emission at :523, 550. There's no analogue to _langchain_memory.py for CrewAI's memory store — read/write hooks aren't proxied. Not a blocker, but worth a follow-up if memory-state tracing is a goal for CrewAI parity.
10. Five silent-pass sites in framework adapters
Bare except Exception: pass at:
- crewai.py:643
- agno.py:187
- llamaindex.py:610
- pydantic_ai.py:443
- mcp/tool_wrapper.py:49 (this one has comment # pragma: no cover - defensive)
These swallow exceptions silently. The rest of the codebase consistently either logs at debug or attaches an error to the emitted event. A short justifying comment (or logger.debug(...)) would prevent these from being read as defects in future review.
11. Attestation envelope mutability + concurrency
- AttestationEnvelope (_envelope.py:16-17) is @dataclass without frozen=True. The envelopes property (_chain.py:28) returns a shallow copy via [copy(e) for e in self._chain], which is a real defence, but frozen=True would make immutability load-bearing rather than convention-bound.
- _chain.py contains no threading.Lock or asyncio.Lock; add_event (:39) is unprotected. Single-writer usage is fine, but a lock guard would make multi-threaded use safer.
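Both suggestions fit in a few lines. The sketch below is not the code in _envelope.py or _chain.py; the class names, fields, and the SHA-256-over-JSON hashing are assumptions that only illustrate a frozen envelope plus a lock-guarded add_event.

```python
import hashlib
import json
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Immutable record of one link in the chain."""

    hash: str
    previous_hash: Optional[str]


class Chain:
    def __init__(self):
        self._lock = threading.Lock()
        self._chain = []
        self._last_hash = None

    def add_event(self, data):
        # The lock makes read-last-hash / append / update-last-hash atomic,
        # so concurrent writers cannot fork the chain.
        with self._lock:
            payload = {**data, "_previous_hash": self._last_hash}
            blob = json.dumps(payload, sort_keys=True).encode("utf-8")
            digest = hashlib.sha256(blob).hexdigest()
            envelope = Envelope(hash=digest, previous_hash=self._last_hash)
            self._chain.append(envelope)
            self._last_hash = digest
            return envelope
```

With frozen=True the defensive copies in the envelopes property become optional, since callers cannot mutate an envelope they are handed.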
12. Evaluation runner swallows scorer exceptions
runner.py:103-105:
```python
except Exception as exc:
log.debug("scorer %s raised on item %s: %s", name, item.id, exc)
item_scores[name] = 0.0
```
A broken scorer becomes indistinguishable from a legitimately failing item. Suggest attaching the exception to EvaluationRunItem.error and surfacing it in the aggregate.
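One way to keep the two cases apart, sketched with plain dicts rather than the real EvaluationRunItem (the function name and result shape are assumptions):

```python
def score_item(item_id, item, scorers):
    """Run every scorer, recording failures separately from scores."""
    scores, errors = {}, {}
    for name, scorer in scorers.items():
        try:
            scores[name] = float(scorer(item))
        except Exception as exc:
            # None marks "scorer broke", which a real 0.0 score never does.
            scores[name] = None
            errors[name] = f"{type(exc).__name__}: {exc}"
    return {"id": item_id, "scores": scores, "errors": errors}
```

The aggregate can then skip None when averaging and report the error count alongside the mean.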
13. Replay store / dataset store default to in-memory
InMemoryReplayStore is the default in ReplayController.__init__ (controller.py:39: self._store: ReplayStore = store or InMemoryReplayStore()). InMemoryDatasetStore is the only implementation shipped. The interfaces are Protocol-based so swap-in is one line — but defaults lose state on restart. Either a docstring warning or a JSONFileStore reference impl would help.
14. Empty PR description
43k LOC merge across 336 files with no PR body. A short changelog grouped by subsystem (instrument / attestation / replay / synthetic / evaluation_runs / cli / docs) would help anyone trying to bisect later.
Strengths
- No fake data. StochasticProvider (synthetic/providers.py:89) tags id=f"synth_{uuid.uuid4().hex[:16]}", created_at="synthetic", data["synthetic"]=True (:133-140).
- Honest fallbacks. pricing.calculate_cost returns None for unpriced models; _emit_cost propagates None (callers see explicit None, not fake 0.0). cli/commands/evaluations.py:94 raises click.UsageError("remote dataset lookup is not yet implemented — pass --dataset-file") instead of returning empty data.
- Real cryptographic chain. _chain.py:43 includes _previous_hash in the hashed payload (payload = {**data, "_previous_hash": self._last_hash}), not just adjacent, so tampering with previous_hash breaks the hash. _signing.py:19 uses hmac_mod.compare_digest (timing-safe).
- Defensive extractors in provider adapters: getattr(..., default) plus try/except around attribute walks. Won't crash on unexpected SDK shapes.
- No AdapterCapability enum. Capability-without-implementation can't happen by construction.
- Test density. 33,433 test LOC vs 28,174 source LOC. 93 test_*.py files (+ 20 conftest/init = 113 .py total in tests/). 1741 test functions, 3694 asserts. Zero empty-test files.
- Dependency tiering. Runtime deps are just httpx + pydantic. All frameworks/providers are optional extras with Python-version gating for 3.10-only packages.
- Zero broken imports across 359 from layerlens.* references (verified by enumerating all module paths under src/layerlens/ and checking every import target resolves).
Verdict
Three blockers (CI, MS Agent Framework naming, Bedrock streaming/tool extraction) + three product-decision items (A2UI/UCP, Haystack, protocol versions). Everything else can land as follow-ups.
…-2879/2881/2883)
Per Marc's TEL-026 / TEL-028 / TEL-029 acceptance criteria, map provider-specific
fields to vendor-namespaced OTel GenAI attributes:
gen_ai.openai.response.system_fingerprint (TEL-026)
gen_ai.openai.response.service_tier (TEL-026)
gen_ai.openai.request.seed (TEL-026)
gen_ai.anthropic.cache_read_input_tokens (TEL-028)
gen_ai.anthropic.cache_creation_input_tokens (TEL-028)
gen_ai.response.finish_reasons (now also from Anthropic stop_reason)
Wire OTel attribute mapping into the bespoke Bedrock emit path that bypasses
the standard MonkeyPatchProvider flow. Add response_id extraction across the
remaining adapters per TEL-029:
- Bedrock: ResponseMetadata.RequestId
- Vertex: response.response_id / response.id
- Per-family stop_reason extraction for Bedrock invoke_model (anthropic,
cohere, amazon, meta, mistral)
22 new tests covering vendor-namespacing edge cases and end-to-end
finish_reasons + response.id coverage across all 7 adapters (OpenAI,
Anthropic, Azure OpenAI, Vertex, Bedrock, Ollama, LiteLLM).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-3327/3330)
Per Marc's ADP-071 Claude Code Prompt, wrap pricing in a class with the contract he spelled out:
PricingTable.from_default() / .from_dict() / .from_json_file()
PricingTable.calculate_cost(model, input_tokens, output_tokens) -> CostRecord
PricingTable.has_model() / .models() / .as_dict()
Fuzzy resolution: ``gpt-4o-2024-08-06`` -> ``gpt-4o`` (date-suffix strip), ``claude-3-5-sonnet-20990101`` -> ``claude-3-5-sonnet``. Longest-prefix fallback disambiguates ``gpt-4o`` from ``gpt-4`` for unrecognised dated variants. Added base-name entries for the Claude family so fuzzy-stripped lookups resolve.
LAYERLENS_PRICING_TABLE env var loads JSON overrides at runtime, satisfying LAY-3327's "pricing updateable without code changes" AC. Override precedence: env > caller-supplied table > bundled PRICING. Bad JSON / unreadable files log a warning and fall back to defaults rather than crashing the request path.
CostRecord dataclass carries cost_usd + model + input/output/cached token counts so callers can pipe it directly into the cost.record event payload.
36 new pricing tests covering defaults, fuzzy matching, caller overrides, cached-token discounts (Anthropic 90% / Google 75% / others 50%), env loading, malformed-JSON resilience, and graceful unknown-model handling.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
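The fuzzy resolution order described in this commit (exact match, then date-suffix strip, then longest known prefix) can be sketched as follows. The function name and regex are assumptions, not the PricingTable internals.

```python
import re

# Matches a trailing -YYYY-MM-DD or -YYYYMMDD suffix.
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def resolve_model(name, known_models):
    """Map a possibly dated model name onto a key in known_models, or None."""
    if name in known_models:
        return name
    stripped = _DATE_SUFFIX.sub("", name)
    if stripped in known_models:
        return stripped
    # Longest-prefix fallback: prefer "gpt-4o" over "gpt-4" when both match.
    candidates = [known for known in known_models if name.startswith(known)]
    return max(candidates, key=len) if candidates else None
```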
… (LAY-3326/3329/3331/3332)
Per Marc's ADP-071 Claude Code Prompt, lift streaming logic into
src/layerlens/instrument/adapters/providers/_streaming.py:
- StreamingResponseWrapper tracks first-chunk arrival + chunk list
- stream_chunks_sync / stream_chunks_async preserve the SDK iterator
contract (downstream consumers see identical chunks) while feeding the
wrapper
- On normal completion: emit consolidated model.invoke with ttft_ms and
streaming_duration_ms in event metadata
- On mid-stream exception: emit agent.error with partial_meta extracted
from accumulated chunks plus partial_chunks count, per LAY-3329/3332 DoD
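The wrapper behaviour listed above can be sketched as a generator. This is not the code in _streaming.py; the function name, callback signatures, and injectable clock are assumptions made to keep the example self-contained and testable.

```python
import time


def traced_stream(iterator, on_complete, on_error, clock=time.monotonic):
    """Yield chunks unchanged while recording TTFT and total duration."""
    started = clock()
    first_at = None
    chunks = []
    try:
        for chunk in iterator:
            if first_at is None:
                first_at = clock()  # first-chunk arrival
            chunks.append(chunk)
            yield chunk  # downstream consumers see identical chunks
    except Exception as exc:
        # Mid-stream failure: report what was accumulated, then re-raise.
        on_error(exc, chunks)
        raise
    ttft_ms = None if first_at is None else (first_at - started) * 1000.0
    duration_ms = (clock() - started) * 1000.0
    on_complete(chunks, ttft_ms, duration_ms)
```

One caveat of this shape: if the consumer abandons the stream early, neither callback fires, so a production version would likely need a finally clause for that case.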
_base_provider.py now delegates _wrap_stream_iterator and
_wrap_async_stream_iterator to the new module. Same behavioural contract,
one implementation shared by every monkey-patched provider.
emit_llm_events grew ttft_ms / streaming_duration_ms kwargs; emit_llm_error
grew partial_meta / partial_chunks + error_type for richer agent.error
payloads.
OpenAI tool-call JSON parsing now logs a WARNING when arguments are
malformed (LAY-3331 DoD) with the offending snippet truncated for log
hygiene, rather than silently returning the raw string.
27 streaming tests including end-to-end TTFT (sync + async),
iterator-contract preservation, partial_meta on mid-stream error,
malformed-JSON warning, "no tool_calls = no events emitted", and parallel
tool-call fragment assembly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…op (LAY-3328/3332/3333/3334)
Tighten _CAPTURE_PARAMS so raw ``system``, ``messages``, ``tools``,
``tool_choice``, ``metadata``, and ``thinking`` payloads NEVER reach the
event parameters dict. derive_params builds privacy-safe summaries instead,
per LAY-3334 ACs:
has_system: bool, system_length: int (presence + length, NOT content)
messages_count, message_roles (count + role distribution, no content)
tools_count, tool_names (no schemas / descriptions)
tool_choice_type, tool_choice_name (type + name only)
metadata_user_id (only field captured from metadata,
for Anthropic's cost-attribution use)
thinking_budget_tokens, thinking_type (broken out from the thinking config)
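A sketch of how such summaries might be derived from request kwargs. The field names follow the list above; the function body, the dict shape for message_roles, and the handling of list-form system prompts are assumptions.

```python
from collections import Counter


def derive_params(kwargs):
    """Summarise sensitive request fields without copying their content."""
    system = kwargs.get("system")
    if isinstance(system, list):
        # Assumed block form: [{"type": "text", "text": "..."}]
        system_length = sum(
            len(b.get("text", "")) for b in system if isinstance(b, dict)
        )
    else:
        system_length = len(system) if isinstance(system, str) else 0
    messages = kwargs.get("messages") or []
    tools = kwargs.get("tools") or []
    choice = kwargs.get("tool_choice") or {}
    metadata = kwargs.get("metadata") or {}
    thinking = kwargs.get("thinking") or {}
    roles = Counter(m.get("role") for m in messages if isinstance(m, dict))
    return {
        "has_system": bool(system),
        "system_length": system_length,
        "messages_count": len(messages),
        "message_roles": dict(roles),
        "tools_count": len(tools),
        "tool_names": [t["name"] for t in tools if isinstance(t, dict) and "name" in t],
        "tool_choice_type": choice.get("type"),
        "tool_choice_name": choice.get("name"),
        "metadata_user_id": metadata.get("user_id"),  # only field kept
        "thinking_type": thinking.get("type"),
        "thinking_budget_tokens": thinking.get("budget_tokens"),
    }
```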
extract_meta now surfaces content_block_counts (text / tool_use / thinking),
tool_use_names, and has_thinking on every response per LAY-3334.
Streaming aggregator:
- explicit message_stop handler (Marc's AC literally names it)
- TTFT anchored on first content_block_delta (not message_start, which
fires before any content is generated)
- defensive thinking_tokens read from message_start.usage and
message_delta.usage so we pick up any future SDK signal
- partial_meta emission on mid-stream exception including any cache
tokens already received
11 new tests covering privacy boundaries (system content never leaks,
metadata sibling fields not captured), thinking budget capture, baseline
non-thinking responses unchanged, content-block counts incl. tool_use
names, mid-stream errors, and message_stop receipt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per ADP-070, ``from layerlens.adapters.providers import AzureOpenAIAdapter``
(and the other 6) must succeed and expose ``connect_client(client)`` +
``health_check() -> AdapterHealth``. The canonical implementation lives at
``layerlens.instrument.adapters.providers.*Provider`` with ``.connect()``;
this commit adds a thin shim at the legacy path so the AC bullets are
verifiable without forking the code.
Wrappers cover OpenAI, Anthropic, Azure OpenAI, Vertex, Bedrock, Ollama,
LiteLLM. ``health_check`` returns a self-contained AdapterHealth dataclass
matching the legacy pydantic model's shape; no dependency on any other
adapter module so the shim works on a clean checkout.
12 tests verify:
- Each adapter is importable from the legacy path
- AdapterHealth + AdapterStatus have the expected shape
- Health flips from DISCONNECTED to HEALTHY after connect_client
- connect_client wires up real tracing end-to-end for OpenAI, Anthropic,
Bedrock (boto3-shape mock incl. ResponseMetadata.RequestId), Vertex
(mocked generate_content), and Ollama (mocked chat) — each producing
model.invoke (+ cost.record where the model is priced)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
….3450)
- pyproject: add langgraph, crewai, autogen, agentforce extras + all-frameworks omnibus
- requires_pydantic="2" markers on LangGraph + CrewAI adapters
- PEP 562 lazy public-API exports for the 6 framework adapters in frameworks/__init__.py
- Five runnable sample scripts under samples/instrument/ that exit 0 with install hints when the SDK is absent
- Five reference docs under docs/adapters/frameworks/ (Agentforce includes Connected App / OAuth setup section)
Lint + 80 framework tests + 488 wider instrument tests all green.
….3450)
Asserts importing the frameworks package never eagerly pulls langgraph, langchain-core, crewai, autogen, autogen-core, autogen-agentchat, or semantic_kernel. Also covers AttributeError for unknown names, __dir__ advertising all 6 public adapters, and resolving AgentforceAdapter (the only adapter whose dep ships with the default install) without leaking the others.
mypy --strict pass over the 5 M2 adapters + the lazy-export __init__ — zero issues.
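For readers unfamiliar with PEP 562, the lazy-export behaviour these tests pin down can be sketched as below. In a real package the two hooks would sit at the top of frameworks/__init__.py; here they are attached to a throwaway module so the example runs standalone, and the export table uses a stdlib class as a stand-in for an adapter.

```python
import importlib
import types


def make_lazy_module(name, exports):
    """Build a module whose attributes are imported on first access.

    exports maps a public name to the module path that defines it.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr):
        try:
            target = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target), attr)
        setattr(mod, attr, value)  # cache so the hook runs once per name
        return value

    def __dir__():
        return sorted(exports)

    # PEP 562: these hooks are looked up in the module's own namespace.
    mod.__getattr__ = __getattr__
    mod.__dir__ = __dir__
    return mod
```

Nothing in exports is imported until its name is first touched, which is the property the regression tests check against sys.modules.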
Adapters
- google_vertex: capture GenerativeModel.model_name on connect (strip `models/` prefix) and inject into response meta via overridden _extractors so cost-record events resolve against PRICING.
- ollama: bind OLLAMA_HOST endpoint into meta on every invoke; when cost_per_second is set, compute infra_cost_usd from eval_duration + prompt_eval_duration and include in the model.invoke payload.
Consumable surface
- pyproject: new `providers-vertex` and `providers-ollama` extras (canonical M3 names per AC); existing `google-vertex` and `ollama` kept as aliases.
- providers/__init__.py: PEP 562 lazy public API for OpenAI, Anthropic, AzureOpenAI, Bedrock, GoogleVertex, Ollama, LiteLLM. Default install stays lean.
- samples: google_vertex/example.py + ollama/example.py, both exit 0 with install / setup hints when SDK or daemon is absent.
- docs: google_vertex.md (SA-JSON + ADC sections per AC) and ollama.md (`ollama serve` setup + cost_per_second explanation per AC).
Marc-prep
- 19 new adapter unit tests (Vertex: SimpleNamespace SDK mocks; Ollama: dict-shape fixtures matching the ollama package).
- 4 new lazy-import regression tests for providers/__init__.py.
- mypy --strict clean over the 3 edited source files.
- ruff check + ruff format clean.
No description provided.