feat | instrumentation engine, attestation, and adapter integrations by garrettallen14 · Pull Request #88 · LayerLens/stratix-python

garrettallen14 · 2026-04-13T21:08:04Z

No description provided.

Brings in the SDK samples overhaul (70+ samples), the auto-release workflow, CHANGELOG, the custom-model update/delete API, and the restored test files we'd lost on this branch (test_samples.py, test_samples_e2e.py, test_mcp_server.py, etc.). pyproject.toml conflicted in the tests/** per-file-ignores section -- I kept main's broader set (T201, T203, ARG, B007) so the restored test_samples_e2e.py keeps linting, and dropped my orphan examples/instrument_{openai,langchain}.py files in favour of main's renamed samples/integrations/{openai,langchain}_instrumented.py. tests/conftest.py auto-merged main's "live" pytest marker.

Mirrors the _FRAMEWORK_PACKAGES pattern from ateam without dragging in the singleton machinery. discover_installed() uses importlib.util find_spec so detection is cheap and has no import side effects; auto(client) instantiates and connects whichever frameworks are importable in the current env. Providers stay explicit since they need the user's SDK client, which we don't have at auto() time. Same for agentforce and langfuse, which need credentials at connect. Both helpers are re-exported from layerlens.instrument. Drift-guard tests pin the three lookup tables to stay consistent.
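A minimal sketch of the find_spec-based detection described above. The table contents and the function signature here are hypothetical stand-ins, not the actual layerlens tables; the point is only that find_spec locates a top-level module without importing it.

```python
from importlib.util import find_spec

# Hypothetical lookup table: adapter key -> top-level importable module name.
_FRAMEWORK_MODULES = {
    "langgraph": "langgraph",
    "crewai": "crewai",
    "semantic_kernel": "semantic_kernel",
}


def discover_installed(table=_FRAMEWORK_MODULES):
    """Return adapter keys whose module can be located, without importing it."""
    found = []
    for key, module_name in table.items():
        try:
            spec = find_spec(module_name)
        except (ImportError, ValueError):
            # find_spec can raise for broken parent packages or odd module state.
            spec = None
        if spec is not None:
            found.append(key)
    return found
```

Passing the table as an argument keeps the helper easy to pin in a drift-guard test.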

After every node exit we hash the output state and emit the digest as agent.state.change, so the dashboard can diff state across nodes without needing the raw payloads. Uses the same compute_hash as the attestation chain so the format matches.

Constructor knobs:
- emit_state_hash=False to turn it off entirely
- state_include_keys / state_exclude_keys to scope the hash to a subset of the state dict

Non-serialisable state falls back to a repr-based hash so we still emit something stable. agent.state.change is in _ALWAYS_ENABLED, so no layer gating needed.
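Roughly how the scoped hash with a repr fallback could look. This is a sketch only: state_digest is a hypothetical name, and plain SHA-256 over canonical JSON is an assumption standing in for compute_hash, whose real output format is not shown here.

```python
import hashlib
import json


def state_digest(state, include_keys=None, exclude_keys=None):
    """Hash a state dict, optionally scoped to a subset of its keys."""
    scoped = {
        k: v
        for k, v in state.items()
        if (include_keys is None or k in include_keys)
        and (exclude_keys is None or k not in exclude_keys)
    }
    try:
        # Canonical JSON so key order does not change the digest.
        canonical = json.dumps(scoped, sort_keys=True, separators=(",", ":"))
    except (TypeError, ValueError):
        # Non-serialisable values: fall back to repr of the sorted items.
        # Stability then depends on each value having a deterministic __repr__.
        canonical = repr(sorted(scoped.items(), key=lambda kv: str(kv[0])))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```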

Whenever the active langgraph_node transitions between distinct named agents we emit agent.handoff. That puts LangGraph in line with the OpenAI Agents and Google ADK adapters, which already detect handoffs natively. HandoffDetector is intentionally framework-agnostic -- I'll reuse it for the CrewAI delegation work next. Same-node revisits and the first node observed don't emit, so the noise stays low. Context gets scrubbed through the same allow-list ateam uses (task, messages, objective, etc.) with long strings truncated and long lists collapsed to placeholders, then hashed so dashboards can correlate handoffs without seeing the raw state.
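The emit rules (no event for the first node, none for same-node revisits) fit in a few lines. A sketch under the assumption that the detector only needs the agent name; the method name and payload keys are illustrative, not the real HandoffDetector interface.

```python
class HandoffDetector:
    """Report a handoff only when the active agent changes to a different name."""

    def __init__(self):
        self._current = None
        self._sequence = 0

    def observe(self, agent_name):
        """Return a handoff dict on a transition, else None."""
        previous, self._current = self._current, agent_name
        if previous is None or previous == agent_name:
            # First node observed, or a same-node revisit: nothing to emit.
            return None
        self._sequence += 1
        return {"from_agent": previous, "to_agent": agent_name, "sequence": self._sequence}
```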

Two related pieces. inject_headers / extract_headers let user code stitch our traces into a wider distributed-tracing system. If OpenTelemetry is installed we delegate to its propagator; otherwise we build traceparent by hand from the active TraceCollector and current span. Our 16-hex trace ids get zero-padded to 32 hex on the wire and shortened back on extract. gen_ai_attributes() returns a dict of OTel GenAI semconv attributes (gen_ai.system, gen_ai.operation.name, request params, response model / id / finish_reasons, usage tokens). The provider emit helper now embeds this under otel_gen_ai on every model.invoke, so OTel-aware tooling can read the standard names without having to re-map our internal field names.
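For the hand-built path, a sketch of the padding and shortening. It assumes the W3C traceparent layout (version, 32-hex trace id, 16-hex span id, 2-hex flags) and assumes the zero-padding is on the left; the function names are hypothetical.

```python
def to_traceparent(trace_id16, span_id16, sampled=True):
    """Build a W3C traceparent header value from 16-hex ids."""
    trace_id32 = trace_id16.rjust(32, "0")  # assumption: left-pad with zeros
    flags = "01" if sampled else "00"
    return f"00-{trace_id32}-{span_id16}-{flags}"


def from_traceparent(header):
    """Parse a traceparent value and shorten the trace id back to 16 hex."""
    version, trace_id32, span_id16, flags = header.strip().split("-")
    if len(trace_id32) != 32 or len(span_id16) != 16:
        raise ValueError(f"malformed traceparent: {header!r}")
    # Keeping the low 16 hex digits is lossy for foreign ids with a non-zero upper half.
    sampled = int(flags, 16) & 1 == 1
    return trace_id32[-16:], span_id16, sampled
```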

Hierarchical crews delegate work through the built-in "Delegate work to coworker" and "Ask question to coworker" tools, but older crewai versions don't fire AgentDelegationStartedEvent for them. That left the handoff invisible in our traces. The tool-call path now matches those tool names case-insensitively and synthesises agent.handoff with from_agent (current agent role), to_agent (coworker arg), tool_name, a sequence number, and a sha256 hash over the scrubbed task+context. The typed-event handler bumps the same sequence so newer crewai versions emit identical payloads. tool_args is parsed robustly -- crewai sometimes passes it as a dict, sometimes as a JSON string. Context scrubbing reuses _handoff.scrub_context for parity with the LangGraph handoff format.
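The two robustness points (case-insensitive tool-name match, dict-or-JSON-string tool_args) can be sketched as below. Helper names are hypothetical; only the two tool names come from the description above.

```python
import json

_DELEGATION_TOOLS = {"delegate work to coworker", "ask question to coworker"}


def is_delegation_tool(tool_name):
    """Match the built-in delegation tool names regardless of case."""
    return tool_name.strip().lower() in _DELEGATION_TOOLS


def parse_tool_args(tool_args):
    """Normalise tool_args to a dict whether it arrives as a dict or a JSON string."""
    if isinstance(tool_args, dict):
        return tool_args
    if isinstance(tool_args, str):
        try:
            parsed = json.loads(tool_args)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}
```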

Wraps semantic-kernel's AgentChat / AgentGroupChat invoke (an async generator that yields ChatMessageContent) and processes each yielded message for:
- tool calls / results from FunctionCall items
- model.invoke + cost.record derived from message.metadata
- agent.handoff on agent_name turn transitions, via the shared HandoffDetector

A one-shot environment.config event fires per chat instance on its first invocation, capturing the chat type, agents, plugins, and selection / termination strategy class names.

Provider detection covers the usual suspects (gpt/o1/o3 -> openai, claude -> anthropic, gemini -> google, etc.) and falls back to azure_openai, since that's what MS Agent Framework fronts most of the time.

Registered in the auto-detection tables, so layerlens.instrument.auto() picks it up when semantic-kernel is installed. Coexists fine with the existing SemanticKernelAdapter -- they instrument different surfaces (filters vs AgentChat wrapping).

EmbeddingAdapter wraps OpenAI / Cohere / sentence-transformers and emits embedding.create with provider, model, batch size, vector dimensions, token usage, and latency. Pass-through when no collector is active so it adds no overhead outside a trace. VectorStoreAdapter does the same for Pinecone, Chroma, and Weaviate (near_vector / near_text). retrieval.query events carry the query shape, result count, and a min/max/mean over scores or distances. BenchmarkImporter lives under layerlens.benchmarks rather than the adapters tree -- it's a data-conversion utility, not an instrumentation tracer (ateam's own docstring flagged the naming inconsistency in their version). Reads HuggingFace Datasets, HELM result JSON, CSV, JSON arrays, and JSONL. Optional schema_mapping renames source fields to layerlens canonical names.

TracedMemory is a transparent proxy around any LangChain memory object. save_context and clear are intercepted; before the call we hash the memory's loaded variables, after the call we hash again, and if the hash changes we emit agent.state.change. Everything else passes through. For workflows where save_context happens outside our control (e.g. inside a third-party agent), MemoryMutationTracker is a context manager that frames a logical operation and emits one event per logical operation rather than one per save_context call. Hashing uses the same compute_hash as the attestation chain, so before/after digests are comparable across the LangChain and LangGraph adapters. Non-serialisable memory contents fall back to a repr-based hash so we still get a stable identifier. Exported as wrap_memory / TracedMemory / MemoryMutationTracker from layerlens.instrument.adapters.frameworks.langchain.
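A sketch of the proxy pattern, assuming the wrapped object exposes the usual LangChain memory methods (load_memory_variables, save_context, clear). The emit callback and the SHA-256 helper are hypothetical stand-ins for the collector and compute_hash.

```python
import hashlib
import json


def _digest(value):
    try:
        text = json.dumps(value, sort_keys=True)
    except (TypeError, ValueError):
        text = repr(value)  # fallback for non-serialisable contents
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TracedMemory:
    """Proxy that reports a state change when save_context or clear alters memory."""

    def __init__(self, memory, emit):
        self._memory = memory
        self._emit = emit  # hypothetical callback: emit(event_name, payload)

    def _tracked(self, method_name, *args, **kwargs):
        before = _digest(self._memory.load_memory_variables({}))
        result = getattr(self._memory, method_name)(*args, **kwargs)
        after = _digest(self._memory.load_memory_variables({}))
        if before != after:
            self._emit("agent.state.change", {"before": before, "after": after})
        return result

    def save_context(self, inputs, outputs):
        return self._tracked("save_context", inputs, outputs)

    def clear(self):
        return self._tracked("clear")

    def __getattr__(self, name):
        # Only reached for attributes the proxy does not define: pass through.
        return getattr(self._memory, name)
```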

The existing layerlens.replay subpackage already drives full replay via ReplayController; this fills in the missing persistence piece so a TraceCollector can round-trip through disk.

TraceCollector.to_replay_dict() returns the same payload that flush() uploads (trace_id, events, capture_config, attestation), but without sealing the hash chain -- the collector stays usable for further emits. _build_trace_payload now takes a seal flag; flush() still seals, to_replay_dict doesn't.

New layerlens.replay.snapshot module:
- dump / dump_collector / load_snapshot for the file-IO side
- replay_events to re-emit captured events into a fresh collector
- serialize_adapter mirrors the per-adapter serialize_for_replay pattern from ateam, bundling AdapterInfo + current trace into one dict

GA check for protocol adapter classes. Verifies the class extends BaseProtocolAdapter, sets non-empty PROTOCOL and PROTOCOL_VERSION, implements connect / disconnect / adapter_info, returns the right types from adapter_info() (AdapterInfo with adapter_type="protocol") and probe_health() (ProtocolHealth), and that negotiate_version picks an exact match when offered. Result types are JSON-serialisable dataclasses; failures are partitioned by severity, so "couldn't instantiate the class to check runtime shape" surfaces as a warning while contract violations are errors. Runs against the three shipped shim adapters (a2ui, ap2, ucp) in a parametrised test, so a regression in any of them surfaces on the next run. Also has to defensively ensure an asyncio event loop exists before instantiation -- BaseProtocolAdapter creates an asyncio.Semaphore in __init__ and the suite would otherwise break when run after asyncio-heavy tests that closed their loop.

Two coupled changes that have to land together because either alone breaks rye run lint.

~265 files reformatted by ruff-format. These accumulated as the pre-commit hook's format pass touched the broader repo after main's 70+ new samples and the restored test files came in. Format-only -- no semantic edits.

The formatter also wrapped three suppressions onto the wrong line, which I had to put back:
- probe_health's # noqa: ARG002 ended up on the return-type line instead of the line declaring the unused `endpoint` arg.
- langchain_core's BaseCallbackHandler import got wrapped onto multiple lines, leaving the # pyright: ignore on the closing paren where pyright doesn't honour it. Pinned to a one-liner inside # fmt: off/on.
- ProtocolCertificationSuite._safe_instantiate took a parameter named `cls`, which pyright reserves for classmethods. Renamed to target_cls.

Also added .venv* to .gitignore so locally-created Python alt envs don't show up in status.

Newer crewai inspects each handler's parameter count and passes a third `state` positional when there are 3 params. We were using `def _handler(source, event, _m=method)` — i.e. a default-arg closure to capture the bound method — which crewai then clobbered by passing state as the third arg, leaving _m = state (not callable) and the handler raising 'NoneType' object is not callable. Switched to a factory closure (`_make_handler(target)`) so the visible signature is exactly (source, event) — crewai takes the 2-arg path and the bound method is captured in the closure properly. Surfaced by the new tests/e2e CrewAI delegation tests under Python 3.11 with crewai 1.14, where the real event bus dispatches handlers through a ThreadPoolExecutor. The existing unit tests didn't catch it because they invoke the adapter's _on_* methods directly rather than going through the event bus.
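The failure mode reproduces without crewai. In the sketch below, dispatch is a hypothetical stand-in for an event bus that counts handler parameters; it is not crewai's actual dispatch code.

```python
import inspect


def dispatch(handler, source, event, state):
    """Stand-in for a bus that passes `state` to handlers declaring 3 parameters."""
    if len(inspect.signature(handler).parameters) == 3:
        return handler(source, event, state)
    return handler(source, event)


def default_arg_handler(method):
    # Fragile: `_m` is a visible third parameter, so the bus overwrites it.
    def _handler(source, event, _m=method):
        return _m(source, event)
    return _handler


def make_handler(target):
    # Robust: the visible signature is exactly (source, event).
    def _handler(source, event):
        return target(source, event)
    return _handler
```

With state=None the default-arg variant ends up calling None(source, event), which is the 'NoneType' object is not callable error described above.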

stepdi

Summary

Substantial implementation across instrumentation, attestation, replay, and adapter integrations (~43k LOC, 30 commits). Cross-cutting code quality is unusually clean — zero hallucinated imports across 359 from layerlens.* references (verified — every target module exists in the repo), zero TODO/FIXME/HACK/XXX markers in src/, 1.19× test-to-source LOC ratio (33,433 / 28,174), all framework/provider/protocol extras properly tiered in pyproject.toml. The _base_provider.py / _base_framework.py hierarchies are well-designed and reduce per-adapter duplication.

Before merge I'd want six items addressed below — three blockers, three scope/naming decisions.

Blockers

1. CI red on Python 3.10 / 3.11 / 3.12 (3.9 passes)

Four distinct failures, all targeted:

- test_openai_agents.py: 17+ failures with TypeError: SpanImpl.__init__() got an unexpected keyword argument 'tracing_api_key'. Upstream openai-agents SDK API change. Pin a compatible version or update the test fixture.
- test_e2e_crewai_delegation.py::test_chain_of_delegations_keeps_sequence: assert len(handoffs) == 3 fails because only 2 handoffs are detected. Off-by-one in chain-delegation detection.
- SemanticKernel plugin detection: assert "MathPlugin" in plugin_names fails because plugin_names is an empty set(). Likely version-pin sensitive.
- Test isolation: assert _current_collector.get() is None fails because a TraceCollector object is still bound. The ContextVar is not cleaned between tests.

2. `ms_agent_framework.py` imports `semantic_kernel`, not `agent-framework`

ms_agent_framework.py:33 does import semantic_kernel.
:53 sets package = "semantic-kernel".
Class docstring (:41): "Microsoft Agent Framework (semantic-kernel agents)."
_registry.py:35-38 acknowledges: "MS Agent Framework ships as part of semantic-kernel; we share the detection key. Both adapters can coexist — they instrument different surface areas (filters vs AgentChat wrapping)."

Two different PyPI packages exist:

- semantic-kernel: already instrumented separately by semantic_kernel.py.
- agent-framework: v1.4.0 on PyPI ("Microsoft Agent Framework for building AI Agents with Python"). Not currently instrumented.

Two options: rename module/class to semantic_kernel_agents to honestly describe what it instruments, or replace semantic_kernel imports with agent_framework.

3. Bedrock streaming + tool-call extraction incomplete

bedrock.py declares streaming support by wrapping methods (:63-70) but emits a placeholder event extra={"streaming": True, "method": method} (:187) and never aggregates chunks. The module docstring (lines 7-9) is explicit about the StreamingBody single-read constraint, but the practical effect: customers running Bedrock streaming get traces with no content, no usage, no cost.

Separately, _extract_invoke_output (:241-263) and _extract_converse_output (:284-292) both filter to text-only blocks ("text" in block), dropping tool_use blocks entirely. Direct Anthropic adapter handles tool_use (anthropic.py:48, 51, 112, 116, 285, 312); the parsing could be lifted for Bedrock-Anthropic invoke and Converse toolUse content blocks.

Also: bedrock.py:41 inherits from BaseAdapter (not MonkeyPatchProvider like other providers), re-implements emission, and reaches a private helper via from ._emit_helpers import _emit_cost # type: ignore[attr-defined] (bedrock.py:26; helper at _emit_helpers.py:164). Either promote _emit_cost to public or refactor to inherit from the base.
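For the tool_use gap, a sketch of Converse output parsing that keeps toolUse blocks alongside text. The function name and return shape are hypothetical; the block layout (output.message.content holding text and toolUse entries with toolUseId, name, input) is my reading of the Bedrock Converse response format and should be checked against the boto3 docs.

```python
def extract_converse_output(response):
    """Split a Converse-style response into text and tool-use parts."""
    message = response.get("output", {}).get("message", {})
    texts, tool_calls = [], []
    for block in message.get("content", []):
        if "text" in block:
            texts.append(block["text"])
        elif "toolUse" in block:
            tool = block["toolUse"]
            tool_calls.append(
                {
                    "id": tool.get("toolUseId"),
                    "name": tool.get("name"),
                    "input": tool.get("input", {}),
                }
            )
    return {"text": "".join(texts), "tool_calls": tool_calls}
```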

Concerns

4. A2UI and UCP shipped without upstream protocol references

a2ui.py (110 LOC) and ucp.py (163 LOC) sit alongside A2A, MCP, AG-UI, and AP2 protocol adapters but differ:

- No pyproject.toml extras for either (vs a2a-sdk, mcp, etc.).
- Zero upstream imports in either file.
- No spec URL in module docstrings.
- a2ui: no PyPI package by that name.
- ucp: a ucp package exists on PyPI but it's an unrelated SMS protocol wrapper ("Python EMI UCP protocol wrapper"). A third-party universal-commerce-protocol v0.0.1 also exists from upsonic/universal-commerce-protocol on GitHub, but LayerLens's ucp.py doesn't import or interop with it.

a2ui.py defines commerce.ui.* event vocabulary with method names on_surface_created, on_user_action (lines 36-37). ucp.py defines discover_suppliers, browse_catalog, start_checkout, complete_checkout, issue_refund (lines 37-41).

Both pass ProtocolCertificationSuite.certify() with the same stamp as A2A and MCP, because the suite is a structural conformance checker (verifies issubclass, PROTOCOL_VERSION non-empty, methods callable) — not a real protocol handshake.

If A2UI/UCP are internal LayerLens proposals: prefix with layerlens_ and document them as internal observability schemas. If they're stubs for protocols that don't exist yet: drop until there's an upstream spec.

5. Haystack adapter wired into auto-detection

haystack.py _on_connect mutates _hs_tracing.tracer.actual_tracer = self._tracer globally; _registry.py:88-90 wires HaystackAdapter into the auto-detection list. If Haystack was meant to be a separate product-surface decision, this commits us to it implicitly. Worth confirming with product before merge.

6. Protocol version strings drift from upstream

| Adapter | `PROTOCOL_VERSION` in PR | Upstream spec | Upstream Python SDK on PyPI |
| --- | --- | --- | --- |
| A2A | `"0.3.0"` (`a2a/adapter.py:38`) | v1.0.0 (released 2026-03-12; v0.3.0 was current 2025-07-30) | `a2a-sdk` v1.0.3 |
| MCP | `"1.0.0"` (`mcp/adapter.py:43`) | date-format: `LATEST_PROTOCOL_VERSION = "2025-11-25"` in upstream `python-sdk`; SUPPORTED list: `"2024-11-05", "2025-03-26", "2025-06-18", "2025-11-25"` | `mcp` v1.27.1 |
| AG-UI | `"0.1.0"` (`agui/adapter.py:27`) | (didn't verify spec version directly) | `ag-ui-protocol` v0.1.18 |
| AP2 | `"0.1.0"` (`ap2.py:41`) | v0.2.0 (released 2026-04-28); full name is Agent Payments Protocol, not "Agent Protocol 2" | `ap2` v0.1.1 |

Two issues:

- MCP "1.0.0" is neither a valid protocol-spec version (which is date-formatted) nor the SDK version (semver 1.27.x). Either pull from mcp.types.LATEST_PROTOCOL_VERSION at runtime, or set to a sentinel until negotiation is implemented.
- A2A "0.3.0" and AP2 "0.1.0" lag current upstream. These show up in trace events and certification output.

Also: AP2 stands for Agent Payments Protocol, not "Agent Protocol 2." If any docstrings/README/marketing reference the latter, they need updating.
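For the MCP case, the runtime lookup with a sentinel fallback might look like this. A sketch: the helper name and the "unknown" sentinel are hypothetical, and it assumes mcp.types exposes LATEST_PROTOCOL_VERSION as quoted in the table.

```python
def resolve_mcp_protocol_version(fallback="unknown"):
    """Prefer the SDK's own constant; fall back to a sentinel when mcp is absent."""
    try:
        from mcp.types import LATEST_PROTOCOL_VERSION
    except ImportError:
        # Covers both a missing package and a missing name inside it.
        return fallback
    return str(LATEST_PROTOCOL_VERSION)
```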

Minor / nice-to-have

7. LiteLLM adapter doesn't wire new base hooks

MonkeyPatchProvider defines extract_tool_calls (_base_provider.py:38) and aggregate_stream (:44). litellm.py delegates extract_output and extract_meta to OpenAIProvider but doesn't delegate the two new hooks — so tool calls are dropped and streaming aggregation returns the no-op default. Two-line fix:

extract_tool_calls = staticmethod(OpenAIProvider.extract_tool_calls)
aggregate_stream  = staticmethod(OpenAIProvider.aggregate_stream)

8. Ollama `cost_per_second` parameter is unused

OllamaProvider.__init__ accepts cost_per_second: float | None = None (ollama.py:34); stored on the instance at :36. The module docstring (line 4) advertises "an optional cost_per_second lets callers account for compute time" — but the parameter is never referenced in cost computation anywhere in the file. Either remove or apply it via duration when calculating cost.
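If the parameter is kept, applying it via duration could be as small as the sketch below. It assumes the response carries Ollama's eval_duration and prompt_eval_duration fields, which the Ollama API reports in nanoseconds; the function name is hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def infra_cost_usd(response, cost_per_second):
    """Estimate compute cost from Ollama's nanosecond duration fields."""
    if cost_per_second is None:
        return None  # no rate configured: report nothing rather than 0.0
    duration_ns = response.get("eval_duration", 0) + response.get("prompt_eval_duration", 0)
    return cost_per_second * duration_ns / NANOSECONDS_PER_SECOND
```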

9. CrewAI memory integration

Delegation/handoff coverage is in place via crewai_event_bus subscriptions (crewai.py lines 156-178) and agent.handoff emission at :523, 550. There's no analogue to _langchain_memory.py for CrewAI's memory store — read/write hooks aren't proxied. Not a blocker, but worth a follow-up if memory-state tracing is a goal for CrewAI parity.

10. Five silent-pass sites in framework adapters

Bare except Exception: pass at:

- crewai.py:643
- agno.py:187
- llamaindex.py:610
- pydantic_ai.py:443
- mcp/tool_wrapper.py:49 (this one has comment # pragma: no cover - defensive)

These swallow exceptions silently. The rest of the codebase consistently either logs at debug or attaches an error to the emitted event. A short justifying comment (or logger.debug(...)) would prevent these from being read as defects in future review.

11. Attestation envelope mutability + concurrency

- AttestationEnvelope (_envelope.py:16-17) is @dataclass without frozen=True. The envelopes property (_chain.py:28) returns a shallow copy via [copy(e) for e in self._chain], which is a real defence, but frozen=True would make immutability enforced rather than a convention.
- _chain.py contains no threading.Lock or asyncio.Lock; add_event (:39) is unprotected. Single-writer usage is fine, but a lock guard would make multi-threaded use safer.
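Both suggestions together, as a sketch. Envelope and HashChain are simplified hypothetical stand-ins for AttestationEnvelope and the chain class, and SHA-256 over sorted JSON is an assumption about the hashing.

```python
import hashlib
import json
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    payload_hash: str
    previous_hash: str


class HashChain:
    def __init__(self):
        self._lock = threading.Lock()
        self._chain = []
        self._last_hash = ""

    def add_event(self, data):
        with self._lock:  # serialise the read-modify-write of _last_hash
            payload = {**data, "_previous_hash": self._last_hash}
            digest = hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode("utf-8")
            ).hexdigest()
            envelope = Envelope(payload_hash=digest, previous_hash=self._last_hash)
            self._chain.append(envelope)
            self._last_hash = digest
            return envelope

    @property
    def envelopes(self):
        with self._lock:
            return tuple(self._chain)  # items are frozen, so no per-item copy
```

With frozen envelopes the defensive per-item copy becomes unnecessary, since callers cannot mutate what they receive.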

12. Evaluation runner swallows scorer exceptions

runner.py:103-105:
```python
except Exception as exc:
    log.debug("scorer %s raised on item %s: %s", name, item.id, exc)
    item_scores[name] = 0.0
```

A broken scorer becomes indistinguishable from a legitimately failing item. Suggest attaching the exception to EvaluationRunItem.error and surfacing it in the aggregate.
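A sketch of the suggested shape, where scorer failures land in a separate errors map instead of being scored 0.0. ItemResult and score_item are hypothetical names, not the real EvaluationRunItem API.

```python
from dataclasses import dataclass, field


@dataclass
class ItemResult:
    scores: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)  # scorer name -> error text


def score_item(item, scorers):
    """Run every scorer; record failures separately instead of scoring them 0.0."""
    result = ItemResult()
    for name, scorer in scorers.items():
        try:
            result.scores[name] = float(scorer(item))
        except Exception as exc:
            result.errors[name] = f"{type(exc).__name__}: {exc}"
    return result
```

The aggregate can then report the count of errored scorers next to the mean score, so a crash is never mistaken for a low score.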

13. Replay store / dataset store default to in-memory

InMemoryReplayStore is the default in ReplayController.__init__ (controller.py:39: self._store: ReplayStore = store or InMemoryReplayStore()). InMemoryDatasetStore is the only implementation shipped. The interfaces are Protocol-based so swap-in is one line — but defaults lose state on restart. Either a docstring warning or a JSONFileStore reference impl would help.

14. Empty PR description

43k LOC merge across 336 files with no PR body. A short changelog grouped by subsystem (instrument / attestation / replay / synthetic / evaluation_runs / cli / docs) would help anyone trying to bisect later.

Strengths

No fake data. StochasticProvider (synthetic/providers.py:89) tags id=f"synth_{uuid.uuid4().hex[:16]}", created_at="synthetic", data["synthetic"]=True (:133-140).
Honest fallbacks. pricing.calculate_cost returns None for unpriced models; _emit_cost propagates None (callers see explicit None, not fake 0.0). cli/commands/evaluations.py:94 raises click.UsageError("remote dataset lookup is not yet implemented — pass --dataset-file") instead of returning empty data.
Real cryptographic chain. _chain.py:43 includes _previous_hash in the hashed payload (payload = {**data, "_previous_hash": self._last_hash}), not just adjacent — tampering with previous_hash breaks the hash. _signing.py:19 uses hmac_mod.compare_digest (timing-safe).
Defensive extractors in provider adapters — getattr(..., default) plus try/except around attribute walks. Won't crash on unexpected SDK shapes.
No AdapterCapability enum. Capability-without-implementation can't happen by construction.
Test density. 33,433 test LOC vs 28,174 source LOC. 93 test_*.py files (+ 20 conftest/init = 113 .py total in tests/). 1741 test functions, 3694 asserts. Zero empty-test files.
Dependency tiering. Runtime deps are just httpx + pydantic. All frameworks/providers are optional extras with Python-version gating for 3.10-only packages.
Zero broken imports across 359 from layerlens.* references (verified by enumerating all module paths under src/layerlens/ and checking every import target resolves).

Verdict

Three blockers (CI, MS Agent Framework naming, Bedrock streaming/tool extraction) + three product-decision items (A2UI/UCP, Haystack, protocol versions). Everything else can land as follow-ups.

…-2879/2881/2883)

Per Marc's TEL-026 / TEL-028 / TEL-029 acceptance criteria, map provider-specific fields to vendor-namespaced OTel GenAI attributes:
- gen_ai.openai.response.system_fingerprint (TEL-026)
- gen_ai.openai.response.service_tier (TEL-026)
- gen_ai.openai.request.seed (TEL-026)
- gen_ai.anthropic.cache_read_input_tokens (TEL-028)
- gen_ai.anthropic.cache_creation_input_tokens (TEL-028)
- gen_ai.response.finish_reasons (now also from Anthropic stop_reason)

Wire OTel attribute mapping into the bespoke Bedrock emit path that bypasses the standard MonkeyPatchProvider flow.

Add response_id extraction across the remaining adapters per TEL-029:
- Bedrock: ResponseMetadata.RequestId
- Vertex: response.response_id / response.id
- Per-family stop_reason extraction for Bedrock invoke_model (anthropic, cohere, amazon, meta, mistral)

22 new tests covering vendor-namespacing edge cases and end-to-end finish_reasons + response.id coverage across all 7 adapters (OpenAI, Anthropic, Azure OpenAI, Vertex, Bedrock, Ollama, LiteLLM).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-3327/3330)

Per Marc's ADP-071 Claude Code Prompt, wrap pricing in a class with the contract he spelled out:
- PricingTable.from_default() / .from_dict() / .from_json_file()
- PricingTable.calculate_cost(model, input_tokens, output_tokens) -> CostRecord
- PricingTable.has_model() / .models() / .as_dict()

Fuzzy resolution: ``gpt-4o-2024-08-06`` -> ``gpt-4o`` (date-suffix strip), ``claude-3-5-sonnet-20990101`` -> ``claude-3-5-sonnet``. Longest-prefix fallback disambiguates ``gpt-4o`` from ``gpt-4`` for unrecognised dated variants. Added base-name entries for the Claude family so fuzzy-stripped lookups resolve.

LAYERLENS_PRICING_TABLE env var loads JSON overrides at runtime, satisfying LAY-3327's "pricing updateable without code changes" AC. Override precedence: env > caller-supplied table > bundled PRICING. Bad JSON / unreadable files log a warning and fall back to defaults rather than crashing the request path.

CostRecord dataclass carries cost_usd + model + input/output/cached token counts so callers can pipe it directly into the cost.record event payload.

36 new pricing tests covering defaults, fuzzy matching, caller overrides, cached-token discounts (Anthropic 90% / Google 75% / others 50%), env loading, malformed-JSON resilience, and graceful unknown-model handling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
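The fuzzy resolution order (exact match, then date-suffix strip, then longest prefix) can be sketched as follows. resolve_model and the regex are illustrative assumptions, not the PricingTable internals.

```python
import re

# Matches a trailing -YYYY-MM-DD or -YYYYMMDD suffix.
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def resolve_model(model, known):
    """Map a dated model id to a known base name, or return None."""
    if model in known:
        return model
    stripped = _DATE_SUFFIX.sub("", model)
    if stripped in known:
        return stripped
    # Longest-prefix fallback so "gpt-4o-..." prefers "gpt-4o" over "gpt-4".
    prefixes = [name for name in known if model.startswith(name)]
    return max(prefixes, key=len) if prefixes else None
```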

… (LAY-3326/3329/3331/3332)

Per Marc's ADP-071 Claude Code Prompt, lift streaming logic into src/layerlens/instrument/adapters/providers/_streaming.py:
- StreamingResponseWrapper tracks first-chunk arrival + chunk list
- stream_chunks_sync / stream_chunks_async preserve the SDK iterator contract (downstream consumers see identical chunks) while feeding the wrapper
- On normal completion: emit consolidated model.invoke with ttft_ms and streaming_duration_ms in event metadata
- On mid-stream exception: emit agent.error with partial_meta extracted from accumulated chunks plus partial_chunks count, per LAY-3329/3332 DoD

_base_provider.py now delegates _wrap_stream_iterator and _wrap_async_stream_iterator to the new module. Same behavioural contract, one implementation shared by every monkey-patched provider.

emit_llm_events grew ttft_ms / streaming_duration_ms kwargs; emit_llm_error grew partial_meta / partial_chunks + error_type for richer agent.error payloads.

OpenAI tool-call JSON parsing now logs a WARNING when arguments are malformed (LAY-3331 DoD) with the offending snippet truncated for log hygiene, rather than silently returning the raw string.

27 streaming tests including end-to-end TTFT (sync + async), iterator-contract preservation, partial_meta on mid-stream error, malformed-JSON warning, "no tool_calls = no events emitted", and parallel tool-call fragment assembly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
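A sync-only sketch of the pass-through wrapper idea: chunks reach the consumer unchanged while first-chunk time and total duration are recorded. traced_stream and its two callbacks are hypothetical; the real stream_chunks_sync signature may differ.

```python
import time


def traced_stream(chunks, on_complete, on_error):
    """Yield chunks unchanged while recording time-to-first-chunk and duration."""
    started = time.monotonic()
    first_at = None
    seen = []
    try:
        for chunk in chunks:
            if first_at is None:
                first_at = time.monotonic()
            seen.append(chunk)
            yield chunk
    except Exception as exc:
        # Mid-stream failure: report what was received, then re-raise.
        on_error(exc, list(seen))
        raise
    else:
        ended = time.monotonic()
        ttft_ms = None if first_at is None else (first_at - started) * 1000.0
        on_complete(seen, ttft_ms, (ended - started) * 1000.0)
```

In this sketch neither callback fires if the consumer abandons the stream early; whether that case should emit an event is a separate design decision.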

…op (LAY-3328/3332/3333/3334)

Tighten _CAPTURE_PARAMS so raw ``system``, ``messages``, ``tools``, ``tool_choice``, ``metadata``, and ``thinking`` payloads NEVER reach the event parameters dict. derive_params builds privacy-safe summaries instead, per LAY-3334 ACs:
- has_system: bool, system_length: int (presence + length, NOT content)
- messages_count, message_roles (count + role distribution, no content)
- tools_count, tool_names (no schemas / descriptions)
- tool_choice_type, tool_choice_name (type + name only)
- metadata_user_id (only field captured from metadata, for Anthropic's cost-attribution use)
- thinking_budget_tokens, thinking_type (broken out from the thinking config)

extract_meta now surfaces content_block_counts (text / tool_use / thinking), tool_use_names, and has_thinking on every response per LAY-3334.

Streaming aggregator:
- explicit message_stop handler (Marc's AC literally names it)
- TTFT anchored on first content_block_delta (not message_start, which fires before any content is generated)
- defensive thinking_tokens read from message_start.usage and message_delta.usage so we pick up any future SDK signal
- partial_meta emission on mid-stream exception including any cache tokens already received

11 new tests covering privacy boundaries (system content never leaks, metadata sibling fields not captured), thinking budget capture, baseline non-thinking responses unchanged, content-block counts incl. tool_use names, mid-stream errors, and message_stop receipt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
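A partial sketch of the summary idea, covering only the system / messages / tools fields. The exact output shapes (for example message_roles as a role-to-count dict) are assumptions, and a non-string system value is simply counted as length 0 here.

```python
from collections import Counter


def derive_params(kwargs):
    """Summarise request kwargs without copying any prompt or tool content."""
    system = kwargs.get("system")
    messages = kwargs.get("messages") or []
    tools = kwargs.get("tools") or []
    roles = [m.get("role", "unknown") for m in messages if isinstance(m, dict)]
    return {
        "has_system": bool(system),
        "system_length": len(system) if isinstance(system, str) else 0,
        "messages_count": len(messages),
        "message_roles": dict(Counter(roles)),  # role -> count, no content
        "tools_count": len(tools),
        "tool_names": [t.get("name") for t in tools if isinstance(t, dict)],
    }
```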

Per ADP-070, ``from layerlens.adapters.providers import AzureOpenAIAdapter`` (and the other 6) must succeed and expose ``connect_client(client)`` + ``health_check() -> AdapterHealth``. The canonical implementation lives at ``layerlens.instrument.adapters.providers.*Provider`` with ``.connect()``; this commit adds a thin shim at the legacy path so the AC bullets are verifiable without forking the code.

Wrappers cover OpenAI, Anthropic, Azure OpenAI, Vertex, Bedrock, Ollama, LiteLLM. ``health_check`` returns a self-contained AdapterHealth dataclass matching the legacy pydantic model's shape; no dependency on any other adapter module so the shim works on a clean checkout.

12 tests verify:
- Each adapter is importable from the legacy path
- AdapterHealth + AdapterStatus have the expected shape
- Health flips from DISCONNECTED to HEALTHY after connect_client
- connect_client wires up real tracing end-to-end for OpenAI, Anthropic, Bedrock (boto3-shape mock incl. ResponseMetadata.RequestId), Vertex (mocked generate_content), and Ollama (mocked chat), each producing model.invoke (+ cost.record where the model is priced)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

….3450)

- pyproject: add langgraph, crewai, autogen, agentforce extras + all-frameworks omnibus
- requires_pydantic="2" markers on LangGraph + CrewAI adapters
- PEP 562 lazy public-API exports for the 6 framework adapters in frameworks/__init__.py
- Five runnable sample scripts under samples/instrument/ that exit 0 with install hints when the SDK is absent
- Five reference docs under docs/adapters/frameworks/ (Agentforce includes Connected App / OAuth setup section)

Lint + 80 framework tests + 488 wider instrument tests all green.

….3450)

Asserts importing the frameworks package never eagerly pulls langgraph, langchain-core, crewai, autogen, autogen-core, autogen-agentchat, or semantic_kernel. Also covers AttributeError for unknown names, __dir__ advertising all 6 public adapters, and resolving AgentforceAdapter (the only adapter whose dep ships with the default install) without leaking the others.

mypy --strict pass over the 5 M2 adapters + the lazy-export __init__: zero issues.
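The PEP 562 behaviour these tests pin down can be sketched on a throwaway module object. In a real package the two hooks would be plain module-level functions in __init__.py; the install_lazy_exports helper and the _LAZY table are hypothetical, used only to keep the example self-contained.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module that defines it, attribute name).
_LAZY = {"OrderedDict": ("collections", "OrderedDict")}


def install_lazy_exports(module, table):
    """Attach PEP 562 hooks so names in `table` are imported on first access."""

    def __getattr__(name):
        try:
            module_name, attr = table[name]
        except KeyError:
            raise AttributeError(
                f"module {module.__name__!r} has no attribute {name!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(module, name, value)  # cache so the hook runs once per name
        return value

    def __dir__():
        return sorted(set(vars(module)) | set(table))

    module.__getattr__ = __getattr__
    module.__dir__ = __dir__
    return module


demo = install_lazy_exports(types.ModuleType("demo"), _LAZY)
```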

Adapters
- google_vertex: capture GenerativeModel.model_name on connect (strip `models/` prefix) and inject into response meta via overridden _extractors so cost-record events resolve against PRICING.
- ollama: bind OLLAMA_HOST endpoint into meta on every invoke; when cost_per_second is set, compute infra_cost_usd from eval_duration + prompt_eval_duration and include in the model.invoke payload.

Consumable surface
- pyproject: new `providers-vertex` and `providers-ollama` extras (canonical M3 names per AC); existing `google-vertex` and `ollama` kept as aliases.
- providers/__init__.py: PEP 562 lazy public API for OpenAI, Anthropic, AzureOpenAI, Bedrock, GoogleVertex, Ollama, LiteLLM. Default install stays lean.
- samples: google_vertex/example.py + ollama/example.py, both exit 0 with install / setup hints when SDK or daemon is absent.
- docs: google_vertex.md (SA-JSON + ADC sections per AC) and ollama.md (`ollama serve` setup + cost_per_second explanation per AC).

Marc-prep
- 19 new adapter unit tests (Vertex: SimpleNamespace SDK mocks; Ollama: dict-shape fixtures matching the ollama package).
- 4 new lazy-import regression tests for providers/__init__.py.
- mypy --strict clean over the 3 edited source files.
- ruff check + ruff format clean.

garrettallen14 and others added 17 commits March 24, 2026 17:43

Add layerlens.attestation: cryptographic hash chains for tamper-evident traces

a984465

feat: signing keys

abf4151

refactor: remove client-side signing, delegate to server-side attestation

4c44731

fix: attestation chain integrity: error propagation, async I/O, envelope immutability

a6d9bbf

feat: add BaseAdapter ABC, AdapterRegistry, and refactor all adapters (#77)

4c5d860

feat: replace span trees with flat event emission, add CaptureConfig (L1-L6) (#79)

904067a

feat: cleanup + refactor instrumentation test package (#80)

810671e

feat | unified context model, per-client uploads, and pre-ship hardening (#83)

6c5817b

* feat: context propagation and upload circuit breaker
* feat: updates + new adapters
* feat: unify context model, per-client uploads, and adapter hardening
* fix: update crewai

feat | new adapters, 3rd iteration (#84)

07925bc

* feat: context propagation and upload circuit breaker
* feat: updates + new adapters
* feat: unify context model, per-client uploads, and adapter hardening
* fix: update crewai
* feat: new adapters

feat: agentforce, agno, autogen, bedrock adapters

91a92b5

Merge remote-tracking branch 'origin/main' into development

260c5c4

fix: formatting, lint, and restore files deleted by merge

c56dcf6

Merge pull request #87 from LayerLens/feat/new-adapters-4

758e82b

feat | agentforce, agno, autogen, bedrock adapters

Merge branch 'development' of https://github.com/LayerLens/atlas-python into development

05331c3

fix: format new adapters from PR #87 (agentforce, agno, autogen, bedrock)

eaae65b

fix: relax pyright for adapter frameworks/providers (optional deps)

8c3ee42

New adapters, protocols and vscode extension

386e0c5

m-peko force-pushed the development branch from e8d602f to 386e0c5 April 20, 2026 13:59

m-peko added 12 commits May 18, 2026 15:06

m-peko changed the title ~~feat | instrumentation engine, attestation, and 16 adapter integrations~~ feat | instrumentation engine, attestation, and adapter integrations May 19, 2026

stepdi reviewed May 19, 2026

garrettallen14 and others added 8 commits May 20, 2026 14:49

garrettallen14 wants to merge 38 commits into main from development
