Align observability with OpenTelemetry at the edges: GenAI semconv naming + post-hoc OTLP export

## Context

Before investing further in the observability dashboard (live viewer metrics + timeline + filtered tail), we asked whether open standards — specifically OpenTelemetry — should shape the work so we don't reinvent wire formats and vocabularies.

**Decision: OTel at the edges, not at the core.** `events.jsonl` stays the canonical, self-contained source of truth; OTel alignment happens in naming and in a derived export channel.

## Current state

The event log was already designed trace-shaped: every event carries `trace_id` (32-hex) and `span_id` (16-hex) in OTel shape, per `session.py`'s module docstring ("Lets downstream tools (Phoenix, Langfuse, Braintrust) ingest events as a trace").
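As a rough illustration of what "OTel shape" means for these two fields, the sketch below generates and validates ids using only the stdlib. The helper names (`new_trace_id`, `new_span_id`, `is_otel_shaped`) are hypothetical and are not taken from `session.py`; the all-zero check reflects the OTel rule that an all-zero id is invalid.

```python
import re
import secrets

_TRACE_RE = re.compile(r"[0-9a-f]{32}")
_SPAN_RE = re.compile(r"[0-9a-f]{16}")


def new_trace_id() -> str:
    """Return 16 random bytes as a 32-character lowercase hex string."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """Return 8 random bytes as a 16-character lowercase hex string."""
    return secrets.token_hex(8)


def is_otel_shaped(event: dict) -> bool:
    """Check that an event carries a trace_id/span_id pair a trace backend could ingest."""
    trace_id = event.get("trace_id")
    span_id = event.get("span_id")
    if not isinstance(trace_id, str) or not isinstance(span_id, str):
        return False
    if _TRACE_RE.fullmatch(trace_id) is None or _SPAN_RE.fullmatch(span_id) is None:
        return False
    # All-zero ids are reserved as "invalid" in OTel.
    return set(trace_id) != {"0"} and set(span_id) != {"0"}
```

A check like this could run over `events.jsonl` before export to catch malformed ids early.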

Standard status as of June 2026:
- **GenAI client spans are stable** — `gen_ai.request.model`, `gen_ai.usage.input_tokens` / `output_tokens`, `gen_ai.response.finish_reasons`.
- **Agent spans are still experimental.** This is the level where Tilth's tasks, iterations, and verdicts live.
- Overall GenAI semconv is still marked Development.

So: model-call vocabulary is safe to align with now; agent-level vocabulary is still moving and not worth chasing.
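To make the model-call alignment concrete, here is a minimal sketch of a field-to-attribute mapping. Only `prompt_tokens` and `eval_tokens` are named in this issue; the `model` and `finish_reason` payload field names, and the `to_semconv` helper itself, are assumptions for illustration.

```python
# Hypothetical mapping from events.jsonl payload fields to gen_ai.* attributes.
# "model" and "finish_reason" are assumed field names, not confirmed ones.
FIELD_TO_SEMCONV = {
    "model": "gen_ai.request.model",
    "prompt_tokens": "gen_ai.usage.input_tokens",
    "eval_tokens": "gen_ai.usage.output_tokens",
    "finish_reason": "gen_ai.response.finish_reasons",
}


def to_semconv(payload: dict) -> dict:
    """Translate known payload fields to semconv names; unknown fields are dropped."""
    attributes = {}
    for field, attribute in FIELD_TO_SEMCONV.items():
        if field not in payload:
            continue
        value = payload[field]
        # finish_reasons is an array-valued attribute, so wrap a single reason.
        if attribute == "gen_ai.response.finish_reasons" and not isinstance(value, list):
            value = [value]
        attributes[attribute] = value
    return attributes
```

Keeping the mapping as data rather than renaming fields in place lets the on-disk schema stay unchanged while exports use the standard names.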

## Why not rebuild the core on OTel

- `events.jsonl` is the product, not plumbing. It gives us replay fidelity (live view byte-identical to replay), chat reconstruction including nudges and reasoning, and zero-infra single-file inspection. OTLP export is async and lossy by design, and replaying from a collector requires infrastructure.
- The interesting semantics (case/verdict, rejection categories, ledger, iteration accounting) have no semconv home — they'd be custom attributes regardless.
- Generic trace UIs won't render the worker↔eval dialogue as a conversation; that layer stays ours either way.
- The OTel SDK is a real dependency tree against a stdlib-first repo, and it isn't needed: OTLP has a stable JSON encoding over HTTP, so a post-hoc converter can be stdlib-only (`json` plus `urllib.request`).
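The stdlib-only claim in the last bullet can be sketched as follows. This is a minimal illustration, not the exporter itself. The function names, the `tilth` service name, the `tilth.export_otel` scope name, and the example span values are assumptions. Port 4318 with path `/v1/traces` is the conventional OTLP/HTTP default. In OTLP/JSON, ids are hex strings, 64-bit integers are encoded as strings, and enums are integers.

```python
import json
import urllib.request

# Assumed endpoint: the conventional OTLP/HTTP traces path on a local collector.
DEFAULT_ENDPOINT = "http://localhost:4318/v1/traces"


def build_otlp_request(spans, endpoint=DEFAULT_ENDPOINT, service_name="tilth"):
    """Wrap already-converted OTLP span dicts in a resourceSpans envelope."""
    body = {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{"scope": {"name": "tilth.export_otel"}, "spans": spans}],
        }]
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """POST the request to the collector and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Illustrative span only; ids, times, and attribute values are made up.
example_span = {
    "traceId": "5b8efff798038103d269b633813fc60c",
    "spanId": "eee19b7ec3c1b174",
    "name": "chat",
    "kind": 3,  # SPAN_KIND_CLIENT
    "startTimeUnixNano": "1700000000000000000",
    "endTimeUnixNano": "1700000001000000000",
    "attributes": [
        {"key": "gen_ai.request.model", "value": {"stringValue": "example-model"}},
        {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "120"}},
    ],
}
request = build_otlp_request([example_span])
```

Calling `send(request)` would perform the actual POST. It is kept separate so the payload can be built and inspected without a collector running.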

## Plan

1. **Naming alignment, opportunistically.** When a payload schema is next touched, prefer semconv names (`input_tokens`/`output_tokens` over `prompt_tokens`/`eval_tokens`). Until then, maintain a documented mapping table (events.jsonl field → `gen_ai.*` attribute). No big-bang rename — it would ripple through `summary.py`, the viewer, and `SUMMARY_VERSION` for zero user-visible gain.
2. **`tilth export-otel <session_id>`** — a derived channel (same principle as `summary.json`: derived, never a second store) that converts a finished session's `events.jsonl` to OTLP/JSON and POSTs it to a collector endpoint. No SDK, no change to the loop. Validate against a local Jaeger all-in-one.
3. **Settle the trace hierarchy first.** Today `trace_id` is per-*task*, so a session would export as N disconnected traces. Trace backends render one trace per `trace_id`, so the natural OTel shape is the session as the trace root with tasks as child spans. Decide deliberately (session-level trace id + parent links, or task traces with a session resource attribute) before the exporter lands.
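For item 3, the first option (session-level trace id + parent links) might look like the sketch below, which operates on spans already in OTLP/JSON shape. It is one possible approach under stated assumptions, not a decision. The function names and the `tilth.session_id` / `tilth.task.trace_id` attribute keys are hypothetical. Deriving the ids by hashing the session id is a design choice made here so that re-exporting the same session yields the same trace.

```python
import hashlib


def _derived_id(label: str, session_id: str, hex_chars: int) -> str:
    """Derive a stable hex id from the session id so re-exports are idempotent."""
    digest = hashlib.sha256(f"{label}:{session_id}".encode("utf-8")).hexdigest()
    return digest[:hex_chars]


def reparent_under_session(session_id: str, task_spans: list) -> list:
    """Return a session root span followed by the task spans moved into its trace."""
    if not task_spans:
        return []
    trace_id = _derived_id("trace", session_id, 32)
    root_id = _derived_id("root", session_id, 16)
    root = {
        "traceId": trace_id,
        "spanId": root_id,
        "name": "session",
        "kind": 1,  # SPAN_KIND_INTERNAL
        # The root spans the earliest start to the latest end of its children.
        "startTimeUnixNano": str(min(int(s["startTimeUnixNano"]) for s in task_spans)),
        "endTimeUnixNano": str(max(int(s["endTimeUnixNano"]) for s in task_spans)),
        "attributes": [
            {"key": "tilth.session_id", "value": {"stringValue": session_id}}
        ],
    }
    children = []
    for span in task_spans:
        child = dict(span)
        child["traceId"] = trace_id
        # Only top-level spans are attached to the root; existing parents are kept.
        child.setdefault("parentSpanId", root_id)
        # Preserve the original per-task trace id as a custom attribute.
        child["attributes"] = list(span.get("attributes", [])) + [
            {"key": "tilth.task.trace_id", "value": {"stringValue": span["traceId"]}}
        ]
        children.append(child)
    return [root] + children
```

The second option (task traces plus a session resource attribute) would skip the reparenting entirely and only add the session id to the resource, at the cost of N separate traces in the backend.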

## Non-goals

- OTel SDK inside the harness loop.
- Replacing `events.jsonl` or the built-in viewer with a collector/backend.
- Chasing the experimental agent-span conventions while they churn.

## References

- https://opentelemetry.io/docs/specs/semconv/gen-ai/
- https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
- https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
- https://opentelemetry.io/blog/2026/genai-observability/
