Skip to content

feat(telemetry): activation + reliability signals (schema v8)#181

Merged
nicolotognoni merged 2 commits into
mainfrom
feat/telemetry-activation-signals
Jun 15, 2026
Merged

feat(telemetry): activation + reliability signals (schema v8)#181
nicolotognoni merged 2 commits into
mainfrom
feat/telemetry-activation-signals

Conversation

@nicolotognoni

Copy link
Copy Markdown
Collaborator

Summary

  • Closes the activation funnel and makes call failures diagnosable in the anonymous usage telemetry — the dataset previously showed that a call happened but not why an install never reached one, nor where latency/errors came from.
  • Adds one event and several dimensions; all additive, opt-out, fire-and-forget (never affects a live call), no PII, Python ↔ TypeScript parity. Telemetry schema v7 → v8.

Implementation

  • config_incomplete event — emitted once when required runtime config is missing, immediately before the existing startup raise (control flow unchanged). Dimension missing ∈ {carrier_credentials, llm_key, engine_config, other} — never the key value or any PII. This is the activation-blocker signal (installs that run but can't proceed).
  • Per-stage latency on call_completed: stt_latency_ms, llm_ttft_ms, tts_first_byte_ms, eou_latency_ms (whole ms, read off the existing CallMetrics breakdown; a dimension is omitted when its stage didn't run, e.g. Realtime).
  • Error taxonomy on call_completed: error_layer ∈ {stt,llm,tts,carrier,config,internal,none,other} + disconnect_reason ∈ {hangup_local,hangup_remote,error,timeout,no_answer,busy,completed,other}, derived from the already-available outcome/error_code (no new call-path plumbing).
  • time_to_first_call_bucket on call_started/call_completed (install-age at call time).
  • Client sampling: PATTER_TELEMETRY_SAMPLE (default 1.0 = no sampling), deterministic per run via the run id hash, never samples first_run/config_incomplete/error events; the chosen sample_rate is stamped so analytics can extrapolate (weight = 1/sample_rate).
  • Files: libraries/{python/getpatter,typescript/src}/telemetry/{events,client,env,install_id,call_metrics}.* + client.*. New dedicated test files per feature (the shared telemetry test files are only touched for the SCHEMA_VERSION bump).
  • The telemetry ingestion endpoint has been updated to accept schema v8, so the new event/fields are not dropped server-side.
  • No version bump in this PR — entries sit under ## Unreleased; the bump + Unreleased → X.Y.Z rename happen at release time per the release workflow.

Breaking change?

No. Every addition is optional with a safe default (PATTER_TELEMETRY_SAMPLE defaults to 1.0), no existing event/dimension changed or removed, and the schema bump is purely additive. Existing callers are unaffected.

Test plan

  • Python: pytest tests/ — telemetry 116 new/updated pass; full suite 2712 pass / 0 fail
  • TypeScript: npm test (telemetry 87 pass; full 2142 pass / 0 fail) + npm run lint (clean) + npm run build (ok)
  • Python ↔ TS parity verified (events, dimension value-sets, SCHEMA_VERSION=8, defaults all identical)
  • E2E smoke — n/a (telemetry-only; no pipeline/handler behaviour change)

Docs updates

  • CHANGELOG.md updated (## Unreleased### Added for the new signals, ### Changed for the schema bump).
  • Follow-up (not in this PR): docs/telemetry.mdx to document PATTER_TELEMETRY_SAMPLE, the config_incomplete event, and the new call_completed dimensions; feature-inventory rows.

Adds anonymous, opt-out telemetry to close the activation funnel and make call failures diagnosable. All additive (schema v7->8), Python<->TS parity, fire-and-forget (never affects a call), no PII.

- config_incomplete event: emitted once when required runtime config is missing (missing=carrier_credentials|llm_key|engine_config), immediately before the startup raise — the activation-blocker signal.
- call_completed per-stage latency: stt_latency_ms, llm_ttft_ms, tts_first_byte_ms, eou_latency_ms (whole ms, omitted when a stage didn't run).
- call_completed error taxonomy: error_layer + disconnect_reason enums.
- time_to_first_call_bucket on call_started/call_completed.
- client sampling: PATTER_TELEMETRY_SAMPLE (default 1.0), deterministic per run, never samples first_run/config_incomplete/error; sample_rate weight stamped for extrapolation.

Tests: telemetry Py 116 / TS 87 in new dedicated files; full suites green; lint + build clean; Py<->TS parity verified.
@nicolotognoni nicolotognoni merged commit 5997808 into main Jun 15, 2026
10 checks passed
@github-actions github-actions Bot deleted the feat/telemetry-activation-signals branch June 16, 2026 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant