diff --git a/CLAUDE.md b/CLAUDE.md index 847ee4c..3a892a6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -46,7 +46,7 @@ Recent design context, bugs, and convention rationale: see the bug-audit finding - **`FastAPIConfig.application` uses an `UnsetType` sentinel**: shared in `lite_bootstrap/types.py` as `UnsetType` + `UNSET` (singleton). `FastAPIConfig.__post_init__` checks `isinstance(self.application, UnsetType)` and replaces with a constructed `FastAPI()` via `object.__setattr__` (config stays frozen for user-facing immutability). A one-line comment in `__post_init__` documents the freeze bypass. - **Instrument registry**: `BaseBootstrapper` holds a list of instrument instances; it calls `bootstrap()` on each in order and `teardown()` in reverse during shutdown. - **Idempotent teardown**: `BaseBootstrapper.teardown()` returns immediately if `not self.is_bootstrapped`. Cached runtime state in `LoggingInstrument` and `OpenTelemetryInstrument` is reset inside `try/finally` so a raised shutdown leaves no stale references. -- **Logging ↔ Sentry integration**: `logging_instrument.py` injects structlog context into Sentry events. `sentry_instrument.py` chains `before_send` callbacks via `wrap_before_send_callbacks()`. The `skip_sentry` flag in log context suppresses events; the flag is also stripped from the Sentry context payload (added to `IGNORED_STRUCTLOG_ATTRIBUTES`). +- **Logging ↔ Sentry integration**: `logging_instrument.py` renders structlog lines to JSON; the seam is `StructuredLogPayload.parse` in `logging_factory.py`, which reconstructs a line into `message`/`extra`/`skip_sentry` and owns the meta-key vocabulary (`STRUCTLOG_META_KEYS`, stripped from `extra`). `sentry_instrument.py`'s `enrich_sentry_event_from_structlog_log` (chained after the user's `before_send` via `wrap_before_send_callbacks()`) only maps the parsed payload onto the Sentry event — a truthy `skip_sentry` suppresses the event (checked before the message-presence test), else it lifts `message` and attaches `extra` under `contexts.structlog`. `IGNORED_STRUCTLOG_ATTRIBUTES` stays as a back-compat alias of `STRUCTLOG_META_KEYS`. The value object holds no Sentry-event-shape knowledge; adding a custom top-level meta-processor to the logging chain means extending `STRUCTLOG_META_KEYS` (cross-referenced at `tracer_injection`). - **OTel ↔ Logging integration**: The logging instrument injects span/trace IDs from the active OpenTelemetry context into every log record. - **`OpenTelemetryInstrument` is single-instance per process**: `bootstrap()` calls `opentelemetry.trace.set_tracer_provider(...)`, which the OTel SDK enforces as set-once via `_TRACER_PROVIDER_SET_ONCE.do_once(...)` (subsequent calls log `"Overriding of current TracerProvider is not allowed"` and have no effect). `teardown()` calls `shutdown()` on the provider (flushes batched spans, closes exporters) but cannot reset the process-global pointer. Construct one `OpenTelemetryInstrument` per process; do not bootstrap a second instance. Verified against `opentelemetry/trace/__init__.py:548-556`. @@ -63,7 +63,7 @@ See `[project.optional-dependencies]` in `pyproject.toml` for the full extras ma ## Workflow -Per-feature: brainstorming → spec in `planning/changes/YYYY-MM-DD.NN-/design.md` → writing-plans → plan in `planning/changes/YYYY-MM-DD.NN-/plan.md` → executing-plans / subagent-driven-development → requesting-code-review → finishing-a-development-branch. Each change is a folder bundle; `` is a kebab-case description, not a story ID; `.NN` is a zero-padded intra-day counter that breaks same-date ties so the timeline sorts stably. The implementing PR sets `status: shipped` and fills `pr` / `outcome` in the branch, alongside the code and promotes its conclusions into the affected `architecture/.md` — that hand-edit keeps `architecture/` true and is the only ship-time step; there is no folder move. The change listing is generated — run `just index`. See [`planning/README.md`](planning/README.md) for the conventions and [`planning/_templates/`](planning/_templates/) for copy-and-fill starting points. +Per-feature: brainstorming → spec in `planning/changes/YYYY-MM-DD.NN-/design.md` → writing-plans → plan in `planning/changes/YYYY-MM-DD.NN-/plan.md` → executing-plans / subagent-driven-development → requesting-code-review → finishing-a-development-branch. Each change is a folder bundle; `` is a kebab-case description, not a story ID; `.NN` is a zero-padded intra-day counter that breaks same-date ties so the timeline sorts stably. The implementing PR sets `status: shipped` and fills `pr` / `outcome` in the branch, alongside the code and promotes its conclusions into the affected `architecture/.md` — that hand-edit keeps `architecture/` true and is the only ship-time step; there is no folder move. The change listing is generated — run `just index`. A design decision taken without a code change — especially a candidate rejected with a load-bearing reason — is recorded as `planning/decisions/YYYY-MM-DD-.md` (the `decision.md` template, frontmatter `status: accepted|superseded`), each with a **Revisit trigger** so future reviews don't re-litigate it; listed by `just index`. See [`planning/README.md`](planning/README.md) for the conventions and [`planning/_templates/`](planning/_templates/) for copy-and-fill starting points. **Spec** (`design.md`) captures the *thinking* — why, what the design is, trade-offs, scope. Written before code; rarely revised after merge. **Plan** (`plan.md`) captures the *sequencing* — the ordered checklist an executor walks; references the spec for the "why". **`architecture/`** captures the *invariants* of shipped systems — the living truth, promoted in the implementing PR alongside the code. A plan paragraph that would still read correctly with all task numbers and checkboxes removed is design content and belongs in the spec. diff --git a/architecture/instruments.md b/architecture/instruments.md index 429216c..fd0d6cb 100644 --- a/architecture/instruments.md +++ b/architecture/instruments.md @@ -65,11 +65,19 @@ leaves no stale references. ## Cross-instrument integrations -**Logging ↔ Sentry.** `logging_instrument.py` injects structlog context into -Sentry events. `sentry_instrument.py` chains the user's `before_send` after the -built-in structlog enricher via `wrap_before_send_callbacks()`. A `skip_sentry` -flag in the log context suppresses the event; the flag is also stripped from the -Sentry payload (it is in `IGNORED_STRUCTLOG_ATTRIBUTES`). +**Logging ↔ Sentry.** `logging_instrument.py` renders every structlog line to a +flat JSON object via the shared serializer in `logging_factory.py`. The seam +between the two instruments is `StructuredLogPayload` (also in +`logging_factory.py`): its `parse` classmethod reconstructs that line into +`message` / `extra` / `skip_sentry`, owning the meta-key vocabulary +(`STRUCTLOG_META_KEYS`) and stripping it from `extra` — so neither the parsing +detail nor the key set lives in `sentry_instrument.py`. The Sentry side's +`enrich_sentry_event_from_structlog_log` (chained after the user's `before_send` +via `wrap_before_send_callbacks()`) only maps the parsed payload onto the Sentry +event: a truthy `skip_sentry` suppresses the event (checked before the +message-presence test), otherwise it lifts `message` and attaches `extra` under +`contexts.structlog`. `IGNORED_STRUCTLOG_ATTRIBUTES` remains in +`sentry_instrument.py` as a back-compat alias of `STRUCTLOG_META_KEYS`. **OTel ↔ Logging.** The logging instrument injects span/trace IDs from the active OpenTelemetry context into every log record, so logs and traces correlate. diff --git a/lite_bootstrap/instruments/logging_factory.py b/lite_bootstrap/instruments/logging_factory.py index 6e1d7fb..df93af0 100644 --- a/lite_bootstrap/instruments/logging_factory.py +++ b/lite_bootstrap/instruments/logging_factory.py @@ -35,6 +35,51 @@ def _serialize_log_with_orjson_to_string(value: typing.Any, **kwargs: typing.Any return orjson.dumps(value, **kwargs).decode() +# Meta-keys the producer's structlog processor chain emits at the top level of every +# rendered line (see lite_bootstrap/instruments/logging_instrument.py). They are not +# user-supplied extra and are stripped from StructuredLogPayload.extra. If you add a +# custom top-level meta-processor to that chain, add its key here. +STRUCTLOG_META_KEYS: typing.Final = frozenset( + {"event", "level", "logger", "tracing", "timestamp", "exception", "skip_sentry"} +) + + +@dataclasses.dataclass(frozen=True, slots=True) +class StructuredLogPayload: + """The semantic content of one rendered structlog line. + + A consumer-side interpretation type: ``parse`` turns the raw JSON string a + structlog processor chain emits into its parts. It holds no knowledge of any + downstream event shape (e.g. Sentry's); mapping onto a consumer's event is the + consumer's job. + """ + + message: str | None + extra: dict[str, typing.Any] + skip_sentry: bool + + @classmethod + def parse(cls, formatted: str) -> "StructuredLogPayload | None": + """Interpret one rendered structlog line. + + Returns ``None`` when ``formatted`` is not a structlog JSON object (not a + JSON object string, a decode error, or a non-dict result). + """ + if not formatted.startswith("{"): + return None + try: + loaded = orjson.loads(formatted) + except orjson.JSONDecodeError: + return None + if not isinstance(loaded, dict): # pragma: no cover - JSON starting with "{" is always an object + return None + + skip_sentry = bool(loaded.get("skip_sentry")) + message = loaded.get("event") + extra = {key: value for key, value in loaded.items() if key not in STRUCTLOG_META_KEYS} + return cls(message=message, extra=extra, skip_sentry=skip_sentry) + + if import_checker.is_structlog_installed: import structlog diff --git a/lite_bootstrap/instruments/logging_instrument.py b/lite_bootstrap/instruments/logging_instrument.py index bc7eb4d..799661f 100644 --- a/lite_bootstrap/instruments/logging_instrument.py +++ b/lite_bootstrap/instruments/logging_instrument.py @@ -30,6 +30,10 @@ if import_checker.is_opentelemetry_installed: from opentelemetry import trace + # `tracer_injection` adds the top-level `tracing` meta-key. Any new top-level + # meta-processor added to the chain must register its key in + # `STRUCTLOG_META_KEYS` (logging_factory.py) or it will leak into + # StructuredLogPayload.extra and downstream Sentry context. def tracer_injection(_: "WrappedLogger", __: str, event_dict: "EventDict") -> "EventDict": current_span = trace.get_current_span() if not current_span.is_recording(): diff --git a/lite_bootstrap/instruments/sentry_instrument.py b/lite_bootstrap/instruments/sentry_instrument.py index 2c82555..4aa2d24 100644 --- a/lite_bootstrap/instruments/sentry_instrument.py +++ b/lite_bootstrap/instruments/sentry_instrument.py @@ -1,10 +1,9 @@ import dataclasses import typing -import orjson - from lite_bootstrap import import_checker from lite_bootstrap.instruments.base import BaseConfig, BaseInstrument +from lite_bootstrap.instruments.logging_factory import STRUCTLOG_META_KEYS, StructuredLogPayload if typing.TYPE_CHECKING: @@ -16,9 +15,9 @@ import sentry_sdk -IGNORED_STRUCTLOG_ATTRIBUTES: typing.Final = frozenset( - {"event", "level", "logger", "tracing", "timestamp", "exception", "skip_sentry"} -) +# Back-compat alias: this vocabulary moved to logging_factory and was renamed +# STRUCTLOG_META_KEYS. Preserved here for external importers of the old name. +IGNORED_STRUCTLOG_ATTRIBUTES: typing.Final = STRUCTLOG_META_KEYS @dataclasses.dataclass(kw_only=True, frozen=True) @@ -39,35 +38,25 @@ class SentryConfig(BaseConfig): def enrich_sentry_event_from_structlog_log( event: "sentry_types.Event", _: "sentry_types.Hint" ) -> typing.Optional["sentry_types.Event"]: - if ( + if not ( (logentry := event.get("logentry")) and (formatted_message := logentry.get("formatted")) - and (isinstance(formatted_message, str)) - and formatted_message.startswith("{") - and (isinstance(event.get("contexts"), dict)) + and isinstance(formatted_message, str) + and isinstance(event.get("contexts"), dict) ): - try: - loaded_formatted_log = orjson.loads(formatted_message) - except orjson.JSONDecodeError: - return event - - if not isinstance(loaded_formatted_log, dict): # pragma: no cover - return event - - if loaded_formatted_log.get("skip_sentry"): - return None - - if event_name := loaded_formatted_log.get("event"): - event["logentry"]["formatted"] = event_name # ty: ignore[invalid-assignment] - else: - return event + return event - additional_extra = loaded_formatted_log - for one_attr in IGNORED_STRUCTLOG_ATTRIBUTES: - additional_extra.pop(one_attr, None) - if additional_extra: - event["contexts"]["structlog"] = additional_extra + payload = StructuredLogPayload.parse(formatted_message) + if payload is None: + return event + if payload.skip_sentry: + return None + if not payload.message: + return event + event["logentry"]["formatted"] = payload.message # ty: ignore[invalid-assignment] + if payload.extra: + event["contexts"]["structlog"] = payload.extra return event diff --git a/planning/README.md b/planning/README.md index fa80468..075ab36 100644 --- a/planning/README.md +++ b/planning/README.md @@ -53,6 +53,9 @@ into `design.md` + `plan.md`. - **`design.md`** — the spec: the *thinking* (why, design, trade-offs, scope). - **`plan.md`** — the plan: the *sequencing* (the executor's task checklist). - **`change.md`** — both, condensed, for the lightweight lane. +- **`decisions/-.md`** — one file per design decision taken + (especially options *rejected*), each with a revisit trigger, so reviews don't + re-litigate them; listed by `just index`. - **`releases/.md`** — per-release user-facing notes. - **`audits/-.md`** — findings from a code/docs/bug-hunt sweep; spawns fix changes. @@ -65,21 +68,26 @@ Templates live in [`_templates/`](_templates/). `design.md` / `change.md`: `status` (draft|approved|shipped|superseded), `date`, `slug`, `summary` (single line), `supersedes`, `superseded_by`, `pr`, -`outcome`. `plan.md`: `status`, `date`, `slug`, `spec`, `pr`. Files in +`outcome`. `plan.md`: `status`, `date`, `slug`, `spec`, `pr`. +`decisions/*.md`: `status` (accepted|superseded), `date`, `slug`, `summary`, +`supersedes`, `superseded_by`, `pr`. Files in `architecture/` carry **no** frontmatter — living prose, dated by git. ## Index -The change listing is **generated**, not maintained — run `just index` to -print it (grouped by `status`: In progress / Shipped / Superseded). The -frontmatter in each bundle is the single source of truth; there is no -committed copy to drift. +The listing is **generated**, not maintained — run `just index` to print it: +changes grouped by `status` (In progress / Shipped / Superseded), then +decisions newest-first. The frontmatter in each bundle / decision file is the +single source of truth; there is no committed copy to drift. ## Other - **[`architecture/`](../architecture/)** at the repo root — the living capability truth (config model, instruments, bootstrappers). The promotion target on every ship. +- **[decisions/](decisions/)** — design decisions taken (and alternatives + rejected), each with a revisit trigger, so reviews don't re-litigate them; + indexed by `just index`. - **[audits/](audits/)** — findings reports (2026-05-31 bug+refactor audit, 2026-06-05 bug audit v2). - **[retros/](retros/)** — what we learned after a body of work. diff --git a/planning/_templates/decision.md b/planning/_templates/decision.md new file mode 100644 index 0000000..940fb37 --- /dev/null +++ b/planning/_templates/decision.md @@ -0,0 +1,26 @@ +--- +status: accepted # accepted | superseded +date: YYYY-MM-DD +slug: my-decision +summary: One line — shown in `just index`. +supersedes: null +superseded_by: null +pr: null # PR/commit where the decision was made or recorded +--- + +# One-line capitalized title + +**Decision:** What was decided, in a sentence. + +## Context + +Why this came up; the options that were on the table. + +## Decision & rationale + +The call and why — including why the alternatives were rejected. Enough that a +future explorer doesn't re-litigate it. + +## Revisit trigger + +The concrete signal that should reopen this decision. diff --git a/planning/changes/2026-06-23.01-structured-log-payload/design.md b/planning/changes/2026-06-23.01-structured-log-payload/design.md new file mode 100644 index 0000000..7cc2a0c --- /dev/null +++ b/planning/changes/2026-06-23.01-structured-log-payload/design.md @@ -0,0 +1,201 @@ +--- +status: shipped +date: 2026-06-23 +slug: structured-log-payload +summary: Move the structlog→Sentry log-line contract into a StructuredLogPayload value object so the meta-key vocabulary and parse live in one place, closing the silent-drift failure mode. +supersedes: null +superseded_by: null +pr: 129 +outcome: Shipped as designed — StructuredLogPayload owns parse + STRUCTLOG_META_KEYS in logging_factory; Sentry reduced to event-shape guards + orchestration; back-compat alias retained; round-trip drift net added; 100% coverage held. +--- + +# Design: Give the structlog→Sentry payload a deep module + +## Summary + +The structlog→Sentry enrichment relies on a contract that no module owns: the +logging instrument renders every log line to a flat JSON object, and the Sentry +instrument opportunistically sniffs that shape (`formatted.startswith("{")`), +re-parses it, and strips a hand-maintained set of meta-keys before attaching the +rest to the Sentry event. This change introduces a `StructuredLogPayload` value +object in `logging_factory.py` that owns the parse + the meta-key vocabulary, so +the fragile knowledge lives in one place and the Sentry instrument shrinks to +orchestration. The goal is to close a silent-drift failure mode, not to abstract +for its own sake. + +## Motivation + +The log-event contract is split across two files and joined only by a string +guess: + +- **Producer** (`logging_instrument.py:116`): the structlog processor chain ends + in `JSONRenderer(serializer=_serialize_log_with_orjson_to_string)`. Every line + is a flat JSON object — message under `event`, meta-keys `level` / `logger` / + `tracing` / `timestamp` / `exception`, user kwargs, and an optional + `skip_sentry`. +- **Consumer** (`sentry_instrument.py:39-71`): `enrich_sentry_event_from_structlog_log` + sniffs `formatted.startswith("{")`, `orjson.loads` it, drops on `skip_sentry`, + lifts `event` to the message, and strips `IGNORED_STRUCTLOG_ATTRIBUTES` + (`sentry_instrument.py:19`) — the same meta-key set the producer emits — before + attaching the remainder as `contexts.structlog`. + +The coupling is one-directional and opportunistic: the producer does not know +Sentry exists; Sentry guesses at structlog's output shape. The meta-key +vocabulary is duplicated/implied across both files and owned by neither. Rename a +meta-key or add a top-level meta-processor without updating +`IGNORED_STRUCTLOG_ATTRIBUTES` and the enrichment silently degrades — either the +key leaks into `contexts.structlog`, or (for a renamed message key) the event +passes through unmodified. There is no test that crosses the producer's +serializer and the consumer's parser together, so the drift is invisible until a +production Sentry event looks wrong. + +Applying the deletion test to a fix: a module that owns parse + vocabulary, if +deleted, would scatter the meta-stripping logic and the key set back into Sentry +and concentrate complexity here — it earns its keep. (Surfaced as candidate 1 of +the 2026-06-23 architecture review.) + +## Non-goals + +- **A neutral third module both instruments import.** There is exactly one + consumer (Sentry); a shared-for-sharing's-sake module abstracts a sharing that + does not exist yet and pulls the vocabulary away from the chain that generates + it. One adapter is a hypothetical seam. +- **A symmetric `serialize()` on the value object.** The producer does not + construct a `StructuredLogPayload`; it hands structlog's full `event_dict` to + the existing serializer. A `serialize()` nobody calls would be dead surface. +- **An explicit sentinel/marker key stamped on every log line.** Replacing the + `startswith("{")` heuristic with a marker pollutes the stdout JSON shape for + every logging user to serve one consumer. The heuristic is cheap and adequate. +- **An absolute, compile-time-guaranteed drift fix.** That would require changing + the emitted log shape (nesting user kwargs under a single `extra` key) for all + users. We deliberately trade "impossible" for "unlikely + caught by a + round-trip test" to avoid changing the log shape. +- **Renaming `skip_sentry`.** It stays honest-to-today; rename the day a second + reporter honors it. + +## Design + +### 1. `StructuredLogPayload` value object in `logging_factory.py` + +A consumer-side interpretation type living beside the existing serializer: + +```python +@dataclasses.dataclass(frozen=True, slots=True) +class StructuredLogPayload: + message: str | None # the structlog `event` key; None when absent + extra: dict[str, typing.Any] # user fields, meta-keys already stripped + skip_sentry: bool + + @classmethod + def parse(cls, formatted: str) -> "StructuredLogPayload | None": + """Interpret one rendered structlog line. Returns None when the string + is not a structlog JSON object (non-JSON, decode error, or not a dict).""" +``` + +`parse` owns every "is this even a structlog line" guard (the `startswith("{")` +heuristic, `orjson.JSONDecodeError`, non-dict result) and the meta-stripping. No +caller ever sees the raw dict, so no caller can re-implement or forget the strip. +The value object holds **no** Sentry-event-shape knowledge — mapping onto +`event["logentry"]["formatted"]` / `contexts.structlog` stays in the Sentry +instrument. + +### 2. Meta-key vocabulary moves and is renamed + +`IGNORED_STRUCTLOG_ATTRIBUTES` (`sentry_instrument.py:19`) moves to +`logging_factory.py` as the public symbol `STRUCTLOG_META_KEYS` — named for what +it *is* (the producer's meta-key vocabulary) rather than what Sentry does with +it. It keeps the `STRUCTLOG_` prefix because the keys are structlog / our-chain +conventions. `parse` uses it internally; the round-trip test references it +directly. + +The drift gap is handled pragmatically, not absolutely (see Non-goals): + +- The set lives in `logging_factory`, which `logging_instrument` already imports; + `logging_factory` cannot import `logging_instrument` back (circular), so the + set cannot be *adjacent* to the chain. +- A cross-reference comment at `tracer_injection` / the processor chain in + `logging_instrument.py` directs anyone adding a top-level meta-processor to + extend `STRUCTLOG_META_KEYS`. +- The round-trip test (§4) is the regression net. + +Only `tracing` (from our own `tracer_injection`) is a custom meta-key; the rest +(`level`, `logger`, `timestamp`, `exception`) are stable structlog-stdlib +processor outputs and `event` / `skip_sentry` are conventions. The real drift +surface is narrow: someone adds another custom top-level meta-processor and +forgets the set. + +### 3. Sentry instrument shrinks to orchestration + +`enrich_sentry_event_from_structlog_log` keeps its name and signature (it is the +`before_send` callback) and becomes: + +```python +payload = StructuredLogPayload.parse(formatted_message) +if payload is None: return event # not a structlog JSON line +if payload.skip_sentry: return None # drop — checked BEFORE message +if not payload.message: return event # JSON without an `event` key +event["logentry"]["formatted"] = payload.message +if payload.extra: + event["contexts"]["structlog"] = payload.extra +return event +``` + +The branch ordering is preserved exactly from the current implementation: +`skip_sentry` is honored before the message-presence check, so a line with +`skip_sentry` truthy and no `event` key still drops. `wrap_before_send_callbacks` +is untouched (it chains callbacks; orthogonal to payload parsing). + +### 4. Seam direction and backward compatibility + +The dependency is `sentry → logging_factory` — honest: Sentry consumes logging's +output format, and `logging_factory` is a lower-level mechanics module (it +imports neither instrument), so this is not peer-coupling between instruments. + +`IGNORED_STRUCTLOG_ATTRIBUTES` is a public-named symbol on a shipped (1.1.x) +package, though it is not in `lite_bootstrap.__all__`. A silent module-level +alias `IGNORED_STRUCTLOG_ATTRIBUTES = STRUCTLOG_META_KEYS` stays in +`sentry_instrument.py` per the repo's rename convention, so any external import +keeps working. + +## Operations + +None. No infra, DNS, or external-account changes. + +## Out of scope + +Covered under Non-goals. + +## Testing + +Test-first (red before green). Three layers: + +- **`StructuredLogPayload.parse` table (new).** Raw JSON string in → `message` / + `extra` / `skip_sentry` out, no Sentry event constructed: non-JSON → `None`; + JSON non-dict → `None`; dict without `event` → `message is None`; normal line → + meta-keys stripped from `extra`; `skip_sentry` truthy → flag set; **DES-4**: + `skip_sentry=False` stripped from `extra` (relocated from the Sentry layer with + a comment naming its DES-4 audit origin). +- **Round-trip (new).** A representative `event_dict` → real + `_serialize_log_with_orjson_to_string` → `parse` → assert `message` / `extra` / + `skip_sentry`. Exercises producer serializer + consumer parse + vocabulary + together; this is the drift net. +- **Sentry orchestration (slimmed).** `enrich_sentry_event_from_structlog_log` + over Sentry-shaped events, three outcomes only — drop (`skip_sentry`), + modify+attach (normal), passthrough (non-structlog / `event`-less). Shape + detail now lives at the value-object layer. + +`just test` green, `just lint-ci` clean (`ty` included). + +## Risk + +- **Behavior change masquerading as a refactor (med likelihood × high impact).** + The three branch outcomes and the skip-before-message ordering must be + byte-identical to today. Mitigation: the slimmed Sentry orchestration tests pin + all three outcomes; the DES-4 case is preserved (relocated, not deleted); the + ordering is an explicit test case. +- **Residual drift (low × med).** A future custom meta-processor whose key is not + added to `STRUCTLOG_META_KEYS` still leaks. Mitigation: cross-reference comment + at the chain + the round-trip test (catches it only if the fixture includes the + new key — documented as a known limit, the deliberate trade from Non-goals). +- **Back-compat miss (low × med).** An external importer of + `IGNORED_STRUCTLOG_ATTRIBUTES` breaks. Mitigation: silent alias retained. diff --git a/planning/changes/2026-06-23.01-structured-log-payload/plan.md b/planning/changes/2026-06-23.01-structured-log-payload/plan.md new file mode 100644 index 0000000..05a747b --- /dev/null +++ b/planning/changes/2026-06-23.01-structured-log-payload/plan.md @@ -0,0 +1,186 @@ +--- +status: shipped +date: 2026-06-23 +slug: structured-log-payload +spec: structured-log-payload +pr: 129 +--- + +# structured-log-payload — implementation plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use +> superpowers:subagent-driven-development (recommended) or +> superpowers:executing-plans to implement this plan task-by-task. Steps +> use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Move the structlog log-line contract into a `StructuredLogPayload` +value object in `logging_factory.py`, shrinking the Sentry instrument to +orchestration and closing the meta-key drift failure mode. + +**Spec:** [`design.md`](./design.md) + +**Branch:** `feat/structured-log-payload` + +**Commit strategy:** Per-task commits (each task leaves the suite green). + +TDD throughout: write the failing test first, watch it fail for the right +reason, then implement to green. + +--- + +### Task 1: `StructuredLogPayload` + `STRUCTLOG_META_KEYS` in `logging_factory.py` (red → green) + +**Files:** +- Modify: `lite_bootstrap/instruments/logging_factory.py` +- Modify: `tests/instruments/test_logging_instrument.py` (or new + `tests/instruments/test_structured_log_payload.py` — match where factory tests + already live) + +Introduce the value object and the meta-key vocabulary it owns; prove `parse` +against a table before implementing it. + +- [ ] **Step 1: Write the `parse` table, red.** + + Cases (raw JSON string in → assert on `message` / `extra` / `skip_sentry`): + - non-JSON string (`"plain message"`) → `None` + - JSON non-dict (`"[1, 2]"`) → `None` + - malformed JSON starting with `{` → `None` + - dict without `event` → `payload.message is None` + - normal line (`event` + meta-keys + user kwargs) → `extra` has user kwargs + only, all `STRUCTLOG_META_KEYS` stripped + - `skip_sentry: true` → `payload.skip_sentry is True` + - **DES-4**: `skip_sentry: false` present → stripped from `extra` + (add comment: `# DES-4 (planning/audits/2026-06-05-bug-audit-v2.md): falsy + skip_sentry must not leak into extra`) + + Run `just test -- -k structured_log` → fails (symbol does not exist yet). + +- [ ] **Step 2: Implement `STRUCTLOG_META_KEYS` + `StructuredLogPayload`.** + + Move the frozenset out of `sentry_instrument.py`, rename to + `STRUCTLOG_META_KEYS`, place beside `_serialize_log_with_orjson_to_string`. + Implement `parse` owning all guards (`startswith("{")`, `JSONDecodeError`, + non-dict) and the meta-strip. `message`, `extra`, `skip_sentry` fields; no + `serialize()`; no Sentry-event knowledge. + + Run `just test -- -k structured_log` → green. + +- [ ] **Step 3: Add the round-trip test (drift net), red → green.** + + Build a representative `event_dict` (the keys the producer chain emits) → + `_serialize_log_with_orjson_to_string(event_dict)` → `StructuredLogPayload.parse` + → assert `message` / `extra` (meta stripped) / `skip_sentry`. Green once Step 2 + lands. + +- [ ] **Step 4: Cross-reference comment at the chain.** + + In `logging_instrument.py`, at `tracer_injection` / the processor chain, add a + one-line comment: any new top-level meta-key must be added to + `STRUCTLOG_META_KEYS` in `logging_factory.py`. + +- [ ] **Step 5: Commit.** + + ```bash + git add lite_bootstrap/instruments/logging_factory.py \ + lite_bootstrap/instruments/logging_instrument.py \ + tests/instruments/ + git commit -m "feat: add StructuredLogPayload value object owning log-line parse + + Co-Authored-By: Claude Opus 4.8 (1M context) " + ``` + +--- + +### Task 2: Shrink the Sentry instrument to orchestration + +**Files:** +- Modify: `lite_bootstrap/instruments/sentry_instrument.py` +- Modify: `tests/instruments/test_sentry_instrument.py` + +Rewrite `enrich_sentry_event_from_structlog_log` to call `parse`, leave the +back-compat alias, slim the tests to the three mapping outcomes. + +- [ ] **Step 1: Slim the Sentry tests, keeping behavior pins.** + + Reduce the existing parametrized cases to three orchestration outcomes: + drop (`skip_sentry` truthy), modify+attach (normal), passthrough + (non-structlog / `event`-less). Add an explicit case: `skip_sentry` truthy + + no `event` key still returns `None` (ordering pin). Remove the shape-detail + assertions now covered at the value-object layer (DES-4 relocated in Task 1). + + Run `just test -- -k sentry` → fails against the current implementation only + where the slimmed expectations differ; confirm the failures are the expected + shape before proceeding. + +- [ ] **Step 2: Rewrite `enrich_sentry_event_from_structlog_log`.** + + Replace the inline sniff/parse/strip with the orchestration from the spec + (`parse → None passthrough → skip_sentry drop → no-message passthrough → + replace formatted + attach extra`). Import `StructuredLogPayload` from + `logging_factory`. Leave `wrap_before_send_callbacks` untouched. + +- [ ] **Step 3: Add the back-compat alias.** + + At module level in `sentry_instrument.py`: + `IGNORED_STRUCTLOG_ATTRIBUTES = STRUCTLOG_META_KEYS # back-compat alias` + (import `STRUCTLOG_META_KEYS` from `logging_factory`). Keep a one-line comment + naming the rename. + + Run `just test` (full suite) → green. + +- [ ] **Step 4: Commit.** + + ```bash + git add lite_bootstrap/instruments/sentry_instrument.py \ + tests/instruments/test_sentry_instrument.py + git commit -m "refactor: route Sentry enrichment through StructuredLogPayload + + Co-Authored-By: Claude Opus 4.8 (1M context) " + ``` + +--- + +### Task 3: Ship-time docs + index + +**Files:** +- Modify: `architecture/instruments.md` +- Modify: `CLAUDE.md` +- Modify: `planning/changes/2026-06-23.01-structured-log-payload/design.md` + (frontmatter) + +Promote the conclusions into the living architecture doc and close out the +bundle. + +- [ ] **Step 1: Update `architecture/instruments.md:70-72`.** + + Name the `StructuredLogPayload` seam: the logging instrument emits the rendered + structlog line; `StructuredLogPayload.parse` (in `logging_factory`) owns the + meta-key vocabulary (`STRUCTLOG_META_KEYS`) and the skip/extra interpretation; + the Sentry instrument maps the parsed payload onto the Sentry event. + +- [ ] **Step 2: Update `CLAUDE.md:49`.** + + Adjust the Logging↔Sentry bullet to reference `StructuredLogPayload` / + `STRUCTLOG_META_KEYS` and their `logging_factory` home (replacing the + `IGNORED_STRUCTLOG_ATTRIBUTES`-in-`sentry_instrument` description). + +- [ ] **Step 3: Fill design frontmatter + regenerate index.** + + Set `status: shipped`, fill `summary`, `pr`, `outcome` in `design.md`. Run + `just index`. + +- [ ] **Step 4: Final verification.** + + `just lint-ci` clean (incl. `ty`), `just test` green. Confirm the alias keeps + `from lite_bootstrap.instruments.sentry_instrument import IGNORED_STRUCTLOG_ATTRIBUTES` + importable. + +- [ ] **Step 5: Commit.** + + ```bash + git add architecture/instruments.md CLAUDE.md \ + planning/changes/2026-06-23.01-structured-log-payload/ planning/ + git commit -m "docs: record StructuredLogPayload seam in architecture + CLAUDE + + Co-Authored-By: Claude Opus 4.8 (1M context) " + ``` diff --git a/planning/decisions/.gitkeep b/planning/decisions/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/planning/index.py b/planning/index.py index 6621d3f..392860a 100644 --- a/planning/index.py +++ b/planning/index.py @@ -1,11 +1,12 @@ # ruff: noqa: INP001, D212 # planning/ is not a Python package; D212/D213 conflict differs from faststream-outbox """ -Generate the planning change index from bundle frontmatter. +Generate the planning index from frontmatter. -Run via ``just index``. Globs ``planning/changes/*/``, reads each bundle's -``design.md`` (falling back to ``change.md``) frontmatter, and prints a -Markdown listing grouped by lifecycle status to stdout. Never writes a file: -the listing is a query over the bundles, not a committed artifact. +Run via ``just index``. Globs ``planning/changes/*/`` (each bundle's ``design.md``, +falling back to ``change.md``) and ``planning/decisions/*.md``, reads their +frontmatter, and prints a Markdown listing to stdout — changes grouped by lifecycle +status, then decisions newest-first. Never writes a file: the listing is a query over +the files, not a committed artifact. """ import pathlib @@ -13,6 +14,7 @@ CHANGES_DIR = pathlib.Path(__file__).parent / "changes" +DECISIONS_DIR = pathlib.Path(__file__).parent / "decisions" GROUPS: tuple[tuple[str, tuple[str, ...]], ...] = ( ("In progress", ("draft", "approved")), ("Shipped", ("shipped",)), @@ -55,8 +57,23 @@ def load_bundles() -> list[dict[str, str]]: return bundles +def load_decisions() -> list[dict[str, str]]: + """Read frontmatter from every decision file under ``DECISIONS_DIR``.""" + decisions: list[dict[str, str]] = [] + if not DECISIONS_DIR.is_dir(): + return decisions + for path in sorted(DECISIONS_DIR.glob("*.md")): + if path.name == "README.md" or path.name.startswith("_"): + continue + fields = parse_frontmatter(path.read_text(encoding="utf-8")) + fields["path"] = f"decisions/{path.name}" + fields["name"] = path.stem + decisions.append(fields) + return decisions + + def format_row(bundle: dict[str, str]) -> str: - """Render one bundle as a Markdown list item.""" + """Render one bundle or decision as a Markdown list item.""" slug = bundle.get("slug", "?") path = bundle.get("path", "") pr = bundle.get("pr") or "—" @@ -70,11 +87,11 @@ def format_row(bundle: dict[str, str]) -> str: return line -def render(bundles: list[dict[str, str]]) -> str: - """Render the full grouped Markdown listing.""" - out = ["# Change index", "", "_Generated by `just index` — do not edit._", ""] +def render(bundles: list[dict[str, str]], decisions: list[dict[str, str]]) -> str: + """Render the full Markdown listing: changes by status, then decisions.""" + out = ["# Planning index", "", "_Generated by `just index` — do not edit._", "", "## Changes", ""] for title, statuses in GROUPS: - out += [f"## {title}", ""] + out += [f"### {title}", ""] rows = sorted( (b for b in bundles if b.get("status") in statuses), key=lambda b: b.get("name", ""), @@ -82,12 +99,16 @@ def render(bundles: list[dict[str, str]]) -> str: ) out += [format_row(b) for b in rows] if rows else ["_None._"] out.append("") + out += ["## Decisions", ""] + decision_rows = sorted(decisions, key=lambda d: d.get("name", ""), reverse=True) + out += [format_row(d) for d in decision_rows] if decision_rows else ["_None._"] + out.append("") return "\n".join(out).rstrip() + "\n" def main() -> int: """Print the listing to stdout.""" - sys.stdout.write(render(load_bundles())) + sys.stdout.write(render(load_bundles(), load_decisions())) return 0 diff --git a/tests/instruments/test_sentry_instrument.py b/tests/instruments/test_sentry_instrument.py index 8c25646..4949bcf 100644 --- a/tests/instruments/test_sentry_instrument.py +++ b/tests/instruments/test_sentry_instrument.py @@ -84,24 +84,30 @@ def test_sentry_instrument_empty_dsn() -> None: class TestSentryEnrichEventFromStructlog: + """enrich_* owns the Sentry-event-shape guards and the orchestration branches. + + Line-content parsing (meta-stripping, skip detection, JSON guards) is the + StructuredLogPayload.parse contract and is covered in test_structured_log_payload.py. + """ + @pytest.mark.parametrize( "event", [ - {}, - {"logentry": None}, - {"logentry": {}}, - {"logentry": {"formatted": b""}}, - {"logentry": {"formatted": ""}}, - {"logentry": {"formatted": "hi"}}, - {"logentry": {"formatted": "[]"}}, - {"logentry": {"formatted": "[{}]"}}, - {"logentry": {"formatted": "{"}, "contexts": {}}, - {"logentry": {"formatted": "{}"}, "contexts": {}}, + {}, # no logentry + {"logentry": {"formatted": b""}}, # formatted is not a str + {"logentry": {"formatted": "hi"}, "contexts": {}}, # parse returns None (not a structlog line) + {"logentry": {"formatted": "{}"}, "contexts": {}}, # parsed, but no `event` key -> no message ], ) - def test_skip(self, event: "sentry_types.Event") -> None: + def test_passthrough_leaves_event_unmodified(self, event: "sentry_types.Event") -> None: assert enrich_sentry_event_from_structlog_log(copy.deepcopy(event), {}) == event + def test_drops_event_on_skip_sentry_before_message_check(self) -> None: + # Ordering pin: skip_sentry is honored before the message-presence check, so a + # line with skip_sentry truthy and no `event` key still drops (returns None). + event: sentry_types.Event = {"logentry": {"formatted": '{"skip_sentry": true}'}, "contexts": {}} + assert enrich_sentry_event_from_structlog_log(event, {}) is None + @pytest.mark.parametrize( ("event_before", "event_after"), [ @@ -121,19 +127,11 @@ def test_skip(self, event: "sentry_types.Event") -> None: "contexts": {"structlog": {"foo": "bar"}}, }, ), - ( - { - "logentry": {"formatted": '{"event": "event name", "skip_sentry": false, "foo": "bar"}'}, - "contexts": {}, - }, - { - "logentry": {"formatted": "event name"}, - "contexts": {"structlog": {"foo": "bar"}}, - }, - ), ], ) - def test_modify(self, event_before: "sentry_types.Event", event_after: "sentry_types.Event") -> None: + def test_modify_lifts_message_and_attaches_extra( + self, event_before: "sentry_types.Event", event_after: "sentry_types.Event" + ) -> None: assert enrich_sentry_event_from_structlog_log(event_before, {}) == event_after diff --git a/tests/instruments/test_structured_log_payload.py b/tests/instruments/test_structured_log_payload.py new file mode 100644 index 0000000..68435b8 --- /dev/null +++ b/tests/instruments/test_structured_log_payload.py @@ -0,0 +1,79 @@ +from lite_bootstrap.instruments.logging_factory import ( + STRUCTLOG_META_KEYS, + StructuredLogPayload, + _serialize_log_with_orjson_to_string, +) + + +def test_parse_normal_line_lifts_message_and_strips_meta() -> None: + formatted = ( + '{"event": "hello", "level": "info", "logger": "app", "timestamp": "2026-06-23T00:00:00Z", "foo": "bar"}' + ) + + payload = StructuredLogPayload.parse(formatted) + + assert payload is not None + assert payload.message == "hello" + assert payload.extra == {"foo": "bar"} + assert payload.skip_sentry is False + + +def test_parse_non_json_line_returns_none() -> None: + assert StructuredLogPayload.parse("plain text message") is None + + +def test_parse_non_object_json_returns_none() -> None: + assert StructuredLogPayload.parse("[1, 2, 3]") is None + + +def test_parse_malformed_json_object_returns_none() -> None: + assert StructuredLogPayload.parse('{"event": ') is None + + +def test_parse_line_without_event_has_no_message() -> None: + payload = StructuredLogPayload.parse('{"level": "info", "foo": "bar"}') + + assert payload is not None + assert payload.message is None + assert payload.extra == {"foo": "bar"} + + +def test_parse_truthy_skip_sentry_sets_flag() -> None: + payload = StructuredLogPayload.parse('{"event": "drop me", "skip_sentry": true}') + + assert payload is not None + assert payload.skip_sentry is True + + +def test_parse_falsy_skip_sentry_is_stripped_from_extra() -> None: + # DES-4 (planning/audits/2026-06-05-bug-audit-v2.md): a falsy skip_sentry flag + # must not leak into extra; it is a meta-key, stripped regardless of value. + payload = StructuredLogPayload.parse('{"event": "keep", "skip_sentry": false, "foo": "bar"}') + + assert payload is not None + assert payload.skip_sentry is False + assert payload.extra == {"foo": "bar"} + + +def test_round_trip_through_real_serializer_strips_every_meta_key() -> None: + # Drift net: a representative event_dict the producer's structlog chain emits, + # serialized by the real serializer and parsed back. If a meta-key the chain + # emits is missing from STRUCTLOG_META_KEYS, it leaks into extra and this fails. + event_dict = { + "event": "request handled", + "level": "info", + "logger": "app.api", + "tracing": {"span_id": "abc", "trace_id": "def"}, + "timestamp": "2026-06-23T00:00:00Z", + "skip_sentry": False, + "user_id": 42, + "path": "/health", + } + + formatted = _serialize_log_with_orjson_to_string(event_dict) + payload = StructuredLogPayload.parse(formatted) + + assert payload is not None + assert payload.message == "request handled" + assert payload.extra == {"user_id": 42, "path": "/health"} + assert not (payload.extra.keys() & STRUCTLOG_META_KEYS)