Skip to content

fix: flush telemetry for short-lived runs#93

Merged
dialupdisaster merged 1 commit into
mainfrom
fix/flush-telemetry-on-run
Jul 2, 2026
Merged

fix: flush telemetry for short-lived runs#93
dialupdisaster merged 1 commit into
mainfrom
fix/flush-telemetry-on-run

Conversation

@dialupdisaster

@dialupdisaster dialupdisaster commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add a shared force flush helper for OTel providers
  • flush telemetry after completed assistant messages and terminal session events
  • force flush before provider shutdown

Fixes #92

Validation

  • bun run lint
  • bun run check:jsdoc-coverage
  • bun run typecheck
  • bun test

Summary by CodeRabbit

  • New Features

    • Telemetry is now flushed more proactively during runtime, including when sessions go idle, encounter errors, or after a completed assistant response.
    • Shutdown now performs an explicit telemetry flush before closing services, helping preserve the latest events.
  • Bug Fixes

    • Improved reliability of telemetry delivery by preventing duplicate flush attempts and handling flush failures more gracefully.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a forceFlushOtel helper that concurrently flushes OTEL meter, logger, and tracer providers, refactors setupOtel usage in the plugin into a shared providers object, and triggers telemetry flushes on session idle/error, completed assistant messages, and shutdown, with an accompanying test.

Changes

Force Flush Telemetry

Layer / File(s) Summary
forceFlushOtel helper
src/otel.ts
Adds exported async forceFlushOtel(providers) that concurrently calls forceFlush() on meter, logger, and tracer providers via Promise.allSettled, suppressing individual failures.
Providers refactor and shutdown flush
src/index.ts
Imports forceFlushOtel; refactors setupOtel result into a shared providers object; introduces a shuttingDown guard and flushTelemetry(reason) helper; updates shutdown() to flush before shutting down providers.
Flush on session/message events
src/index.ts
Adds awaited flushTelemetry calls after session.idle, session.error, and completed assistant message events.
Tests
tests/otel.test.ts
Imports forceFlushOtel and adds a test suite verifying it invokes forceFlush on metric, logger, and tracer providers.

Estimated code review effort: 2 (Simple) | ~15 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Runtime as OpenCode Runtime
    participant Plugin as Plugin (index.ts)
    participant Otel as otel.ts
    participant Providers as OTEL Providers

    Runtime->>Plugin: session.idle / session.error
    Plugin->>Plugin: flushTelemetry(reason)
    Plugin->>Otel: forceFlushOtel(providers)
    Otel->>Providers: forceFlush() (meter, logger, tracer)
    Providers-->>Otel: settled results

    Runtime->>Plugin: message (assistant, completed)
    Plugin->>Plugin: flushTelemetry("message.completed")
    Plugin->>Otel: forceFlushOtel(providers)
    Otel->>Providers: forceFlush() (meter, logger, tracer)

    Runtime->>Plugin: shutdown()
    Plugin->>Plugin: shuttingDown = true
    Plugin->>Otel: forceFlushOtel(providers)
    Plugin->>Providers: shutdown meter/logger/tracer
Loading

Related issues: #92 (opencode run not sending telemetry) — addressed by adding explicit flush calls on session idle/error and completed assistant message events so telemetry is sent without waiting for full provider shutdown.

Suggested labels: bug, telemetry

Suggested reviewers: DEVtheOPS

🐰 A rabbit taps the OTEL drum,
flush the logs before they're numb,
session idle, message done,
telemetry sent, the race is won!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: forcing telemetry flushes for short-lived runs.
Linked Issues check ✅ Passed The changes add force-flush support and trigger it during run completion and shutdown, addressing #92's short-lived telemetry loss.
Out of Scope Changes check ✅ Passed All changes are focused on telemetry flushing and related tests, with no unrelated scope added.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/flush-telemetry-on-run

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@dialupdisaster

Copy link
Copy Markdown
Contributor Author

/opencode give this a quick look

@opencode-agent

opencode-agent Bot commented Jul 2, 2026

Copy link
Copy Markdown

Clean change — addresses #92 with minimal surface area.

src/otel.ts

  • New forceFlushOtel helper correctly uses Promise.allSettled so one failing provider doesn't block the others.

src/index.ts

  • flushTelemetry guards against double-flushing with the shuttingDown flag, which is important since shutdown() also forces a flush.
  • Flush points are well-placed: session.idle, session.error, and completed assistant messages — the exact spots where data is most at risk in short-lived runs.

tests/otel.test.ts

  • Straightforward test covering all three provider flushes.

No issues from me. LGTM.

New%20session%20-%202026-07-02T23%3A02%3A52.153Z
opencode session  |  github run

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/otel.test.ts (1)

126-140: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Missing coverage for partial-failure resilience.

forceFlushOtel uses Promise.allSettled specifically so one provider's rejected forceFlush() doesn't stop the others or throw. This test only covers the happy path; add a case where one provider's forceFlush rejects to confirm the others still flush and the function resolves without throwing.

✅ Suggested additional test
   await forceFlushOtel(fakeProviders)

   expect(calls.sort()).toEqual(["logs", "metrics", "traces"])
 })
+
+ test("continues flushing other providers when one rejects", async () => {
+   const calls: string[] = []
+   const fakeProviders = {
+     meterProvider: { forceFlush: async () => { throw new Error("boom") } },
+     loggerProvider: { forceFlush: async () => { calls.push("logs") } },
+     tracerProvider: { forceFlush: async () => { calls.push("traces") } },
+   } as unknown as OtelProviders
+
+   await expect(forceFlushOtel(fakeProviders)).resolves.not.toThrow()
+   expect(calls.sort()).toEqual(["logs", "traces"])
+ })
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/otel.test.ts` around lines 126 - 140, The current forceFlushOtel test
only covers the all-success path; add a new case in otel.test.ts under
forceFlushOtel that makes one OtelProviders forceFlush reject while the others
resolve, then verify the remaining providers still run and that forceFlushOtel
completes without throwing. Use the existing forceFlushOtel helper and the
fakeProviders pattern to assert Promise.allSettled resilience rather than
fail-fast behavior.
src/index.ts (1)

338-340: 🚀 Performance & Scalability | 🔵 Trivial

Consider the export volume from flushing on every completed assistant message.

flushTelemetry("message.completed") forces a full export of all three signals on every assistant turn. In long-lived interactive (TUI) sessions this fires repeatedly, negating the batching provided by BatchSpanProcessor/BatchLogRecordProcessor/PeriodicExportingMetricReader and increasing OTLP request volume. It is also partly redundant with the session.idle flush that fires shortly after (Line 287). Since the PR goal is reliably capturing telemetry for short-lived opencode run, session.idle alone may already cover that path; consider whether the per-message flush is needed or should be debounced.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/index.ts` around lines 338 - 340, The per-assistant-turn flush in the
message handling flow causes excessive full telemetry exports and duplicates the
later session-idle flush. In the `src/index.ts` message completion path
(`info.role === "assistant" && info.time?.completed`), remove or debounce
`flushTelemetry("message.completed")` so telemetry is not exported on every
completed assistant message, and rely on the existing `session.idle` flush path
for the short-lived `opencode run` case unless a specific need for per-message
flushing remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/index.ts`:
- Around line 338-340: The per-assistant-turn flush in the message handling flow
causes excessive full telemetry exports and duplicates the later session-idle
flush. In the `src/index.ts` message completion path (`info.role === "assistant"
&& info.time?.completed`), remove or debounce
`flushTelemetry("message.completed")` so telemetry is not exported on every
completed assistant message, and rely on the existing `session.idle` flush path
for the short-lived `opencode run` case unless a specific need for per-message
flushing remains.

In `@tests/otel.test.ts`:
- Around line 126-140: The current forceFlushOtel test only covers the
all-success path; add a new case in otel.test.ts under forceFlushOtel that makes
one OtelProviders forceFlush reject while the others resolve, then verify the
remaining providers still run and that forceFlushOtel completes without
throwing. Use the existing forceFlushOtel helper and the fakeProviders pattern
to assert Promise.allSettled resilience rather than fail-fast behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 98fa8af1-ce25-4d24-910d-920a3ed628a4

📥 Commits

Reviewing files that changed from the base of the PR and between 09b039d and df37dd0.

📒 Files selected for processing (3)
  • src/index.ts
  • src/otel.ts
  • tests/otel.test.ts

@dialupdisaster dialupdisaster merged commit 0744ec3 into main Jul 2, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: opencode run doesn't send telemetry

1 participant