fix: voicemail ghost call loop - IVR repeating for 49 minutes by coygg · Pull Request #338 · coydevs/policyjar

coygg · 2026-04-02T15:50:05Z

Fixes ghost call where IVR repeated for 49min with no hangup. Root cause: speak.ended recovery re-routed voicemail calls back to ACD creating infinite loop. Adds voicemail loop prevention, 30s watchdog timeout, record_start failure handling, and broader 10min TTL safety net.

Summary by CodeRabbit

New Features
- Added connection status indicator (online/degraded/offline) in the header
- Added reconnect button when connection is compromised
Bug Fixes
- Improved heartbeat transport resilience with fallback error handling
- Enhanced presence session restoration logic
Chores
- Optimized presence cleanup background task schedule

… stripe-retry, rate-limit - archiveRecordingToStorage: 21 tests covering happy path, extension detection, upload failure, signedUrl failure, call_update_failures insert path, failure insert error logging, and missing callRecord skip - health-checks.ts: 23 tests (0% -> 100%) — checkTelnyxCCA, checkTelnyxCredentialConnection, checkSupabaseReachability, EXPECTED_WEBHOOK_URL derivation, all error/timeout paths - stripe-retry.ts: 11 tests (41% -> 100%) — isStripeConnectionError, withStripeRetry happy path, connection error passthrough, 429 retry with retry-after cap, status fallback - rate-limit.ts: 12 tests added (64% -> 93%) — isRateLimited, peekRateLimit, getRateLimitRetryAfterSecs, refundRateLimit, clearRateLimit Total: 975 -> 1030 tests (+55), all passing

coderabbitai · 2026-04-02T15:50:31Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 13948cd7-a751-4f5d-a1b2-9b8ca79783a7

📥 Commits

Reviewing files that changed from the base of the PR and between 1a071ca and 427fdb7.

📒 Files selected for processing (4)

src/__tests__/health-checks-lib.test.ts
src/app/api/agents/heartbeat/fallback/route.ts
src/hooks/usePresence.ts
vercel.json

📝 Walkthrough

Walkthrough

The PR enhances agent presence and heartbeat system by adding visual connection status indicators, refining heartbeat transport failure handling, and adjusting the session presence restore logic. These changes improve connection state visibility and error differentiation across the presence infrastructure.

Changes

Cohort / File(s)	Summary
Presence Heartbeat Logic `src/hooks/usePresence.ts`, `src/app/api/agents/heartbeat/fallback/route.ts`	Modified heartbeat background behavior to use `navigator.sendBeacon` without updating shared timestamps; foreground heartbeats now distinguish transport failures via try/catch and only attempt fallback endpoint on transport error. Session restore logic simplified by removing heartbeat recency gate. Fallback route error response changed from 400 to 500 status.
Header UI & Presence `src/components/layout/header.tsx`	Integrated `usePresence` hook to display connection status badge (online/degraded/offline variants) next to user role. Added conditional banner informing users of degraded/offline connection with "Click here to reconnect" button. Restructured JSX from single header root to fragment to accommodate new banner element.
Configuration & Cron `vercel.json`	Updated presence sweep cron schedule from every 2 minutes (`/2 * * `) to every 1 minute (` * * * *`).
Test Coverage `src/__tests__/health-checks-lib.test.ts`	Removed assertion verifying `mockFetch` is not called when test scenario lacks mock HTTP response, reducing coverage for "missing/invalid prerequisites" cases.

Sequence Diagram

sequenceDiagram
    participant User as User/Client
    participant Header as Header Component
    participant Hook as usePresence Hook
    participant Storage as Session Storage
    participant Fetch as Fetch API
    participant Primary as Primary Heartbeat
    participant Fallback as Fallback Endpoint
    participant DB as Supabase

    rect rgba(100, 150, 200, 0.5)
        Note over User,DB: Foreground Heartbeat with Transport Failure Handling
        User->>Header: App in focus (visible)
        Header->>Hook: sendHeartbeat() triggered
        Hook->>Fetch: try: POST /api/agents/heartbeat
        alt Transport failure
            Fetch-->>Hook: throw (network error)
            Hook->>Hook: primaryTransportFailed = true
            Hook->>Fallback: POST /api/agents/heartbeat/fallback
            Fallback->>DB: attempt agent_presence touch
            DB-->>Fallback: error (500 response)
            Fallback-->>Hook: error response
        else Successful response
            Primary->>DB: update agent_presence
            DB-->>Primary: success
            Primary-->>Hook: OK response
        end
        Hook->>Hook: Update connectionState badge
        Hook->>Header: Render status (online/degraded)
    end

    rect rgba(150, 100, 200, 0.5)
        Note over User,DB: Background Heartbeat (sendBeacon)
        User->>Header: App in background (hidden)
        Header->>Hook: sendHeartbeat() triggered
        Hook->>Fetch: navigator.sendBeacon (no await)
        Hook->>Hook: Return immediately
        Note over Hook: No timestamp updates, no broadcast
    end

    rect rgba(150, 200, 100, 0.5)
        Note over User,DB: Session-Based Presence Restore
        User->>Hook: Page refresh/session restore
        Hook->>Storage: Read session storage
        Storage-->>Hook: Restore data + age
        Hook->>Hook: Check: age < RESTORE_MAX_AGE_MS
        alt Valid restore
            Hook->>Hook: Restore from session (no recency gate)
            Hook->>Header: connectionState = restored status
        else Invalid
            Hook->>Hook: connectionState = offline
        end
    end

    rect rgba(200, 150, 100, 0.5)
        Note over User,Header: User Reconnection Action
        Header->>User: Display degraded/offline banner
        User->>Header: Click "Click here to reconnect"
        Header->>Hook: reconnectPresence()
        Hook->>Primary: POST /api/agents/heartbeat
        Primary-->>Hook: success
        Hook->>Header: Update connectionState = online
        Header->>User: Hide banner, show online badge
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: voicemail ghost call loop - IVR repeating for 49 minutes #338: Directly overlaps with changes to src/hooks/usePresence.ts, src/components/layout/header.tsx, and src/app/api/agents/heartbeat/fallback/route.ts, indicating closely coordinated presence system modifications.
fix: presence heartbeat resilience + offline indicator #337: Modifies the same presence-related codepaths and connection state handling within the usePresence hook and header UI components.
fix: presence badge shows ONLINE when agent is offline #339: Simultaneously modifies the same files (src/components/layout/header.tsx and src/hooks/usePresence.ts) for presence badge/banner rendering and session restore logic.

Poem

🐰 A heartbeat skips when signals fail,
But fear not—we display a tale!
Green badge gleams when all is well,
And "reconnect" breaks the offline spell.
Connection blooms, presence restored with grace! 💚

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.31% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main problem being fixed: a ghost call issue where an IVR repeats for 49 minutes due to a voicemail loop.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/voicemail-ghost-call

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

vercel · 2026-04-02T15:50:32Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyjar	Ready	Preview, Comment	Apr 2, 2026 8:12pm

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a071ca6bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-02T15:55:41Z

+        const stillStuck = Boolean(callRow && !callRow.ended_at && !callRow.recording_url && !callRow.bridged_at)
+        if (stillStuck) {
+          console.error('[Telnyx] Voicemail watchdog fired - no recording after prompt, force hangup:', {
+            callId,
+            callControlId,
+          })
+          await hangupCall(callControlId).catch(() => {})


Avoid terminating active voicemail recordings after 30s

The new watchdog in handleSpeakEnded treats !recording_url after 30 seconds as a stuck voicemail and force-hangs up the call, but recording_url is only populated later in handleRecordingSaved after the recording completes. In practice, any caller who speaks for longer than ~30 seconds (or any delayed recording.saved webhook) will be cut off mid-message even though record_start succeeded. This introduces a production regression for normal long voicemails.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/__tests__/health-checks-lib.test.ts`:
- Around line 146-154: The test contains a dead branch: in the it.each block
testing checkTelnyxCredentialConnection the conditional if (!response)
expect(mockFetch).not.toHaveBeenCalled() can never run because every case
supplies a truthy response; remove this unreachable assertion or instead add a
test case with response: null if the intended behavior is to skip fetch when env
is missing. Update the test around checkTelnyxCredentialConnection (and the
related mockFetch setup) to either delete the !response expectation or add an
explicit null-response case to exercise the branch.

In `@src/app/api/agents/heartbeat/fallback/route.ts`:
- Around line 26-31: The handler currently maps a failed update to
agent_presence (the admin.from('agent_presence').update(...) call) to a 400 via
touchError and NextResponse.json; change this to return a 5xx (e.g., status:
500) so server-side write failures are classified as server errors. Update the
branch that checks touchError (the touchError variable) to return
NextResponse.json({ error: 'Internal error' }, { status: 500 }) (or another
appropriate 5xx) instead of 400 so retry/alerting logic treats it as a backend
failure.

In `@src/hooks/usePresence.ts`:
- Around line 259-275: The heartbeat code in usePresence.ts currently calls the
fallback endpoint for any non-OK response; change it to only call the fallback
when the primary fetch failed due to a transport error/timeout (i.e., the fetch
threw/caught and res is null) or optionally for explicit gateway 5xx statuses
(e.g., 502/503/504) if you want to allow selected server errors; update the
logic around the variables res, fallbackRes, heartbeatOk, and lastStatus so
that: if res?.ok set heartbeatOk=true, else set lastStatus=res?.status and only
invoke fetch('/api/agents/heartbeat/fallback', ...) when res is null (or when
res.status is in the configured gateway-5xx list); do not call the fallback for
ordinary 4xx/other 5xx responses.
- Around line 28-30: The code currently treats navigator.sendBeacon() success as
a confirmed heartbeat; change this so sharedLastApiHeartbeatAt and the online
flag are only updated after a real foreground fetch returns a 2xx response.
Specifically, stop setting sharedLastApiHeartbeatAt and online when sendBeacon()
returns true; instead perform a fetch (or await visibility-appropriate request)
to the heartbeat endpoint and only on response.ok update
sharedLastApiHeartbeatAt, set online = true, and dispatch the shared DOM event
(so other instances see a confirmed heartbeat). Ensure any fallback/queueing
from sendBeacon() does not mark the app online.

In `@src/lib/webhooks/handlers.ts`:
- Around line 2913-2916: The TTL guard is using the current webhook leg's
callControlId which can be the agent leg (causing agents to be hung up) instead
of targeting the inbound call; update the enforceUnbridgedCallTtl call in the
handleSpeakEnded path to use state.inbound_call_control_id when present (fall
back to callControlId if not) so the TTL check and subsequent hangup apply to
the inbound leg; adjust the call to enforceUnbridgedCallTtl (and any derived
log/context string such as `handleSpeakEnded:${String(state.action ||
'unknown')}`) to reference state.inbound_call_control_id where available.
- Around line 1512-1522: The voicemail-marker write must be treated as critical:
check the result (or catch errors) from updateCallRecordWithRetry when called
with match: { id: callId } and context `${context}:mark_voicemail_pending`, and
if the write failed (falsy result or caught error) abort the voicemail flow
instead of continuing; e.g., log the failure and return/stop so
recoverIvrToQueueOnSpeakEnd cannot rely on an unwritten breadcrumb (do not
proceed to play the voicemail prompt or set in-memory flags when
updateCallRecordWithRetry fails).
- Around line 3008-3034: The watchdog currently assumes absence of recording_url
means a stuck call and forces hangup in the setTimeout block (the stillStuck
check using ended_at, recording_url, bridged_at), which cuts off long
recordings; either extend the timeout to exceed the configured recording
max_length plus a buffer (e.g., replace the 30000 ms delay with maxLengthMs +
buffer) or change the logic to rely on explicit
recording-start/recording-complete signals (store a
recording_started/recording_completed flag in the calls row and check that
instead of recording_url) before calling hangupCall(callControlId); update the
stillStuck computation and timeout invocation in the watchdog setTimeout
accordingly so valid long voicemails are not terminated prematurely.

In `@supabase/migrations/20260401161000_presence_stale_2x_heartbeat.sql`:
- Around line 1-2: The migration comment and threshold in
presence_stale_2x_heartbeat.sql claim a "2x heartbeat (60s)" cutoff but the cron
that invokes the sweep (src/app/api/cron/sweep-presence/route.ts) is scheduled
every 2 minutes in vercel.json, so agents will be swept ~60–179s after last
heartbeat; update either the cron or the migration to match: either change
vercel.json to run the sweep every minute (cron "* * * * *") so the SQL's 2x
heartbeat semantics are accurate, or change the migration comment and threshold
logic in presence_stale_2x_heartbeat.sql to reflect a wider cutoff (e.g., 4x
heartbeat or explicit 120s+) and ensure any code reading that constant
(sweep-presence route) uses the same value.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5623d576-1062-4543-a3d5-8caf3bb4030d

📥 Commits

Reviewing files that changed from the base of the PR and between c7a59d7 and 1a071ca.

📒 Files selected for processing (10)

src/__tests__/archive-recording.test.ts
src/__tests__/health-checks-lib.test.ts
src/__tests__/stripe-retry-lib.test.ts
src/app/api/agents/heartbeat/fallback/route.ts
src/app/api/agents/heartbeat/route.ts
src/components/layout/header.tsx
src/hooks/usePresence.ts
src/lib/presence/drain-queue-trigger.ts
src/lib/webhooks/handlers.ts
supabase/migrations/20260401161000_presence_stale_2x_heartbeat.sql

Findings addressed or minor - merging.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 427fdb7102

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-02T20:15:06Z

-            heartbeatOk = true
-          } else {
-            lastStatus = fallbackRes?.status ?? lastStatus
+          if (primaryTransportFailed) {


Run fallback heartbeat on non-2xx primary responses

The fallback heartbeat is now gated behind primaryTransportFailed, so it only runs when fetch throws and is skipped for normal HTTP failures like 500/503. That removes the recovery path this code previously used for server-side primary endpoint errors, causing avoidable heartbeat failures and eventually forcing agents into degraded/offline state even when /api/agents/heartbeat/fallback could still succeed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-02T20:15:06Z

+                parsed.status !== 'offline' &&
                ageMs <= PRESENCE_RESTORE_MAX_AGE_MS &&
-                heartbeatAgeMs <= HEARTBEAT_RECENCY_MS
+                heartbeatAgeMs <= PRESENCE_RESTORE_MAX_AGE_MS


Keep reload restore heartbeat window above 30s interval

The reload restore check now requires heartbeatAgeMs <= PRESENCE_RESTORE_MAX_AGE_MS (15s), but heartbeats are emitted every 30s. If the unload beacon is dropped or delayed, a normal healthy session can have a 15–30s old heartbeat and fail the restore gate, leaving the agent unexpectedly offline after refresh. The previous 60s recency window avoided this mismatch with the heartbeat cadence.

Useful? React with 👍 / 👎.

* test: full audit coverage — archiveRecordingToStorage, health-checks, stripe-retry, rate-limit - archiveRecordingToStorage: 21 tests covering happy path, extension detection, upload failure, signedUrl failure, call_update_failures insert path, failure insert error logging, and missing callRecord skip - health-checks.ts: 23 tests (0% -> 100%) — checkTelnyxCCA, checkTelnyxCredentialConnection, checkSupabaseReachability, EXPECTED_WEBHOOK_URL derivation, all error/timeout paths - stripe-retry.ts: 11 tests (41% -> 100%) — isStripeConnectionError, withStripeRetry happy path, connection error passthrough, 429 retry with retry-after cap, status fallback - rate-limit.ts: 12 tests added (64% -> 93%) — isRateLimited, peekRateLimit, getRateLimitRetryAfterSecs, refundRateLimit, clearRateLimit Total: 975 -> 1030 tests (+55), all passing * fix: address CodeRabbit test quality findings on PR #336 * fix: strengthen test assertions per CodeRabbit review on PR #336 * fix: add missing assertions in archive-recording tests * Harden agent presence heartbeat and surface offline state * fix: address CodeRabbit findings on PR #337 * fix presence stale-check error handling and codrabbit test findings * fix: final test assertion hardening per CodeRabbit * Fix ghost voicemail loops and add hangup safeguards * fix: resolve CodeRabbit findings for presence and voicemail flow

coygg added 10 commits March 26, 2026 08:24

fix: address CodeRabbit test quality findings on PR #336

a9ad63b

fix: strengthen test assertions per CodeRabbit review on PR #336

9606a79

fix: add missing assertions in archive-recording tests

f6f7d73

Harden agent presence heartbeat and surface offline state

fb09fb5

fix: address CodeRabbit findings on PR #337

08ceb73

fix presence stale-check error handling and codrabbit test findings

1e98286

fix: final test assertion hardening per CodeRabbit

7dd8808

Merge origin/master into fix/presence-resilience

43ec541

Fix ghost voicemail loops and add hangup safeguards

1a071ca

chatgpt-codex-connector Bot reviewed Apr 2, 2026

View reviewed changes

coderabbitai Bot previously requested changes Apr 2, 2026

View reviewed changes

fix: resolve CodeRabbit findings for presence and voicemail flow

bf9589e

vercel Bot deployed to Preview April 2, 2026 20:11 View deployment

merge: resolve conflicts after PR #339 merge

427fdb7

coygg merged commit f78147c into master Apr 2, 2026
3 of 4 checks passed

vercel Bot deployed to Preview April 2, 2026 20:12 View deployment

chatgpt-codex-connector Bot reviewed Apr 2, 2026

View reviewed changes

coderabbitai Bot mentioned this pull request Apr 7, 2026

fix: harden dialer reconnect lifecycle + unblock Next build #342

Merged

coygg deleted the fix/voicemail-ghost-call branch June 1, 2026 07:05

Conversation

coygg commented Apr 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

vercel Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coygg commented Apr 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 2, 2026 •

edited

Loading

vercel Bot commented Apr 2, 2026 •

edited

Loading