Make subscription status durable and report data freshness separately #883

Merged
testower merged 3 commits into master from fix/subscription-status-durability on Jun 24, 2026

@testower

Collaborator

Problem

The public status endpoint (/status/feed-providers) reported actively-running feeds as STOPPED. Confirmed in prod: 32 STARTED / 44 enabled-STOPPED, while the STOPPED feeds (boltoslo, oslobysykkel, getaround*, ryde*, dott*) were all serving data minutes old.

Root cause: subscription status was a write-once Redis flag, defaulted to STOPPED when absent, that the restart path removed and a leader restart wiped — fully decoupled from whether the feed was actually being polled. Investigation ruled out multi-leader, Redis eviction (evicted_keys=0), and read-replica divergence; the raw leader map literally held only 33 entries with the broken feeds absent.

Changes

1. Durable subscription desired-state (the "survives leader restart" requirement)

  • SubscriptionRegistry.clearInMemory() preserves the durable Redis status on leader (re)start; start()/stop() no longer wipe it.
  • SubscriptionRegistry.isStopped() + FeedUpdater.shouldSubscribe(): createSubscriptions skips an enabled provider whose subscription is explicitly STOPPED, so a stopped feed is not auto-resubscribed by a new leader.
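
The desired-state rule above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: plain HashMaps stand in for the Redis hash and the leader's in-memory state, the class name SubscriptionRegistrySketch, the markStarted/markStopped helpers and all signatures are assumptions, and shouldSubscribe is placed on the same class for brevity even though the PR puts it on FeedUpdater.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: plain maps stand in for the Redis hash and leader-local state. */
final class SubscriptionRegistrySketch {

  enum Status { STARTING, STARTED, STOPPED }

  // Durable desired state (a Redis hash in the PR); survives leader restarts.
  private final Map<String, Status> durableStatus = new HashMap<>();

  // Leader-local subscription ids; lost whenever leadership changes.
  private final Map<String, String> subscriptionIds = new HashMap<>();

  // Hypothetical helper: records a running subscription and its durable status.
  void markStarted(String systemId, String subscriptionId) {
    subscriptionIds.put(systemId, subscriptionId);
    durableStatus.put(systemId, Status.STARTED);
  }

  // Hypothetical helper: an explicit stop is written to the durable map.
  void markStopped(String systemId) {
    subscriptionIds.remove(systemId);
    durableStatus.put(systemId, Status.STOPPED);
  }

  /** Called on leader (re)start: drops local ids, keeps the durable status. */
  void clearInMemory() {
    subscriptionIds.clear();
  }

  /** True only for an explicit STOPPED; an absent entry does not count as stopped. */
  boolean isStopped(String systemId) {
    return durableStatus.get(systemId) == Status.STOPPED;
  }

  /** Decision a new leader makes for each configured provider. */
  boolean shouldSubscribe(boolean enabled, String systemId) {
    return enabled && !isStopped(systemId);
  }
}
```

Under these assumptions, a feed that is absent from the durable map is treated as "should start", while only an explicitly stopped feed is skipped.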

2. Restart/stop divergence fix

  • restartSubscription removes only the in-memory id (removeSubscriptionId), never HDEL'ing the durable status → never momentarily absent.
  • Setup failures keep STARTING; new maybeRetrySubscription guards the 60s retry against reviving stopped feeds or creating duplicate pollers.
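
One way to picture the retry guard, as a sketch under stated assumptions rather than the PR's implementation: the class name RetryGuardSketch, the setStatus/status/registerSubscription helpers, and the Runnable-based signature of maybeRetrySubscription are all hypothetical; only the names removeSubscriptionId and maybeRetrySubscription come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: in-process maps stand in for Redis and the subscription manager. */
final class RetryGuardSketch {

  enum Status { STARTING, STARTED, STOPPED }

  private final Map<String, Status> durableStatus = new ConcurrentHashMap<>();
  private final Map<String, String> subscriptionIds = new ConcurrentHashMap<>();

  void setStatus(String systemId, Status status) {
    durableStatus.put(systemId, status);
  }

  Status status(String systemId) {
    return durableStatus.get(systemId);
  }

  void registerSubscription(String systemId, String subscriptionId) {
    subscriptionIds.put(systemId, subscriptionId);
  }

  /** Restart path: forget the in-memory id only; the durable status is untouched. */
  void removeSubscriptionId(String systemId) {
    subscriptionIds.remove(systemId);
  }

  /**
   * Runs the retry only when it is still wanted.
   * Returns true if createSubscription was invoked.
   */
  boolean maybeRetrySubscription(String systemId, Runnable createSubscription) {
    if (durableStatus.get(systemId) == Status.STOPPED) {
      return false; // stopped in the meantime: do not revive the feed
    }
    if (subscriptionIds.containsKey(systemId)) {
      return false; // already subscribed: avoid a duplicate poller
    }
    createSubscription.run();
    return true;
  }
}
```

In this sketch the scheduled retry becomes a no-op in exactly the two cases the bullet names, and a restart never leaves the status momentarily absent because removeSubscriptionId does not touch the durable map.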

3. Report data freshness separately

  • New FeedFreshnessService (overdue logic extracted from MetricUpdater, which now delegates) exposes isLive / lastUpdated.
  • PublicFeedProviderStatus gains dataFresh (boolean) + lastUpdated (epoch seconds). subscriptionStatus = durable intent; dataFresh = actual data flow.
  • Public status UI shows a Fresh/Stale chip and a relative "Last Updated" column.
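
A freshness check of this kind might look like the sketch below. The actual overdue logic extracted from MetricUpdater is not shown in this PR description, so the rule used here (a feed is live if its last update is no older than a fixed maxAge) is an assumption, as are the class name FeedFreshnessSketch, the recordUpdate helper, and the constructor parameters; only isLive and lastUpdated are names taken from the text.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: freshness derived from the time of the last successful update. */
final class FeedFreshnessSketch {

  private final Clock clock;
  private final Duration maxAge; // assumed staleness threshold
  private final Map<String, Instant> lastUpdated = new ConcurrentHashMap<>();

  FeedFreshnessSketch(Clock clock, Duration maxAge) {
    this.clock = clock;
    this.maxAge = maxAge;
  }

  // Hypothetical hook: called whenever new data arrives for a feed.
  void recordUpdate(String systemId, Instant at) {
    lastUpdated.put(systemId, at);
  }

  /** Empty when the feed has never delivered data. */
  Optional<Instant> lastUpdated(String systemId) {
    return Optional.ofNullable(lastUpdated.get(systemId));
  }

  /** Live means: data was seen, and it is not older than maxAge. */
  boolean isLive(String systemId) {
    Instant last = lastUpdated.get(systemId);
    return last != null && !last.plus(maxAge).isBefore(clock.instant());
  }
}
```

Injecting a Clock keeps the check deterministic in tests; the DTO's lastUpdated (epoch seconds) would then correspond to Instant.getEpochSecond() on the stored value.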

Behaviour

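| enabled | desired-state | fresh data         | subscriptionStatus | dataFresh |
|---------|---------------|--------------------|--------------------|-----------|
| true    | STARTED       | yes                | STARTED            | true      |
| true    | STARTED       | no (upstream dead) | STARTED            | false     |
| true    | STOPPED       | -                  | STOPPED            | false     |
| false   | -             | -                  | STOPPED            | false     |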
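
The four rows above can be read as a small pure function. The sketch below covers only those four cases (STARTING is deliberately left out because the table does not describe it), and the class name PublicStatusSketch and its factory method are hypothetical, not the PR's PublicFeedProviderStatus mapper.

```java
/** Sketch only: encodes the four rows of the behaviour table. */
final class PublicStatusSketch {

  enum Status { STARTED, STOPPED }

  final Status subscriptionStatus; // durable intent
  final boolean dataFresh;         // actual data flow

  private PublicStatusSketch(Status subscriptionStatus, boolean dataFresh) {
    this.subscriptionStatus = subscriptionStatus;
    this.dataFresh = dataFresh;
  }

  /**
   * desired may be null for a disabled provider with no durable entry;
   * freshData is ignored unless the feed is enabled and STARTED.
   */
  static PublicStatusSketch of(boolean enabled, Status desired, boolean freshData) {
    boolean started = enabled && desired == Status.STARTED;
    return new PublicStatusSketch(
      started ? Status.STARTED : Status.STOPPED,
      started && freshData
    );
  }
}
```

The point of the split is visible in the second row: an enabled, STARTED feed whose upstream is dead still reports STARTED, and only dataFresh flips to false.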

Rollout

The DTO change is additive (backward-compatible). On first deploy the currently-broken feeds self-correct: they are absent in the durable map → treated as "should start" → resubscribed and persisted as STARTED; a genuinely-stopped feed stays stopped.

Testing

  • Java: full suite 319 tests, 0 failures/errors; mvn prettier:check clean. New/extended: FeedFreshnessServiceTest, PublicFeedProviderStatusMapperTest, SubscriptionRegistryTest, FeedUpdaterSubscriptionTest, MetricsUpdaterTest.
  • UI: tsc --noEmit, eslint, vite build, prettier — all clean.

testower added 2 commits June 24, 2026 11:49
The public status endpoint reported running feeds as STOPPED because the
subscription status was a write-once Redis flag, defaulted to STOPPED when
absent, that the restart path removed and a leader restart wiped — fully
decoupled from whether the feed was actually being polled.

- Durable desired-state: SubscriptionRegistry.clearInMemory() preserves the
  Redis status on leader (re)start; FeedUpdater.createSubscriptions honors an
  explicit STOPPED so an enabled-but-stopped feed is not auto-resubscribed by a
  new leader (SubscriptionRegistry.isStopped / FeedUpdater.shouldSubscribe).
- Restart/stop divergence fix: restartSubscription removes only the in-memory id
  (removeSubscriptionId), never HDEL'ing the durable status; setup failures keep
  STARTING; maybeRetrySubscription guards the retry against reviving stopped
  feeds or creating duplicate pollers.
- Report freshness separately: new FeedFreshnessService (overdue logic extracted
  from MetricUpdater, which now delegates) exposes isLive/lastUpdated;
  PublicFeedProviderStatus gains dataFresh + lastUpdated. subscriptionStatus now
  means durable intent, dataFresh means actual data flow.

Surface the new dataFresh / lastUpdated fields from the status endpoint as
separate "Data Fresh" (Fresh/Stale chip) and "Last Updated" (relative time)
columns, so subscription status (durable intent) and actual data flow are
visible side by side.
@codecov

codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 91.25000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.31%. Comparing base (d6d6695) to head (e7bf3f7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rg/entur/lamassu/service/FeedFreshnessService.java | 83.33% | 5 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #883      +/-   ##
============================================
+ Coverage     80.19%   80.31%   +0.11%     
- Complexity     1540     1566      +26     
============================================
  Files           206      207       +1     
  Lines          5741     5801      +60     
  Branches        377      388      +11     
============================================
+ Hits           4604     4659      +55     
- Misses          919      922       +3     
- Partials        218      220       +2     


createSubscription's setup-failure branch redundantly re-set STARTING on the
async retry path, racing with the synchronous STARTING set by the caller and
making updateSubscriptionStatus(STARTING) verify counts non-deterministic
(green locally, red in CI on testStartSubscription).

The re-set is unnecessary: every caller sets STARTING before createSubscription
and, since restart now keeps the durable status (removeSubscriptionId), nothing
clears it. Remove it so the failure path leaves the status untouched, and assert
that invariant (never removes / never marks STOPPED) instead.
@testower testower merged commit eef44f7 into master Jun 24, 2026
10 checks passed
@testower testower deleted the fix/subscription-status-durability branch June 24, 2026 10:49
assadriaz pushed a commit that referenced this pull request Jun 24, 2026