fix: reclassify Apollo network-error log severity by TaprootFreak · Pull Request #58 · JuiceDollar/api

TaprootFreak · 2026-06-01T09:36:46Z

Summary

Mirror of d-EURO/api#117 per the
shared-codebase convention. Reclassify recoverable Apollo network errors
from error to warn, drop log payload fields the active formatter
silently discards, and refresh the fallback window after expiry.

Tracked in DFXServer/server#278.

Investigation

Sampled 2026-06-01 ~11:00 CEST on dfxprd (Loki):

| Container | error/60m | error/24h |
| --- | --- | --- |
| `juicedollar-jdm-api-11` | 19 | 357 |
| `juicedollar-jdt-api-11` | 7 | 250 |

Both PRD containers run with LOG_LEVEL=warn (compose), so the existing
info-level breadcrumbs ([Ponder] Network error detected, activating fallback) were filtered out before they ever reached Loki. That's why
the ApiApolloConfig events show up as 100% error severity in the
dashboards even though the runtime path is the recoverable one.

GraphQL-errors path: 0/24h on dfxprd — confirms the noise is purely
from the network-error path.
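The filtering effect described above can be sketched in a few lines. This is an illustration only, not the project's logger setup: the `LEVELS` table is a hypothetical subset of the npm-style severity order that Winston uses by default, and `isEmitted` is a made-up helper name.

```typescript
// Hypothetical subset of npm-style levels: lower number = more severe.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 } as const;
type Level = keyof typeof LEVELS;

// A line is emitted only if it is at least as severe as the threshold.
function isEmitted(line: Level, threshold: Level): boolean {
  return LEVELS[line] <= LEVELS[threshold];
}

// Mirrors LOG_LEVEL=warn from the compose file.
const threshold: Level = "warn";

const candidates: Level[] = ["error", "warn", "info"];
const emitted = candidates.filter((level) => isEmitted(level, threshold));

// Only error and warn survive; the info-level breadcrumbs never leave the process.
console.log(emitted);
```

Under that threshold the only lines from the network-error path that could reach Loki were the error-level ones, which matches the 100% error share seen in the dashboards.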

Two latent issues fixed alongside (same as in d-EURO/api#117):

1. Metadata silently dropped. The Winston formatter in api.main.ts uses
   only info.message, so the { message, name, stack } second argument
   never shipped. Inlining networkError.message into the line keeps the
   signal that was being lost.
2. activateFallback() only armed once per process lifetime. fallbackUntil
   stayed truthy after the first activation, so the guard
   if (!fallbackUntil) never passed again and the "Switching to fallback
   for 10min" log fired once per container boot. The guard now re-arms
   once the window has passed.
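Both latent issues can be reproduced in isolation. The sketch below is a simplified stand-in, not the code from this PR: `formatLine`, `FallbackWindow`, `activateOnce`, and the sample error text are hypothetical names and values chosen for illustration, and the 10-minute window is taken from the log message quoted above.

```typescript
// --- Issue 1: a formatter that reads only `message` drops everything else ---
interface LogInfo {
  level: string;
  message: string;
  [key: string]: unknown; // any extra metadata fields
}

// Stand-in for a message-only formatter.
function formatLine(info: LogInfo): string {
  return `[${info.level}] ${info.message}`;
}

const networkError = new Error("connect ECONNREFUSED"); // illustrative error

// Metadata passed next to the message never appears in the output line.
const before = formatLine({
  level: "error",
  message: "Network error",
  name: networkError.name,
  stack: networkError.stack,
});

// Inlining the error message keeps the signal.
const after = formatLine({
  level: "warn",
  message: `Network error: ${networkError.message}`,
});

// --- Issue 2: a guard that arms once versus one that re-arms after expiry ---
const FALLBACK_WINDOW_MS = 10 * 60 * 1000;

class FallbackWindow {
  private fallbackUntil: number | null = null;

  // Old behaviour: arms only while fallbackUntil is still unset.
  activateOnce(now: number): boolean {
    if (!this.fallbackUntil) {
      this.fallbackUntil = now + FALLBACK_WINDOW_MS;
      return true; // armed, the "switching" log would fire
    }
    return false;
  }

  // Refreshed behaviour: also re-arms once the previous window has passed.
  activate(now: number): boolean {
    if (!this.fallbackUntil || now >= this.fallbackUntil) {
      this.fallbackUntil = now + FALLBACK_WINDOW_MS;
      return true;
    }
    return false;
  }
}

console.log(before);
console.log(after);
```

Passing the timestamp in as `now` instead of calling `Date.now()` inside the class is a choice made for this sketch so that the expiry behaviour can be checked deterministically.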

Change

Byte-identical to the d-EURO sibling PR, so both Apollo files are now in
sync. They originally diverged by one line, the && CONFIG.indexerFallback
guard, which is now present on both sides. The guard is benign in d-EURO,
where indexerFallback has a config-level default.

Behaviour matrix

| Scenario | Severity | Retries? |
| --- | --- | --- |
| Primary fails, fallback configured & different URL, first time | `warn` | yes, via fallback |
| Primary fails, fallback configured & different URL, already on fallback | `error` | no (propagate) |
| Primary fails, no fallback URL configured (current jdm/jdt PRD) | `error` | no |
| GraphQL error | `error` | no (unchanged) |
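The matrix can be read as a small decision function. The following is a sketch under assumptions rather than the actual link code: the `Failure` and `Decision` types and the `decide` function are hypothetical names introduced here to make the four rows explicit.

```typescript
type Severity = "warn" | "error";

interface Decision {
  severity: Severity;
  retryViaFallback: boolean;
}

// Hypothetical description of a failed operation.
type Failure =
  | { kind: "network"; fallbackUsable: boolean; alreadyOnFallback: boolean }
  | { kind: "graphql" };

function decide(failure: Failure): Decision {
  // GraphQL errors are unchanged: always error, never retried.
  if (failure.kind === "graphql") {
    return { severity: "error", retryViaFallback: false };
  }
  // Recoverable: a distinct fallback exists and has not been tried yet.
  if (failure.fallbackUsable && !failure.alreadyOnFallback) {
    return { severity: "warn", retryViaFallback: true };
  }
  // No fallback configured, or the fallback itself failed: propagate.
  return { severity: "error", retryViaFallback: false };
}

// Current jdm/jdt PRD setup: no fallback URL configured.
console.log(decide({ kind: "network", fallbackUsable: false, alreadyOnFallback: false }));
```

Only the first row yields `warn`; every other path keeps `error` severity and returns the failure to the caller.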

Behaviour change for the no-fallback case

The PRD compose for jdm/jdt does not set
CONFIG_INDEXER_FALLBACK_URL (intentional: there is no second indexer
deployment). With the previous code, network errors entered the recovery
branch anyway and called forward(operation) — a same-URL retry that
cannot recover anything meaningful, since both attempts hit the same
endpoint in the same JS tick. This PR drops that retry and propagates
the error to the caller instead.

- Error counts in Loki stay roughly the same (~25/h combined) because
  these are real failures, not noise being mis-classified.
- Severity stays at error: the no-fallback branch logs logger.error,
  not warn. That's the correct semantic: nothing to retry with means
  it's a real failure for the client.
- Clients see the error one round-trip earlier. The dapp and bots have
  their own polling/retry cycles, so the user-visible effect is minimal.

Once a fallback indexer is provisioned (CONFIG_INDEXER_FALLBACK_URL
becomes a non-empty string different from CONFIG_INDEXER_URL), the
recovery branch lights up automatically: warn + URL switch + retry, just
like d-EURO PRD does today.
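The condition that switches the recovery branch on can be written as a single predicate. This is a minimal sketch, assuming a config object with `indexer` and `indexerFallback` string fields; only `indexerFallback` is named in this PR, while `indexer`, `hasUsableFallback`, and the example URLs are hypothetical.

```typescript
// Hypothetical config shape for illustration.
interface IndexerConfig {
  indexer: string;
  indexerFallback?: string;
}

// A fallback is usable only if it is non-empty and differs from the primary URL.
function hasUsableFallback(config: IndexerConfig): boolean {
  const fallback = config.indexerFallback?.trim() ?? "";
  return fallback !== "" && fallback !== config.indexer;
}

// Placeholder URLs; not the real deployment endpoints.
const primary = "https://indexer.example.invalid";

console.log(hasUsableFallback({ indexer: primary })); // unset
console.log(hasUsableFallback({ indexer: primary, indexerFallback: primary })); // same URL
console.log(hasUsableFallback({ indexer: primary, indexerFallback: "https://fallback.example.invalid" }));
```

Treating a fallback equal to the primary URL as "not configured" is what prevents the same-URL retry described above.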

Expected post-deploy effect on dfxprd

- ApiApolloConfig error-level lines: stay at ~607/d combined
  (jdm ~357, jdt ~250). These are real failures that can't be hidden
  behind a non-existent fallback, so they retain visibility on the
  error-rate panel.
- One round-trip less per failure event (no useless same-URL retry).
- Info-level [Ponder] Network error detected … breadcrumbs gone
  entirely (they were filtered by LOG_LEVEL=warn anyway, so no
  observable change).

A real noise reduction for JD requires either provisioning a separate
fallback indexer endpoint, or accepting these as real-failure signals
(my read).

Test plan

- yarn build clean (verified locally on the branch HEAD)
- yarn lint clean (verified locally on the branch HEAD)
- npx prettier --check api.apollo.config.ts clean
- After deploy to dfxprd: error-rate stays in the same band but
  no longer doubles up with retry round-trips; CI logs from the
  dapp/bots show no regression.

Drop noise from the recoverable-retry path:

- Use logger.warn (not logger.error) when the primary indexer fails and the fallback is about to be engaged; reserve logger.error for cases where no fallback is configured or the fallback itself failed.
- Drop the {message, name, stack} metadata payload: the Winston formatter in api.main.ts uses only info.message, so it never reached Loki anyway. Inline the message into the log line for actual signal.
- Collapse the redundant info-level breadcrumbs ('Network error detected' / '503 Service Unavailable') into the single warn line.
- Refresh the fallback window after expiry instead of arming it once per process lifetime, so a sustained outage keeps the fallback active.

Behaviour change for the no-fallback case (jdm/jdt PRD have no CONFIG_INDEXER_FALLBACK_URL set): the same-URL retry via forward() is dropped because it cannot recover anything. Errors now propagate to the caller without an extra round-trip.

Mirrors the parallel change in d-EURO/api per the shared codebase convention.

TaprootFreak mentioned this pull request Jun 1, 2026

fix: reclassify Apollo network-error log severity d-EURO/api#117

Draft

5 tasks

TaprootFreak wants to merge 1 commit into develop from fix/apollo-network-error-log-severity
