
feat(utils): retry transient upstream errors in rest_query/post_query#261

Open
Elarwei001 wants to merge 2 commits into scverse:dev from Elarwei001:ensembl-query-retry


@Elarwei001

Contributor

Scope

rest_query / post_query in gget/utils.py are the shared entry points for every Ensembl-backed module (seq, ref, search, info, blat, and enrichr with ensembl=True). They currently issue a single request, so any transient hiccup from the upstream service fails the whole call. Ensembl REST in particular returns intermittent 500/502/503/504 or 429 responses under load and occasionally drops the connection; both show up as flaky failures in CI and for users.

Change

Add a small _query_with_retry helper and route both functions through it:

  • Retries on 429 and 5xx (500/502/503/504) and on connection/timeout errors, with linear backoff (2s, 4s, 6s; 4 attempts total).
  • Honors Retry-After when the server sends one (capped at 30s).
  • Does not retry 4xx client errors other than 429 (e.g. 404): they are returned to the caller unchanged, so existing error handling (if not r.ok: raise ...) is preserved.
  • After exhausting retries, returns the last response (or re-raises the last connection error), so callers behave exactly as before in the terminal-failure case.
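The retry policy above can be sketched as follows. This is a minimal, hypothetical reconstruction, not the code in this PR: the names `query_with_retry` and `_retry_delay` are illustrative, `send` is assumed to be a zero-argument callable returning a requests.Response-like object (with `.status_code` and `.headers`), and `sleep` is injectable so the logic can be exercised offline.

```python
import time
from typing import Any, Callable, Mapping, Optional, Tuple, Type

# Transient statuses from the PR description: 429 plus 500/502/503/504.
RETRY_STATUSES = frozenset({429, 500, 502, 503, 504})
MAX_ATTEMPTS = 4          # 1 initial request + 3 retries
BACKOFF_STEP = 2.0        # linear backoff: 2s, 4s, 6s
RETRY_AFTER_CAP = 30.0    # upper bound on a server-supplied Retry-After


def _retry_delay(attempt: int, headers: Optional[Mapping[str, str]]) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    retry_after = headers.get("Retry-After") if headers else None
    if retry_after is not None:
        try:
            # Honor the server's hint, clamped to [0, RETRY_AFTER_CAP].
            return max(0.0, min(float(retry_after), RETRY_AFTER_CAP))
        except ValueError:
            pass  # Retry-After may also be an HTTP-date; fall back to backoff.
    return BACKOFF_STEP * attempt


def query_with_retry(
    send: Callable[[], Any],
    *,
    retry_exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send()` until it yields a non-transient response or attempts run out.

    Non-retryable responses (e.g. 404) and the last response after exhaustion
    are returned unchanged; if every attempt fails with a connection/timeout
    error, the last one is re-raised.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            response = send()
        except retry_exceptions:
            if attempt == MAX_ATTEMPTS:
                raise  # re-raise the last connection/timeout error
            sleep(BACKOFF_STEP * attempt)
            continue
        if response.status_code not in RETRY_STATUSES or attempt == MAX_ATTEMPTS:
            return response
        sleep(_retry_delay(attempt, response.headers))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With requests, the exception tuple would be `(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, since requests' ConnectionError does not subclass the builtin one, e.g. `query_with_retry(lambda: requests.get(url, timeout=30), retry_exceptions=(...))`. Without Retry-After, the worst-case added wait is 2 + 4 + 6 = 12 s; with a Retry-After capped at 30 s on each of the 3 retries, it can reach 90 s.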

No public API or return types change — this only adds resilience underneath the existing functions.

Tests

Offline unit tests (tests/test_utils.py::TestQueryRetry, with requests and sleep mocked so no network access is needed) cover: success after retry, connection-error retry, no retry on 404, retry exhaustion, and Retry-After handling.
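One way such offline tests can be written is with `unittest.mock.Mock(side_effect=[...])`, which returns (or raises) each item in turn, so a single mock can simulate "503, then 200" or "connection reset, then 200". The sketch below is illustrative only: `fetch_with_retry` is a hypothetical, deliberately compact stand-in defined inline so the example runs on its own, and it injects `sleep` instead of patching `time.sleep`; the actual TestQueryRetry tests patch gget.utils.

```python
import unittest
from unittest import mock

RETRY_STATUSES = {429, 500, 502, 503, 504}


def fetch_with_retry(send, sleep, attempts=4):
    """Hypothetical stand-in: retry transient statuses and ConnectionError."""
    for attempt in range(1, attempts + 1):
        try:
            r = send()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(2 * attempt)
            continue
        if r.status_code not in RETRY_STATUSES or attempt == attempts:
            return r
        sleep(2 * attempt)


class RetryTests(unittest.TestCase):
    def test_success_after_retry(self):
        send = mock.Mock(side_effect=[mock.Mock(status_code=503), mock.Mock(status_code=200)])
        sleep = mock.Mock()
        self.assertEqual(fetch_with_retry(send, sleep).status_code, 200)
        self.assertEqual(send.call_count, 2)
        sleep.assert_called_once_with(2)

    def test_connection_error_is_retried(self):
        ok = mock.Mock(status_code=200)
        # An exception instance in side_effect is raised instead of returned.
        send = mock.Mock(side_effect=[ConnectionError("reset"), ok])
        self.assertIs(fetch_with_retry(send, mock.Mock()), ok)

    def test_404_is_not_retried(self):
        send = mock.Mock(return_value=mock.Mock(status_code=404))
        sleep = mock.Mock()
        self.assertEqual(fetch_with_retry(send, sleep).status_code, 404)
        send.assert_called_once()
        sleep.assert_not_called()

    def test_exhaustion_returns_last_response(self):
        send = mock.Mock(return_value=mock.Mock(status_code=502))
        sleep = mock.Mock()
        self.assertEqual(fetch_with_retry(send, sleep).status_code, 502)
        self.assertEqual(send.call_count, 4)
        self.assertEqual([c.args[0] for c in sleep.call_args_list], [2, 4, 6])
```

Because sleep is a mock, the suite finishes instantly and never touches the network.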

Notes

utils.py already has an http_json retry helper, but it always parses JSON and raises on non-JSON bodies. rest_query can return either text or JSON and needs the raw Response, so this helper returns the Response object and leaves body handling to the existing call sites.
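Because the retry layer hands back the raw Response, body handling stays at the call site. A minimal sketch of that split follows; `parse_body` and its `content_type` parameter are hypothetical names for illustration, and `response` is assumed to be requests.Response-like (`.ok`, `.status_code`, `.text`, `.json()`).

```python
from typing import Any


def parse_body(response: Any, content_type: str) -> Any:
    """Hypothetical call-site handling of a Response returned by the retry layer.

    The retry helper never reads the body, so the caller decides whether to
    parse JSON or keep raw text.
    """
    if not response.ok:
        # A non-retried 4xx, or a 429/5xx left over after retries ran out.
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text[:200]}"
        )
    if content_type == "application/json":
        return response.json()
    # FASTA and other text formats are passed through unchanged.
    return response.text
```

This is why a JSON-only helper like http_json does not fit: it would reject valid non-JSON bodies before the caller ever sees them.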

Ensembl REST and other upstream services occasionally return transient
5xx (500/502/503/504) or 429 responses, or drop the connection under
load. Every Ensembl-backed module (seq, ref, search, info, blat, and
enrichr with ensembl=True) routes through rest_query/post_query, so a
single transient blip currently fails the whole call.

Add a small _query_with_retry helper and route both functions through
it:
- Retries on 429/5xx and on connection/timeout errors, with linear
  backoff (2s, 4s, 6s).
- Honors a Retry-After header when the server sends one (capped at 30s).
- Does NOT retry 4xx client errors other than 429 (e.g. 404); those are
  returned to the caller unchanged, preserving existing error handling.
- After exhausting retries, returns the last response (or re-raises the
  last connection error) so callers behave exactly as before.

Add offline unit tests (TestQueryRetry) covering success-after-retry,
connection-error retry, no-retry on 404, retry exhaustion, and
Retry-After handling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Elarwei001

Contributor Author

CI keeps failing on 5xx errors from external services; adding retries should make it more robust.

@codecov-commenter

codecov-commenter commented Jul 2, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.66%. Comparing base (e43a804) to head (85d2124).
⚠️ Report is 2 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #261      +/-   ##
==========================================
- Coverage   56.70%   56.66%   -0.05%     
==========================================
  Files          29       29              
  Lines        9392     9417      +25     
==========================================
+ Hits         5326     5336      +10     
- Misses       4066     4081      +15     


Add `--durations=25` to pytest addopts so every CI run prints the slowest
tests at the end of the log. This makes it easy to spot which (mostly
network-bound) tests dominate CI wall-clock time, complementing the
request-retry changes in this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Elarwei001 changed the title from "Retry transient upstream errors in rest_query/post_query" to "feat(utils): retry transient upstream errors in rest_query/post_query" on Jul 3, 2026
2 participants