fix(gateway): self-heal upstream after backend pod rollout (v1.0.11) by ZhiXiao-Lin · Pull Request #7 · A3S-Lab/Gateway

ZhiXiao-Lin · 2026-06-22T08:57:25Z

Headline: hyper's idle connection pool is keyed by hostname, not by resolved IP, and its 90s idle timeout exceeded the 30s passive-health recovery_time. After a backend pod rolls and gets a new IP, pooled sockets to the dead pod IP are reused, SendRequest fails, the backend is marked unhealthy, and the half-open probe reuses another stale socket. The result is a permanent 503 until the gateway is restarted. Fix: pool_idle_timeout 90s→5s, TCP keepalive 90s→15s, recovery_time 30s→10s. Because 5s < 10s, stale sockets are evicted before the probe fires, so the probe always opens a fresh connection that re-resolves to the new pod IP and the gateway self-heals in ~10s. Also commits the route-key de-dup (already live in prod, never committed). Bumps to v1.0.11. cargo check clean.

…service name image-app-publish names the Ingress and Service identically, so the ns-ingress-svc key doubled (default-arche-arche). Already live in the deployed gateway — committing so the released build matches production.
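The exact de-dup logic is not shown in this PR description. As a rough sketch of the idea only, assuming the key is built as namespace-ingress-service and that the fix collapses the repeated segment when the Ingress and Service share a name (the function `route_key` and its signature are hypothetical, not the gateway's code):

```rust
/// Hypothetical helper: build a route key from namespace, Ingress name and
/// Service name, collapsing the duplicate when Ingress and Service match.
/// Assumption: the real gateway may de-duplicate differently.
fn route_key(namespace: &str, ingress: &str, service: &str) -> String {
    if ingress == service {
        // Identical names would otherwise yield "ns-name-name".
        format!("{}-{}", namespace, ingress)
    } else {
        format!("{}-{}-{}", namespace, ingress, service)
    }
}

fn main() {
    // Same name for Ingress and Service: the segment appears once.
    println!("{}", route_key("default", "arche", "arche"));
    // Different names: all three segments are kept.
    println!("{}", route_key("default", "web", "api"));
}
```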

Root cause: hyper's idle connection pool is keyed by hostname, not by resolved IP, and its 90s idle timeout exceeded the 30s passive-health recovery_time. After a backend Deployment rolls (new pod IP), pooled sockets to the dead old pod IP linger and get reused. SendRequest then fails, the backend is marked unhealthy, and the half-open probe reuses another stale socket, so the gateway returns a permanent 503 'No healthy backends' until it is restarted. Fix: pool_idle_timeout 90s->5s and TCP keepalive 90s->15s (evict stale sockets before the recovery probe), plus passive-health recovery_time 30s->10s. Keeping the idle timeout below the recovery time (5s < 10s) guarantees that the half-open probe opens a FRESH connection that re-resolves DNS to the new pod IP, so the gateway self-heals within ~10s of a rollout instead of needing a manual restart.
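The timing argument can be illustrated with a toy model. This is a minimal sketch, not hyper's pool or the gateway's code: `Pool`, `checkout`, `probe_target`, the hostname `backend.svc` and both IPs are made up for illustration, and the pool is simplified to one idle socket per hostname.

```rust
use std::collections::HashMap;

/// Toy idle pool keyed by hostname (not by resolved IP). Hypothetical type.
struct Pool {
    idle_timeout_secs: u64,
    // hostname -> (IP the socket was opened to, time it went idle)
    idle: HashMap<String, (String, u64)>,
}

impl Pool {
    fn new(idle_timeout_secs: u64) -> Self {
        Pool { idle_timeout_secs, idle: HashMap::new() }
    }

    /// Return the IP a request to `host` is sent to at time `now` (seconds).
    /// A pooled socket is reused while it has been idle for less than the
    /// timeout; otherwise `resolve` is called and a fresh socket is stored.
    fn checkout(&mut self, host: &str, now: u64, resolve: impl Fn() -> String) -> String {
        if let Some((ip, idle_since)) = self.idle.get(host) {
            if now.saturating_sub(*idle_since) < self.idle_timeout_secs {
                return ip.clone();
            }
        }
        let ip = resolve();
        self.idle.insert(host.to_string(), (ip.clone(), now));
        ip
    }
}

const OLD_IP: &str = "10.0.0.1"; // assumed pod IP before the rollout
const NEW_IP: &str = "10.0.0.2"; // assumed pod IP after the rollout

/// IP that the half-open probe hits, given the two timeouts.
fn probe_target(pool_idle_timeout_secs: u64, recovery_time_secs: u64) -> String {
    let mut pool = Pool::new(pool_idle_timeout_secs);
    // t = 0: a socket to the old pod goes idle, then the pod rolls.
    let _ = pool.checkout("backend.svc", 0, || OLD_IP.to_string());
    // t = recovery_time: the probe fires; DNS now returns the new pod.
    pool.checkout("backend.svc", recovery_time_secs, || NEW_IP.to_string())
}

fn main() {
    println!("idle 90s, recovery 30s -> probe hits {}", probe_target(90, 30));
    println!("idle 5s, recovery 10s -> probe hits {}", probe_target(5, 10));
}
```

With the old values the probe lands on the dead pod's IP because the socket is still considered fresh at 30s. With the new values the socket has already expired by the time the probe fires, so the probe resolves again and reaches the new pod.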

RoyLin added 2 commits June 22, 2026 16:56

ZhiXiao-Lin merged commit 9904829 into main Jun 22, 2026
1 of 2 checks passed

ZhiXiao-Lin deleted the fix/upstream-rollout-selfheal branch June 22, 2026 09:20

ZhiXiao-Lin merged 2 commits into main from fix/upstream-rollout-selfheal
