
Add wait_golden_jitter: deterministic low-discrepancy jitter strategy #642

Open
F4V3L4 wants to merge 3 commits into jd:main from F4V3L4:add-golden-jitter-wait


F4V3L4 commented Jun 15, 2026


Add wait_golden_jitter: deterministic low-discrepancy jitter strategy

Summary

This PR adds a new wait strategy, wait_golden_jitter, providing a
deterministic low-discrepancy alternative to the existing random jitter
strategies (wait_random_exponential / "Full Jitter").

It keeps the same exponentially expanding backoff window as
wait_random_exponential, but selects each waiter's delay from the
golden-ratio low-discrepancy sequence (indexed by a caller-supplied
seq_index) instead of drawing it at random.
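
As a rough illustration of the idea, here is a minimal sketch that does not use tenacity and is not the code from this PR. It assumes the phase is the fractional part of `seq_index` times the golden ratio's fractional part, and that the delay is that phase scaled by a capped, doubling window. The names `golden_phase`, `golden_jitter_delay`, `multiplier`, and `max_wait` are hypothetical.

```python
import math

# Fractional part of the golden ratio: (sqrt(5) - 1) / 2, about 0.618.
GOLDEN_FRAC = (math.sqrt(5) - 1) / 2


def golden_phase(seq_index: int) -> float:
    """Return the seq_index-th point of the golden-ratio sequence in [0, 1)."""
    return (seq_index * GOLDEN_FRAC) % 1.0


def golden_jitter_delay(
    seq_index: int,
    attempt_number: int,
    multiplier: float = 1.0,
    max_wait: float = 60.0,
) -> float:
    """Hypothetical delay: the waiter's phase times a capped, doubling window."""
    # Assumed window shape: multiplier * 2^(attempt - 1), capped at max_wait.
    window = min(max_wait, multiplier * 2 ** (attempt_number - 1))
    return golden_phase(seq_index) * window


# Delays for four waiters (seq_index 0..3) over the first four attempts.
for attempt in range(1, 5):
    print(attempt, [round(golden_jitter_delay(i, attempt), 3) for i in range(4)])
```

In this sketch each waiter keeps the same phase on every attempt, so only the window grows. Whether the actual implementation also varies the phase per attempt is not stated here.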

Motivation

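The existing jitter strategies break retry synchronization using randomness.
Random draws are statistically "clumpy": by chance, several waiters land in
the same time window, producing concurrency peaks larger than necessary. The
golden ratio φ is the irrational number that is hardest to approximate by
rationals (its continued fraction consists entirely of 1s), so its additive
recurrence (the fractional parts of n·φ) spreads points across the window
with very low discrepancy. By the three-distance theorem, neighbouring points
are separated by at most three distinct gap lengths, and Weyl
equidistribution guarantees that the points fill the window uniformly in the
limit.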

This offers three properties the random strategies cannot:

  1. Bounded worst-case spread — the peak concurrency across N uncoordinated
    waiters is controlled, not luck-dependent.
  2. Reproducibility — identical runs produce identical schedules, which
    makes retry behavior testable and debuggable (random jitter is not).
  3. No RNG required — useful where a stable per-waiter index is available
    (host id, shard, worker number) but a strong RNG is not (embedded/IoT).

This is complementary to the random strategies, not a replacement. Where
callers cannot provide distinct indices, the random strategies remain the
right default.
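
The even-spacing claim can be checked numerically. The sketch below is a standalone illustration, not code from this PR. It places 1000 points of the golden-ratio sequence on the unit interval, treated as a circle, and counts how many distinct gap lengths occur between neighbours. The helper names `circular_gaps` and `distinct_lengths` are hypothetical, and a small tolerance is assumed to absorb floating-point noise.

```python
import math

GOLDEN_FRAC = (math.sqrt(5) - 1) / 2


def circular_gaps(points):
    """Gaps between neighbouring points on the unit circle, wrap-around included."""
    ordered = sorted(points)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(1.0 - ordered[-1] + ordered[0])
    return gaps


def distinct_lengths(gaps, tol=1e-9):
    """Count gap lengths that differ by more than tol (guards against float noise)."""
    ordered = sorted(gaps)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > tol)


n_waiters = 1000
golden_points = [(i * GOLDEN_FRAC) % 1.0 for i in range(n_waiters)]
golden_gaps = circular_gaps(golden_points)

print("distinct gap lengths:", distinct_lengths(golden_gaps))
print("largest / smallest gap:", max(golden_gaps) / min(golden_gaps))
```

Running the same helpers on uniformly random points would typically show many distinct gap lengths and a much larger ratio between the largest and smallest gap, which is the "clumpiness" described above.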

Benchmark

Thundering-herd simulation, peak concurrency (lower is better), same
exponential window for both, 300 trials:

| Scenario | Full Jitter (random) worst peak | Golden Jitter worst peak | Worst-case reduction |
|---|---|---|---|
| 1000 clients / 50 slots | 45 (2.25× ideal) | 21 (1.05× ideal) | 53% |
| 10000 clients / 100 slots | 137 (1.37× ideal) | 102 (1.02× ideal) | 26% |
| 10000 clients / 200 slots | 82 (1.64× ideal) | 51 (1.02× ideal) | 38% |

Mean peak is similar between the two; the gain is concentrated in the
worst case and in predictability — which is what reliability
engineering sizes for.
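
For readers who want to experiment, here is a simplified sketch of this kind of simulation. It is not the benchmark script behind the table and is not expected to reproduce its exact numbers. It assumes a single window normalized to [0, 1) and split into equal slots, with the peak taken as the largest number of waiters in any one slot. The function names and the fixed seed are hypothetical.

```python
import math
import random
from collections import Counter

GOLDEN_FRAC = (math.sqrt(5) - 1) / 2


def peak_concurrency(phases, n_slots):
    """Largest number of waiters whose phase falls in the same slot."""
    # min() guards against a phase that rounds up to the last slot boundary.
    counts = Counter(min(int(p * n_slots), n_slots - 1) for p in phases)
    return max(counts.values())


def random_worst_peak(n_clients, n_slots, trials, seed=0):
    """Worst peak over several trials of uniformly random phases."""
    rng = random.Random(seed)
    return max(
        peak_concurrency([rng.random() for _ in range(n_clients)], n_slots)
        for _ in range(trials)
    )


def golden_peak(n_clients, n_slots):
    """Peak for deterministic golden-ratio phases with seq_index 0..n_clients-1."""
    return peak_concurrency(
        [(i * GOLDEN_FRAC) % 1.0 for i in range(n_clients)], n_slots
    )


n_clients, n_slots = 1000, 50
print("ideal per slot:", n_clients / n_slots)
print("random worst peak:", random_worst_peak(n_clients, n_slots, trials=300))
print("golden peak:", golden_peak(n_clients, n_slots))
```

Because the golden schedule is deterministic, it needs only one evaluation per scenario, while the random strategy needs many trials to estimate its worst case.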

Honest caveats

  • The advantage is worst-case / tail behavior and reproducibility, not a
    large average-case win.
  • It requires distinct seq_index values across waiters; with identical
    indices it degenerates to a fixed schedule (documented in the docstring).
  • The benchmark is an idealized model; real-world clock skew and network
    latency would shrink the gap. Reproducibility and no-RNG remain regardless.
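
The distinct-index caveat is easy to see in a small sketch. The `golden_phase` helper below is hypothetical and assumes the phase is the fractional part of `seq_index` times the golden ratio's fractional part, which may differ from the implementation in this PR.

```python
import math

GOLDEN_FRAC = (math.sqrt(5) - 1) / 2


def golden_phase(seq_index: int) -> float:
    """Hypothetical phase in [0, 1) for a given seq_index."""
    return (seq_index * GOLDEN_FRAC) % 1.0


# Five waiters that all pass the same index share one phase and stay in lockstep.
shared = [golden_phase(7) for _ in range(5)]

# Five waiters with distinct indices each get their own phase.
distinct = [golden_phase(i) for i in range(5)]

print("unique phases with shared index:", len(set(shared)))
print("unique phases with distinct indices:", len(set(distinct)))
```

With a shared index every waiter retries at the same instant, which is the synchronization that jitter is meant to break.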

Changes

  • tenacity/wait.py: add wait_golden_jitter class.
  • tests/test_tenacity.py: add TestWaitGoldenJitter (6 tests).
  • doc/source/index.rst: document the new strategy and when to prefer it.

Prior art

The use of the golden ratio for even, hard-to-resonate spacing is
well-established in low-discrepancy sampling. A formal optimality result for
golden-ratio scheduling in a related setting appears in Kempe, Schulman &
Tamuz, "Quasi-regular sequences and optimal schedules for security games"
(SODA 2018). This PR applies the same principle to retry timing.

F4V3L4 and others added 3 commits June 15, 2026 18:28
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A002 is not enabled in the project's ruff config, so the
`# noqa: A002` suppressions on `min` and `max` parameters were
reported as unused by `ruff check`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The tests for low-discrepancy and distinct-phase properties accessed
`_phase` directly. Since `phase` is a meaningful observable value
(the golden-ratio offset for a given seq_index), exposing it publicly
is cleaner than suppressing the SLF001 lint rule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>