Skip to content

Reduce audible startup sync correction#255

Merged
balloob merged 2 commits into
Sendspin:mainfrom
trisweb:fix-audio-startup-warble
Jun 16, 2026
Merged

Reduce audible startup sync correction#255
balloob merged 2 commits into
Sendspin:mainfrom
trisweb:fix-audio-startup-warble

Conversation

@trisweb

@trisweb trisweb commented May 24, 2026

Copy link
Copy Markdown
Contributor

Summary

This reduces audible pitch shift / warble during Sendspin client playback startup and other stream transitions.

The main issue appears to be that the client can start playback with a consistent initial sync offset, then correct that offset quite aggressively using sample insert/drop correction. On my Linux endpoint this was especially noticeable at the beginning of tracks. Some discussion of this occurred at #107, however it was indeterminate and difficult to pin down the cause.

With analysis and help from GPT5.5, with my full understanding and detailed review, this PR makes three related changes:

  • Fixes the drop-frame correction path so it discards one input frame and outputs the following frame, instead of repeating the previous output frame while consuming two input frames (bug fix)
  • Reduces the maximum correction rate from +/-4% to +/-0.2% (much more reliably below audible threshold)
  • Adds a short startup grace period before sync correction begins, so DAC/time-sync estimates can settle before the client starts inserting or dropping samples (most impactful improvement; basically correction was correcting issues that were not present simply due to lack of data)

Analysis

I was hearing pitch shift and warble on a fully up-to-date sendspin endpoint, most noticeably at playback start and during track changes.

Looking through sendspin/audio.py, the most suspicious path was the sync correction logic. The previous drop correction branch did this:

  1. Read one frame.
  2. Read another frame.
  3. Output the previous frame again.

That effectively produced a duplicate-then-skip pattern, which is more audible than a simple one-frame drop.

Separately, the correction loop allowed up to +/-4% playback speed correction over a 2 second target window. On real playback that is enough to sound like pitch movement, especially right after startup when the first sync estimate is still settling.

This was changed to a maximum +/-0.2% correction over an 8 second window, which is more conservative, but still within reasonable sync delay expectations (counting to 8 will help provide confidence; if we believe users would reasonably be OK with out-of-sync clients converging within 8 seconds, then this is a reasonable default). Importantly, this is well below the threshold of audible pitch shift or warble while still providing a means to converge.

In any case, if the sync is too far out, a reanchor will be triggered.

Additionally, a 750ms sync correction delay was added; this does not delay audible playback, it only suppresses sample insert/drop correction briefly after playback enters the PLAYING state. That gives the DAC timing and clock-sync estimates a short window to settle before the client starts making speed adjustments based on them. In practice this avoids reacting to the first unstable startup measurements while still allowing scheduled playback to begin on time.

Real Endpoint Comparison

I tested this on my actual Sendspin Linux endpoint using the same daemon config, audio device, and negotiated format:

  • Device: HiFiBerry DAC+
  • Format: flac:48000:24:2
  • Server: Music Assistant
  • Output latency reported by PortAudio: ~42.7 ms

The original version repeatedly started streams around -42 ms sync error, then corrected aggressively.

Original observed debug stats:

  • Underflows: 0
  • Reanchors: 0
  • Warnings/errors: 0
  • Speed range: 98.29% to 100.11%
  • Max inserted frames per 1s log window: 821
  • Max dropped frames per 1s log window: 55

Patched observed debug stats:

  • Underflows: 0
  • Reanchors: 0
  • Warnings/errors: 0
  • Speed range: 99.78% to 100.04%
  • Max inserted frames per 1s log window: 106
  • Max dropped frames per 1s log window: 21

The startup offset is still visible, but correction is much less aggressive and avoids the previous drop-frame artifact.

Tests

Added focused regression tests for:

  • Drop correction discarding one frame without repeating the previous output frame.
  • Startup correction grace period suppressing immediate insert/drop correction.

Local verification:

uv run --extra test ruff check sendspin/audio.py tests/test_audio.py
uv run --extra test mypy sendspin
uv run --extra test pytest

Results:

All checks passed
Success: no issues found in 30 source files
91 passed

Testing for a day in real world scenarios has also been very successful.

@balloob

balloob commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

We should use same value as sendspin cpp

@trisweb

trisweb commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

I took a look at sendspin-cpp. Pardon if this is known already but I figured I would share my findings.

The relevant functionality is around sync_task.cpp:236. The implementation seems fairly different around how the stream is synchronized at startup and during playback; beyond 5000us initially or 500us during playback, it will "hard sync" by dropping large blocks of frames, and below that level it will "soft sync" by adding or removing up to one frame per decoded block. For soft sync, at 48kHz this amounts to roughly 0.1% pitch distortion during the period of time it may take to sync (as long as the difference remains below the threshold).

The implementation in sendspin-cli is materially different, though achieves a similar outcome; however before this patch the equivalent "soft sync" correction could warp the audio by up to 4% in its effort to bring the stream in sync in under 2 seconds.

My patch here makes sendspin-cli significantly more in line with sendspin-cpp, despite the implementation difference. The pitch distortion targets a maximum of around 0.2%, which despite being double what sendspin-cpp targets, is still well below the audible range. If anything I'd say we could allow sendspin-cpp to add or remove up to two frames per block to sync more quickly while still maintaining inaudible status, but I think it's fine as-is practically.

One option is to make sendspin-cli much more aligned with the implementation in sendspin-cpp, which would be a significant effort of again rewriting the core.

Rather than this, based on my analysis I'd go with my patch to align the general behavior between the two. I believe it does so in the simplest way possible without massively changing the implementation in sendspin-cli.

@balloob

balloob commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

All our implementations should act the same. I think that we should decide which value (1 or 2 frames per block) we prefer, and when we should do hard syncs etc, and make sure it's done in all our implementations.

@maximmaxim345

Copy link
Copy Markdown
Member

From an end-user (audible) perspective, after this PR plus interpolating the corrected frames (cpp blends adjacent samples on insert/drop, cli currently duplicates/skips a raw frame), I think the two approaches are practically indistinguishable.

That said, I don't think we should force every implementation onto one exact strategy. Each has real strengths:

  • cpp/js is leaner: (in-place corrections, simpler/less memory copying). The cli/python approach isn't even viable in sendspin-js on Google Cast for Audio due to the resource constraints there.
  • cli handles high sample rates better: High sample rates mean every sample is shorter. Since its correction rate isn't limited to one frame per decoded chunk switching output formats doesn't impact the maximum correction rate.

On 1 vs 2 frames per block: in sendspin-js I tried bumping it to 2 and I could hear more distortion than with 1, so I'd keep it at 1 there. I did not compare dropping 2 samples at once vs distributed throughout the chunk (like cli) though which would be a much fairer comparison.

@trisweb

trisweb commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I agree with @maximmaxim345, the implementations are appropriate for each language, and they end up implementing the same general approach. I don't think making them precisely the same is a desirable goal as each has different strengths and constraints.

This PR aligns them with each other behaviorally, while allowing each implementation to do what it's good at; CLI to Linux support and sample rates, and cpp to embedded and resource constrained environments. Best of both worlds I think.

@balloob balloob merged commit 22b9ff3 into Sendspin:main Jun 16, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants