Define minimum sync standards for the `player@v1` role by maximmaxim345 · Pull Request #104 · Sendspin/spec

maximmaxim345 · 2026-06-16T11:52:29Z

The specification did not define a minimum bar for synced playback. As a result, badly out-of-sync players were still valid, even though the end-user experience was unusable when they were grouped with other Sendspin players.

The new rules:

- require the use of time-filter and the bursting strategy
- `client/time` cadence floor
- inaudible corrections in steady state
- max ±0.5% speed in steady state (sliding average over the maximum chunk size, 150 ms)
- ±2 ms accuracy in steady state
- a rare one-shot resync (startup, underrun, large disturbance) is exempt from the speed and accuracy bounds
- no startup warble
- server chunk duration bounded to 15-150 ms (covers Opus at 20 ms and FLAC at 105 ms)
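As a rough illustration of the speed rule, the sketch below estimates the sliding-average speed deviation from per-chunk correction counts. It is not taken from the spec: the function name, the `(frames, correction)` input shape, and the choice to slide the window in whole chunks are all assumptions made for this example.

```python
from collections import deque

MAX_SPEED_DEVIATION = 0.005  # ±0.5 %, from the rules above
WINDOW_S = 0.150             # sliding window: the maximum chunk size (150 ms)


def max_windowed_deviation(chunks, sample_rate):
    """Return the largest |speed deviation| seen over any window of up to 150 ms.

    `chunks` is an iterable of (frames, correction) pairs, where `correction`
    is the signed number of frames inserted (+) or dropped (-) in that chunk.
    The window slides in whole chunks, which is an approximation (assumption).
    """
    window = deque()
    frames_in_window = 0
    correction_in_window = 0
    worst = 0.0
    for frames, correction in chunks:
        window.append((frames, correction))
        frames_in_window += frames
        correction_in_window += correction
        # Shrink from the left while the window exceeds 150 ms,
        # but always keep at least the newest chunk.
        while len(window) > 1 and frames_in_window / sample_rate > WINDOW_S:
            old_frames, old_correction = window.popleft()
            frames_in_window -= old_frames
            correction_in_window -= old_correction
        worst = max(worst, abs(correction_in_window) / frames_in_window)
    return worst
```

With 20 ms chunks at 48 kHz (960 frames), one corrected frame per chunk stays near 0.1%, while five corrected frames per chunk would exceed the ±0.5% cap.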

To give new implementations a head start, this PR also adds a simple suggested strategy: discrete, bit-exact sample deletion and insertion. The player drops or duplicates whole frames to correct drift, leaving the audio untouched except at the moments it corrects. The number of frames per correction (N) scales with sample rate. Other algorithms like ASRC are still encouraged.
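A minimal sketch of the drop/duplicate idea follows. It is only an illustration, not the algorithm text from the spec: the function name, the list-of-tuples frame representation, the sign convention, and the choice to spread edit points evenly across the chunk are assumptions.

```python
def correct_chunk(frames, correction):
    """Return a copy of `frames` lengthened or shortened by |correction| frames.

    frames: list of frames, each frame a tuple with one sample per channel.
    correction > 0 duplicates frames (the chunk plays longer);
    correction < 0 drops frames (the chunk plays shorter).
    All other frames pass through bit-exact.
    """
    n = abs(correction)
    if n == 0 or not frames:
        return list(frames)
    if n >= len(frames):
        raise ValueError("correction must be smaller than the chunk length")
    # Spread the edit points evenly over the chunk (assumption, not from the spec).
    step = len(frames) / n
    edit_points = {int(step * i + step / 2) for i in range(n)}
    out = []
    for index, frame in enumerate(frames):
        if index in edit_points:
            if correction > 0:
                out.extend((frame, frame))  # duplicate this frame
            # for a negative correction the frame is simply skipped
        else:
            out.append(frame)
    return out
```

Because frames are copied or skipped whole, every sample that survives is identical to the input, which is what makes the approach bit-exact outside the correction points.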

Constants

I'm still not sure what exact values we should pick. The values I picked in this draft are:

- Maximum speed deviation ±0.5%: tighter than the cli's old ±4% warble; looser than what cpp/cli hold today (~0.1%/~0.2%). This is a ~8.6 cent pitch shift, on the edge of being inaudible with music. In steady state pitch tracks clock drift, so this cap is rarely reached.
- Accuracy floor ±2 ms: achievable continuously by native clients. Might be difficult for some implementations like sendspin-js.
- Accuracy target ±1 ms: in-room target, enough so individual speakers are not discernible when grouped.
- Chunk duration bounds 15-150 ms: the 150 ms cap gives headroom over aiosendspin's current 105 ms max (FLAC at 44.1 kHz). The 15 ms floor keeps enough samples per chunk to correct within the ±0.5% cap.
- Soft correction baseline ≈21 µs per chunk: one frame at 48 kHz, scaled by sample rate. Implementations may use a larger step to keep up with drift, bounded by the ±0.5% cap.
- Dead band 100 µs: matches sendspin-cpp's existing value.
- One-shot resync threshold 2 ms: set equal to the accuracy floor so the snap fires before the floor is violated.
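The arithmetic behind these constants can be checked with a short sketch. The helper names and the exact comparison operators at the thresholds are assumptions for illustration; only the numeric values come from the list above.

```python
import math

MAX_SPEED_DEVIATION = 0.005    # ±0.5 %
DEAD_BAND_US = 100.0           # dead band
RESYNC_THRESHOLD_US = 2000.0   # one-shot resync threshold (2 ms)


def cents(speed_ratio):
    """Pitch shift in cents for a given playback speed ratio."""
    return 1200.0 * math.log2(speed_ratio)


def frame_duration_us(sample_rate):
    """Duration of a single frame in microseconds."""
    return 1_000_000.0 / sample_rate


def max_frames_per_chunk(chunk_ms, sample_rate):
    """Whole frames that may be dropped or inserted in one chunk under the cap."""
    frames = chunk_ms * sample_rate / 1000
    return math.floor(frames * MAX_SPEED_DEVIATION)


def choose_action(error_us):
    """Pick a correction for a sync error in microseconds (hypothetical policy)."""
    magnitude = abs(error_us)
    if magnitude >= RESYNC_THRESHOLD_US:
        return "resync"        # rare one-shot snap
    if magnitude > DEAD_BAND_US:
        return "soft_correct"  # drop or insert frames within the speed cap
    return "hold"              # inside the dead band: leave audio untouched
```

For example, `cents(1.005)` is about 8.6, `frame_duration_us(48000)` is about 20.8 µs, and a 15 ms chunk at 48 kHz (720 frames) leaves room for at most 3 corrected frames under the ±0.5% cap.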

Related issues

With the proposed minimum sync standards in Sendspin/spec#104, the default `sync` mode in `sendspin-js` was no longer spec compliant. This PR tweaks its correction limits to follow the spec. There were two problems before this PR:

- Drift was corrected with up to ±2% playback-rate changes and was therefore audible.
- Startup errors could sit at 100-150 ms for tens of seconds because a resync re-anchored its backlog forward instead of dropping it.

maximmaxim345 · 2026-06-18T19:50:21Z

The limits/constants even work for tricky platforms like sendspin-js running on Chromium (which randomizes the clock for security AFAIK). It still held within about ±1 ms there.

This works because we define accuracy from the Kalman filter to the output, not end-to-end. And since we also define how the filter has to be implemented and used, this essentially factors out the things we can't control, like network delay and stability.

So even listening through a VPN across the world, it might not be perfectly in sync, but the implementation is still spec compliant, since it's doing as well as it can.

kahrendt · 2026-06-22T14:18:45Z

+
+Each client is responsible for maintaining its own synchronization with the server's timestamps.
+
+- **Accuracy floor:** In steady state, implementations MUST keep this error within ±2 ms. The only exception is the one-shot resynchronization exempted from the speed cap above, which MUST be rare.


2 ms seems like too high a bound. Or are we limited in JS to consistently getting less than 1 ms?

maximmaxim345 · 2026-06-24

sendspin-js shouldn't be limiting the spec, so we can lower it.
What about ±1 ms? I wouldn't go lower without more testing though. From some initial tests, ±0.5 ms looks difficult to achieve with a USB DAC playing with sendspin-cli.

kahrendt · 2026-06-22T14:24:46Z

+Each client is responsible for maintaining its own synchronization with the server's timestamps.
+
+- **Accuracy floor:** In steady state, implementations MUST keep this error within ±2 ms. The only exception is the one-shot resynchronization exempted from the speed cap above, which MUST be rare.
+- **Accuracy target:** Implementations SHOULD aim for ±1 ms.


Similarly, are we just limited by the JS implementation from being stricter here?

maximmaxim345 · 2026-06-24

How about ±0.5 ms? It's a recommendation, so we could technically go as low as we want. It's what implementations target after all; if they run on slow hardware, they might just land slightly above the threshold (but still be compliant).

kahrendt · 2026-06-22T16:45:37Z

+
+- **Chunk duration bounds:** A server MUST NOT send an audio chunk longer than 150 ms, and SHOULD NOT send one shorter than 15 ms (the final chunk of a stream or the chunk before a format change MAY be shorter).
+- The server sends audio to late-joining clients with future timestamps only, allowing them to buffer and start playback in sync with existing clients.
+- After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers must schedule the first audio timestamp far enough in the future to satisfy each player's [`required_lead_time_ms`](#client--server-clientstate-player-object) (startup warmup) and [`min_buffer_ms`](#client--server-clientstate-player-object) (ongoing jitter buffer). For live streams the buffer cannot grow after playback begins, so the larger of the two must already be reached before the first chunk plays.


For a live stream, why would we take the larger of the two (not exactly this PR but a question that's been bugging me as I implement the player timing changes!)?

Presumably, on a live stream, you don't care about missing the first chunk of audio because... it's live, you've already missed the ongoing stuff. The network jitter is really the only one to care about in that scenario; i.e., how much buffer do I need to avoid audio drops?

maximmaxim345 · 2026-06-24

required_lead_time isn't strictly about the missing content; it's about how much time should pass between the stream/start and the timestamp in the first chunk. So to have required_lead and min_buffer work consistently for the player (no matter if it's live or not), taking the larger of the two should make it work without having a buffer.

Buffered streams can of course ignore min_buffer since they can just fetch more audio data. But with live streams, using only required_lead or only min_buffer would break one way or the other. If you only use required_lead and it's smaller than min_buffer, the buffer starts too shallow, and since live can't grow it after playback starts, you get dropouts the first time the network jitters. And if you only use min_buffer and it's smaller than required_lead, then the first chunk lands less than required_lead after stream/start, which contradicts the spec since that's exactly the gap it's supposed to guarantee.
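The rule under discussion can be written as a tiny sketch. This reflects one reading of the comments in this thread, not confirmed server behavior: the function name and the `live` flag are hypothetical, and the non-live branch follows the remark that buffered playback can ignore `min_buffer_ms`.

```python
def first_chunk_lead_ms(required_lead_time_ms, min_buffer_ms, live):
    """Lead time between stream/start and the first audio timestamp, in ms.

    Live streams cannot grow their buffer after playback begins, so the
    larger of the two values must already be reached up front.
    Non-live streams honor only required_lead_time_ms here (assumption
    based on the discussion, since they can fetch more audio later).
    """
    if live:
        return max(required_lead_time_ms, min_buffer_ms)
    return required_lead_time_ms
```

For a player reporting `required_lead_time_ms=200` and `min_buffer_ms=500`, this gives 500 ms for a live stream and 200 ms otherwise; with the values swapped, both cases give 500 ms.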

kahrendt · 2026-06-25

Then why do we have two different parameters if we just take the max? If a client has a slow startup but a rock-solid network, I think it would be more useful to use just the network jitter measurement, even if that means the client needs to throw away a few chunks at the start (which is a violation of the spec as written, to be fair).

Let's see if I'm understanding by phrasing it a bit differently. If it isn't a live stream, then only required_lead is honored. If it is a live stream, then we take the max of required_lead and min_buffer. Which just effectively means the min_buffer parameter only matters if min_buffer > required_lead in the live case. Do I have that breakdown right for how the server behaves? I feel like we are missing an opportunity for the other case.

We can move this to Discord or a separate discussion so I don't keep polluting this PR!

Define minimum standards for synchronized playback

5901e82

maximmaxim345 mentioned this pull request Jun 16, 2026

Replace client error state with not_synchronized #102

Open

maximmaxim345 added 3 commits June 17, 2026 09:39

Scope playback-sync requirements to steady state

e49426a

Make the correction-quality rule outcome-based (Inaudible corrections) and exempt a rare one-shot resync from both the speed cap and the accuracy floor. The speed cap is now a sliding average over 150 ms.

Set a minimum audio chunk duration of 15 ms

3f0b9b4

Replace the interpolated push model with frame drop/insert

0dc5b21

Interpolation only very slightly decreased distortion, lets drop it to keep the Spec simpler. Other (better) strategies like ASRC are encouraged.

maximmaxim345 marked this pull request as ready for review June 17, 2026 07:53

maximmaxim345 mentioned this pull request Jun 18, 2026

Make default correction limits inaudible Sendspin/sendspin-js#137

Merged

kahrendt reviewed Jun 22, 2026

maximmaxim345 wants to merge 4 commits into `main` from `feat/define-minimum-player-syncing-standards`

2 participants

