Define minimum sync standards for the player@v1 role#104
Conversation
Make the correction-quality rule outcome-based (Inaudible corrections) and exempt a rare one-shot resync from both the speed cap and the accuracy floor. The speed cap is now a sliding average over 150 ms.
Interpolation only very slightly decreased distortion, lets drop it to keep the Spec simpler. Other (better) strategies like ASRC are encouraged.
With the proposed minimum sync standards in Sendspin/spec#104 the default `sync` mode in `sendspin-js` was no longer spec compliant. This PR tweaks them to follow the spec. There were two problems before this PR: - Drift was corrected with up to ±2% playback-rate changes and was therefore audible - And startup errors could sit at 100-150 ms for tens of seconds because a resync re-anchored its backlog forward instead of dropping it.
|
The limits/constants even work for tricky platforms like This works because we define accuracy from the Kalman filter to the output, not end-to-end. And since we also define how the filter has to be implemented and used, this essentially factors out the things we can't control, like network delay and stability. So even listening through a VPN across the world, it might not be perfectly in sync, but the implementation is still spec compliant, since it's doing as well as it can. |
|
|
||
| Each client is responsible for maintaining its own synchronization with the server's timestamps. | ||
|
|
||
| - **Accuracy floor:** In steady state, implementations MUST keep this error within ±2 ms. The only exception is the one-shot resynchronization exempted from the speed cap above, which MUST be rare. |
There was a problem hiding this comment.
2 ms seems like a too high bound. Or are we limited in JS to consistently getting less than 1 ms?
There was a problem hiding this comment.
sendspin-js shouldn't be limiting the spec so we can lower it.
What about ±1ms? I wouldn't go lower without more testing though. From some initial tests ±0.5ms looks difficult to achieve with a USB DAC playing with sendspin-cli.
| Each client is responsible for maintaining its own synchronization with the server's timestamps. | ||
|
|
||
| - **Accuracy floor:** In steady state, implementations MUST keep this error within ±2 ms. The only exception is the one-shot resynchronization exempted from the speed cap above, which MUST be rare. | ||
| - **Accuracy target:** Implementations SHOULD aim for ±1 ms. |
There was a problem hiding this comment.
Similarly, are we just limited by the JS implementation from being stricter here?
There was a problem hiding this comment.
How about ±0.5ms? It's a recommendation so we could technically go as low as we want. It what implementations target after all, if they run on slow hardware it just might be slightly above the threshold (but still compliant).
|
|
||
| - **Chunk duration bounds:** A server MUST NOT send an audio chunk longer than 150 ms, and SHOULD NOT send one shorter than 15 ms (the final chunk of a stream or the chunk before a format change MAY be shorter). | ||
| - The server sends audio to late-joining clients with future timestamps only, allowing them to buffer and start playback in sync with existing clients. | ||
| - After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers must schedule the first audio timestamp far enough in the future to satisfy each player's [`required_lead_time_ms`](#client--server-clientstate-player-object) (startup warmup) and [`min_buffer_ms`](#client--server-clientstate-player-object) (ongoing jitter buffer). For live streams the buffer cannot grow after playback begins, so the larger of the two must already be reached before the first chunk plays. |
There was a problem hiding this comment.
For a live stream, why would we take the larger of the two (not exactly this PR but a question that's been bugging me as I implement the player timing changes!)?
Presumably, on a live stream, you don't care about missing the first chunk of audio because... it's live, you've already missed the ongoing stuff. The network jitter is really the only one to care about in that scenario; i.e., how much buffer do I need to avoid audio drops?
There was a problem hiding this comment.
required_lead_time isnt strictly about the missing content, its about how much time should happen between the stream/start and the timestamp in the first chunk. So to have required_lead and min_buffer work consistently for the player (no matter if its live or not), taking the larger of the two should make it work without having a buffer.
Buffered can of course ignore the min_buffer since they can just fetch more audio data. But with live streams only using required_lead or only using min_buffer would cause it to break one way or the other. If you only use required_lead and its smaller than the min_buffer, the buffer starts too shallow and since live cant grow it after playback starts you get dropouts the first time the network jitters. And if you only use min_buffer and its smaller than required_lead, then the first chunk lands less than required_lead after stream/start, which contradicts the spec since thats exactly the gap its supposed to guarantee.
There was a problem hiding this comment.
Then why do we have the two different parameters if we just take the max? If a client has a slow startup but rock solid network, I thin it would more useful to use just the network jitter measurement, even if that means the client needs to throw away a few chunks at the start (which is a violation of the spec as written, to be fair).
Let's see if I'm understanding by phrasing it a bit differently. If it isn't a live stream, then only required_lead is honored. If it is a live stream, then we take the max of required_lead and min_buffer. Which just effectively means the min_buffer parameter only matters if min_buffer > required_lead in the live case. Do I have that breakdown right for how the server behaves? I feel like we are missing an opportunity for the other case.
We can move this to Discord or a separate discussion so I don't keep polluting this PR!
The specification didn't exactly define a minimum bar for synced playback. That meant that terribly out-of-sync players were still valid, while their end-user experience was unusable when grouped with other Sendspin players.
The new rules:
client/timecadence floorTo give a head start for new implementations, this PR also adds a simple suggested strategy: discrete, bit-exact sample deletion and insertion. The player drops or duplicates whole frames to correct drift, leaving the audio untouched except at the moments it corrects. N scales with sample rate. Other algorithms like ASRC are still encouraged.
Constants
I'm still not sure what exact values we should pick. The values I picked in this draft are:
sendspin-js.aiosendspin's current 105 ms max (FLAC at 44.1 kHz). The 15 ms floor keeps enough samples per chunk to correct within the ±0.5% cap.Related issues