Audio EOT #4722
Open
chenghao-mou wants to merge 62 commits into
@chenghao-mou Excited to see this! A couple of questions:
Member
Author
Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.
Requires livekit/protocol#1485

Adds streaming audio end-of-turn detection. Single user-facing `AudioTurnDetector` in `livekit-plugins-turn-detector` selects between two backends:

- `eot-audio-cloud`
- `eot-audio-mini`

On cloud transport error or `predict_end_of_turn` timeout, the session swaps to local for the rest of the stream (sticky per session, one warning per failure mode). Local failures emit the default `1.0` prediction and retry on the next turn. A user-set `unlikely_threshold` is scaled multiplicatively against the cloud default so the operating point survives a fallback.

Wired into `AudioRecognition`: VAD `INFERENCE_DONE` triggers `warmup`, `END_OF_SPEECH` activates the stream, predictions flow back through `_run_eou_detection` and arbitrate against the endpointing delay. A speaking guard cancels an in-flight bounce if VAD `START_OF_SPEECH` fires mid-window.

Structure

- `livekit/agents/voice/turn.py` - abstract `_AudioTurnDetector` / `_AudioTurnDetectorStream` (FSM) live alongside the existing `_TurnDetector` Protocol.
- `livekit/plugins/turn_detector/audio.py` - unified detector + concrete FSM stream that dispatches to the active transport.
- `livekit/plugins/turn_detector/transports.py` - `AudioTurnDetectionTransport` Protocol + `_CloudTransport` (WS + protobuf) + `_LocalTransport` (ctypes). Fallback swaps the transport instance, not the stream.
- `livekit/plugins/turn_detector/languages.py` - `CLOUD_LANGUAGES` (0.4) + `LOCAL_LANGUAGES` (0.3) per-language thresholds.

Test plan

- `tests/test_turn_detection_fsm.py` - 11 FSM cases incl. WARMING_UP / `set_active(False)` regression.
- `tests/test_turn_detection_cloud_stream.py` - 4 cloud-transport invariants (retry reset, FIFO send ordering).
- `tests/test_audio_turn_detector_fallback.py` - 15 cases: auto-select, explicit-mode errors, transport-error fallback, timeout fallback, persistence, missing-lib graceful, local-failure retry, warning dedupe, multiplicative threshold scaling.
- `tests/test_audio_recognition_turn_detection.py` - 10 cases: VAD/audio/sentinel forwarding into the stream, prediction-driven EOU + deactivation, speaking-guard race aborts commit.
- `make format` + `make lint` + `make type-check` clean (only pre-existing `agent_activity.py` `InterruptionDetectionError` errors remain).

Depending on livekit/python-sdks#676
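
For readers skimming the description, here is a rough, standalone sketch of the fallback semantics summarized above: the swap to local is sticky, each failure mode is warned about once, a local failure yields the default `1.0`, and the threshold is rescaled by a ratio. It is not the code in this PR. `StickyFallbackDetector`, `TransportError`, `scale_threshold`, and the callable-based transports are hypothetical names invented for illustration, and reading 0.4 and 0.3 as the cloud and local default thresholds is an assumption.

```python
from typing import Callable, List, Set

# Assumed defaults, read off the "(0.4)" / "(0.3)" in the description above.
CLOUD_DEFAULT = 0.4
LOCAL_DEFAULT = 0.3
DEFAULT_PREDICTION = 1.0  # emitted when the local backend fails


class TransportError(Exception):
    """Hypothetical stand-in for a cloud transport failure."""


def scale_threshold(user_threshold: float,
                    cloud_default: float = CLOUD_DEFAULT,
                    local_default: float = LOCAL_DEFAULT) -> float:
    """Map a threshold tuned against the cloud default onto the local scale.

    Keeps the ratio user/cloud_default constant, which is one plausible
    reading of "scaled multiplicatively against the cloud default".
    """
    return user_threshold * (local_default / cloud_default)


class StickyFallbackDetector:
    """Tries cloud first; after one cloud failure, stays on local."""

    def __init__(self,
                 cloud: Callable[[bytes], float],
                 local: Callable[[bytes], float]) -> None:
        self._cloud = cloud
        self._local = local
        self._use_cloud = True
        self._warned: Set[str] = set()
        self.warnings: List[str] = []  # recorded instead of logged, for clarity

    def _warn_once(self, mode: str) -> None:
        # One warning per failure mode, no matter how often it recurs.
        if mode not in self._warned:
            self._warned.add(mode)
            self.warnings.append(mode)

    def predict(self, audio: bytes) -> float:
        if self._use_cloud:
            try:
                return self._cloud(audio)
            except (TransportError, TimeoutError) as exc:
                self._warn_once(type(exc).__name__)
                self._use_cloud = False  # sticky: never retry cloud
        try:
            return self._local(audio)
        except Exception:
            # Local failure: emit the default and try again next turn.
            self._warn_once("local_failure")
            return DEFAULT_PREDICTION
```

Under these assumptions, a user threshold of 0.2 set against the cloud default maps to roughly 0.15 on the local scale, so the relative operating point is preserved after a fallback.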