Audio EOT#4722

Open
chenghao-mou wants to merge 62 commits into main from feat/AGT-2520-multimodal-EOU

@chenghao-mou chenghao-mou commented Feb 5, 2026

Requires livekit/protocol#1485

Adds streaming audio end-of-turn detection. A single user-facing AudioTurnDetector in livekit-plugins-turn-detector selects between two backends:

  • eot-audio-cloud
  • eot-audio-mini

On cloud transport error or predict_end_of_turn timeout, the session swaps to local for the rest of the stream (sticky per session, one warning per failure mode). Local failures emit the default 1.0 prediction and retry on the next turn. A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
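A minimal sketch of the sticky fallback described above, not the code in this PR. `StickyFallbackSession`, the callable-based backends, and the 0.5 s timeout are hypothetical stand-ins chosen for illustration; only the behaviour (permanent swap to local, one warning per failure mode, default 1.0 prediction on local failure) comes from the description.

```python
import asyncio
import logging

logger = logging.getLogger("eot-fallback-sketch")


class StickyFallbackSession:
    """Hypothetical wrapper: try cloud first, swap to local for good on failure."""

    DEFAULT_PREDICTION = 1.0  # emitted when the local backend fails

    def __init__(self, cloud, local, timeout: float = 0.5) -> None:
        # `cloud` and `local` are async callables: bytes -> float (assumed shape).
        self._cloud = cloud
        self._local = local
        self._active = cloud
        self._timeout = timeout  # assumed value, not taken from the PR
        self._warned = set()  # failure modes already logged

    def _warn_once(self, mode: str, message: str) -> None:
        # One warning per failure mode for the lifetime of the session.
        if mode not in self._warned:
            self._warned.add(mode)
            logger.warning(message)

    async def predict_end_of_turn(self, audio: bytes) -> float:
        if self._active is self._cloud:
            try:
                return await asyncio.wait_for(self._cloud(audio), self._timeout)
            except asyncio.TimeoutError:
                self._warn_once("timeout", "cloud EOT timed out; using local")
            except ConnectionError:
                self._warn_once("transport", "cloud EOT transport error; using local")
            # Sticky: later calls skip the cloud backend entirely.
            self._active = self._local
        try:
            return await self._local(audio)
        except Exception:
            # Local failures are not sticky: emit the default and retry next turn.
            return self.DEFAULT_PREDICTION
```

Swapping `self._active` rather than rebuilding the session mirrors the note further down that fallback replaces the transport instance, not the stream.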

Wired into AudioRecognition: VAD INFERENCE_DONE triggers warmup, END_OF_SPEECH activates the stream, predictions flow back through _run_eou_detection and arbitrate against the endpointing delay. A speaking guard cancels an in-flight bounce if VAD START_OF_SPEECH fires mid-window.
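The speaking guard can be pictured as a cancellable delayed commit. The sketch below is an illustration under assumptions: `BounceGuard` and its method names are hypothetical and do not correspond to AudioRecognition internals; it only shows a pending end-of-turn commit being cancelled when speech resumes inside the window.

```python
import asyncio
from typing import Optional


class BounceGuard:
    """Hypothetical helper: commit end-of-turn after a delay unless speech resumes."""

    def __init__(self) -> None:
        self.committed = 0  # number of end-of-turn commits that went through
        self._task: Optional[asyncio.Task] = None

    def on_end_of_speech(self, delay: float) -> None:
        # Start (or restart) the bounce window.
        self._cancel()
        self._task = asyncio.create_task(self._bounce(delay))

    def on_start_of_speech(self) -> None:
        # Speaking guard: the user resumed, so drop the in-flight bounce.
        self._cancel()

    def _cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _bounce(self, delay: float) -> None:
        await asyncio.sleep(delay)
        self.committed += 1
```

In this sketch a cancelled bounce never reaches the commit line, which is the property the speaking-guard race test in the test plan appears to target.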

Structure

  • livekit/agents/voice/turn.py — abstract _AudioTurnDetector / _AudioTurnDetectorStream (FSM) live alongside the existing _TurnDetector Protocol.
  • livekit/plugins/turn_detector/audio.py — unified detector + concrete FSM stream that dispatches to the active transport.
  • livekit/plugins/turn_detector/transports.py — AudioTurnDetectionTransport Protocol + _CloudTransport (WS + protobuf) + _LocalTransport (ctypes). Fallback swaps the transport instance, not the stream.
  • livekit/plugins/turn_detector/languages.py — CLOUD_LANGUAGES (0.4) + LOCAL_LANGUAGES (0.3) per-language thresholds.
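One plausible reading of the multiplicative threshold scaling, using the 0.4 and 0.3 figures listed above as the cloud and local defaults. This is a sketch under that assumption: `scale_threshold` and the two constants are hypothetical names, and the actual formula in the PR may differ.

```python
CLOUD_DEFAULT = 0.4  # assumed from "CLOUD_LANGUAGES (0.4)"
LOCAL_DEFAULT = 0.3  # assumed from "LOCAL_LANGUAGES (0.3)"


def scale_threshold(
    user_threshold: float,
    source_default: float = CLOUD_DEFAULT,
    target_default: float = LOCAL_DEFAULT,
) -> float:
    """Carry a user threshold from one backend's scale to another's.

    The user's value is treated as a multiple of the source default, and the
    same multiple is applied to the target default.
    """
    if source_default <= 0:
        raise ValueError("source_default must be positive")
    scaled = user_threshold * (target_default / source_default)
    # Keep the result a valid probability threshold.
    return min(max(scaled, 0.0), 1.0)
```

Under these assumptions a user who sets 0.5 against the cloud default (1.25 times 0.4) would get roughly 0.375 after a fallback to local (1.25 times 0.3), so the relative strictness is preserved.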

Test plan

  • tests/test_turn_detection_fsm.py — 11 FSM cases incl. WARMING_UP / set_active(False) regression.
  • tests/test_turn_detection_cloud_stream.py — 4 cloud-transport invariants (retry reset, FIFO send ordering).
  • tests/test_audio_turn_detector_fallback.py — 15 cases: auto-select, explicit-mode errors, transport-error fallback, timeout fallback, persistence, missing-lib graceful, local-failure retry, warning dedupe, multiplicative threshold scaling.
  • tests/test_audio_recognition_turn_detection.py — 10 cases: VAD/audio/sentinel forwarding into the stream, prediction-driven EOU + deactivation, speaking-guard race aborts commit.
  • make format + make lint + make type-check clean (only pre-existing agent_activity.py InterruptionDetectionError errors remain).

Depends on livekit/python-sdks#676

hsjun99 commented Feb 25, 2026

@chenghao-mou Excited to see this! A couple of questions:

  1. Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
  2. Any rough timeline for when MultiModalTurnDetector gets fully wired up?

@chenghao-mou

> @chenghao-mou Excited to see this! A couple of questions:
>
>   1. Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
>   2. Any rough timeline for when MultiModalTurnDetector gets fully wired up?

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

@chenghao-mou chenghao-mou marked this pull request as ready for review April 22, 2026 07:38
@chenghao-mou chenghao-mou requested a review from a team April 22, 2026 07:38

@chenghao-mou chenghao-mou changed the title from "Multimodal EOU" to "Audio EOT" May 17, 2026
