Audio EOT by chenghao-mou · Pull Request #4722 · livekit/agents

chenghao-mou · 2026-02-05T15:09:40Z

Adds streaming audio end-of-turn detection. A single user-facing AudioTurnDetector in livekit-plugins-turn-detector selects between two backends:

eot-audio-cloud
eot-audio-mini

On a cloud transport error or a predict_end_of_turn timeout, the session swaps to the local backend for the rest of the stream (sticky per session, one warning per failure mode). Local failures emit the default 1.0 prediction and retry on the next turn. A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
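For reviewers, the multiplicative scaling can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `scale_threshold` and the two default constants are hypothetical names, and the 0.4 / 0.3 values are assumed defaults used only for illustration.

```python
# Assumed default thresholds, for illustration only.
CLOUD_DEFAULT_THRESHOLD = 0.4
LOCAL_DEFAULT_THRESHOLD = 0.3


def scale_threshold(
    user_threshold: float, source_default: float, target_default: float
) -> float:
    """Carry a user-set threshold from one backend's scale to another's.

    The user's value is read as a ratio of the source default, and the
    same ratio is applied to the target default.
    """
    if source_default <= 0:
        raise ValueError("source_default must be positive")
    scaled = user_threshold / source_default * target_default
    # Clamp so the result stays a valid probability threshold.
    return min(max(scaled, 0.0), 1.0)


# A user who halved the cloud default ends up at half the local default.
local_threshold = scale_threshold(
    0.2, CLOUD_DEFAULT_THRESHOLD, LOCAL_DEFAULT_THRESHOLD
)
```

Scaling by ratio rather than copying the raw value keeps "twice as strict as default" meaning the same thing on either backend, which is the point of preserving the operating point across a fallback.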

Wired into AudioRecognition: VAD INFERENCE_DONE triggers warmup, END_OF_SPEECH activates the stream, predictions flow back through _run_eou_detection and arbitrate against the endpointing delay. A speaking guard cancels an in-flight bounce if VAD START_OF_SPEECH fires mid-window.
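The event flow above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: `TurnStreamSketch`, `StreamState`, and `VADEvent` are hypothetical stand-ins, not the classes in this PR, and dropping back to IDLE on START_OF_SPEECH is an assumption about how the guard resets.

```python
from enum import Enum, auto


class VADEvent(Enum):
    INFERENCE_DONE = auto()
    START_OF_SPEECH = auto()
    END_OF_SPEECH = auto()


class StreamState(Enum):
    IDLE = auto()
    WARMING_UP = auto()
    ACTIVE = auto()


class TurnStreamSketch:
    """Hypothetical stand-in for the stream FSM described above."""

    def __init__(self) -> None:
        self.state = StreamState.IDLE
        self.pending_commit = False

    def on_vad_event(self, event: VADEvent) -> None:
        if event is VADEvent.INFERENCE_DONE:
            # Warm up only from idle; never demote an active stream.
            if self.state is StreamState.IDLE:
                self.state = StreamState.WARMING_UP
        elif event is VADEvent.END_OF_SPEECH:
            self.state = StreamState.ACTIVE
        elif event is VADEvent.START_OF_SPEECH:
            # Speaking guard: the user resumed, so drop any pending commit.
            self.pending_commit = False
            self.state = StreamState.IDLE  # assumed reset behaviour

    def on_prediction(self, probability: float, threshold: float) -> bool:
        # Predictions only count while the stream is active.
        if self.state is not StreamState.ACTIVE:
            return False
        self.pending_commit = probability >= threshold
        return self.pending_commit
```

In this sketch a prediction only marks a commit as pending; the arbitration against the endpointing delay is left out.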

Structure

livekit/agents/voice/turn.py — abstract _AudioTurnDetector / _AudioTurnDetectorStream (FSM) live alongside the existing _TurnDetector Protocol.
livekit/plugins/turn_detector/audio.py — unified detector + concrete FSM stream that dispatches to the active transport.
livekit/plugins/turn_detector/transports.py — AudioTurnDetectionTransport Protocol + _CloudTransport (WS + protobuf) + _LocalTransport (ctypes). Fallback swaps the transport instance, not the stream.
livekit/plugins/turn_detector/languages.py — CLOUD_LANGUAGES (0.4) + LOCAL_LANGUAGES (0.3) per-language thresholds.
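To make the "swap the transport instance, not the stream" idea concrete, here is a minimal synchronous sketch. All names in it (`Transport`, `TransportError`, `FallbackStream`, the logger name) are hypothetical, and the real transports are asynchronous; only the sticky swap, the one-warning-per-failure-mode rule, and the 1.0 default come from the description above.

```python
import logging
from typing import Protocol, Set

logger = logging.getLogger("turn-detector-sketch")  # hypothetical logger name

DEFAULT_PREDICTION = 1.0  # emitted when the local backend fails


class TransportError(Exception):
    """Hypothetical error type for a failed cloud connection."""


class Transport(Protocol):
    def predict(self, audio: bytes) -> float: ...


class FallbackStream:
    """One stream object; only the active transport is swapped."""

    def __init__(self, cloud: Transport, local: Transport) -> None:
        self._active: Transport = cloud
        self._local = local
        self._warned: Set[str] = set()

    @property
    def using_local(self) -> bool:
        return self._active is self._local

    def _warn_once(self, mode: str, exc: Exception) -> None:
        # One warning per failure mode, not one per failed call.
        if mode not in self._warned:
            self._warned.add(mode)
            logger.warning("turn detection %s: %s", mode, exc)

    def predict(self, audio: bytes) -> float:
        if not self.using_local:
            try:
                return self._active.predict(audio)
            except (TransportError, TimeoutError) as exc:
                mode = "timeout" if isinstance(exc, TimeoutError) else "transport_error"
                self._warn_once(mode, exc)
                self._active = self._local  # sticky for the rest of the stream
        try:
            return self._local.predict(audio)
        except Exception as exc:
            # Local failure: emit the default and try again on the next turn.
            self._warn_once("local_failure", exc)
            return DEFAULT_PREDICTION
```

Because the stream object survives the swap, callers holding a reference to it do not need to know a fallback happened.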

Test plan

tests/test_turn_detection_fsm.py — 11 FSM cases incl. WARMING_UP / set_active(False) regression.
tests/test_turn_detection_cloud_stream.py — 4 cloud-transport invariants (retry reset, FIFO send ordering).
tests/test_audio_turn_detector_fallback.py — 15 cases: auto-select, explicit-mode errors, transport-error fallback, timeout fallback, persistence, missing-lib graceful, local-failure retry, warning dedupe, multiplicative threshold scaling.
tests/test_audio_recognition_turn_detection.py — 10 cases: VAD/audio/sentinel forwarding into the stream, prediction-driven EOU + deactivation, speaking-guard race aborts commit.
make format + make lint + make type-check clean (only pre-existing agent_activity.py InterruptionDetectionError errors remain).
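The speaking-guard race covered in the last test file could be exercised along the lines below. This is a self-contained sketch using plain asyncio: `CommitGuardSketch` and `run_race` are hypothetical names, and the delays are arbitrary values chosen for the illustration rather than anything taken from the real tests.

```python
import asyncio
from typing import Optional


class CommitGuardSketch:
    """Hypothetical model of a delayed commit that speech can cancel."""

    def __init__(self, endpointing_delay: float) -> None:
        self._delay = endpointing_delay
        self._task: Optional[asyncio.Task] = None
        self.committed = False

    async def _commit_after_delay(self) -> None:
        await asyncio.sleep(self._delay)
        self.committed = True

    def on_end_of_turn_prediction(self) -> None:
        # Start (or restart) the delayed commit window.
        self.cancel_pending()
        self._task = asyncio.create_task(self._commit_after_delay())

    def on_start_of_speech(self) -> None:
        # Speaking guard: the user resumed mid-window, abort the commit.
        self.cancel_pending()

    def cancel_pending(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None


async def run_race(resume_speaking: bool) -> bool:
    guard = CommitGuardSketch(endpointing_delay=0.05)
    guard.on_end_of_turn_prediction()
    await asyncio.sleep(0.01)  # still inside the commit window
    if resume_speaking:
        guard.on_start_of_speech()
    await asyncio.sleep(0.1)  # let the window elapse
    return guard.committed
```

Running `asyncio.run(run_race(True))` models the user resuming speech inside the window, and `run_race(False)` models an uninterrupted end of turn.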

Depends on livekit/python-sdks#676

hsjun99 · 2026-02-25T01:00:31Z

@chenghao-mou Excited to see this! A couple of questions:

Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
Any rough timeline for when MultiModalTurnDetector gets fully wired up?

chenghao-mou · 2026-02-25T10:07:07Z

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

add interface draft

87068d5

chenghao-mou added 25 commits March 6, 2026 10:47

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

e0d5ec1

draft

8eebccc

fix type issues

f92fbc0

refactor stream to support turn detector protocol

d1086ff

minor fixes

0a02bb1

minor fixes

168d0d7

WIP: use only ws stream

277db6e

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

03c0e2e

fix uv.lock bad merge

56b4796

WIP: more refactoring

be9a550

fix mypy

601229c

remove temp url

c4d92f8

disable turn detection when agent is still speaking

e963d85

minor refactoring

c529d79

fix type issues

09baed8

wip

3830638

clean up encoder

f214aa0

wip

c922f44

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

f94a0dd

update protos

604bfdc

minor fixes

f9ec64a

address comments

ddbf594

add text fallback

d465564

add text fallback

6e7d6bf

fix threshold

200d634

chenghao-mou marked this pull request as ready for review April 22, 2026 07:38

chenghao-mou requested a review from a team April 22, 2026 07:38

chenghao-mou added 4 commits April 30, 2026 17:31

add token in header instead

999edd5

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

cde90de

wip

3603f04

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

6272402

chenghao-mou added 2 commits May 14, 2026 19:20

refactor for the cloud model

3bc3ff3

add support for both v1 and v1-mini

a08b624

chenghao-mou added 3 commits May 15, 2026 11:36

fix example

f435571

address comments

8e75d60

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

cf54cbe

chenghao-mou added 6 commits May 15, 2026 12:35

address comments

4f10a69

clean up session _on_error annotation

e96f1be

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

97400d2

merge inference and local eot code

b1e9294

update tests

49f0de0

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

7fe2bfb

chenghao-mou changed the title from "Multimodal EOU" to "Audio EOT" on May 17, 2026

chenghao-mou added 5 commits May 17, 2026 19:25

clean up

8b150aa

minor refactor and clean up

28af3f5

refactor

75ddae6

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

76cec5d

refactor

2ccf54d

chenghao-mou added 5 commits May 19, 2026 13:20

clean up

7fbca08

refactor

82c599a

clean up

4b6fdb5

more refactoring

7500160

fix makefile indentation

efe8d5c

chenghao-mou wants to merge 62 commits into main from feat/AGT-2520-multimodal-EOU
