feat(vad): add configurable VAD policy by JMLX42 · Pull Request #111 · itsmontoya/scribble

JMLX42 · 2026-01-20T11:22:53Z

Summary

Implements Default for VadPolicy
Exports VadPolicy from the public API
Adds vad_policy field to Opts for custom VAD configuration
Adds VadProcessor::with_policy constructor
Uses vad_policy from Opts when creating VAD processor
Reduces VAD window size from 2s to 500ms for more responsive speech detection
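
To illustrate how the pieces in this summary might fit together, here is a minimal sketch. The type names `VadPolicy`, `Opts`, and `VadProcessor` and the `vad_policy` field come from the PR; the fields inside `VadPolicy`, their default values, and the `policy()` accessor are assumptions made for illustration and may not match scribble's actual definitions.

```rust
// Hypothetical stand-ins for illustration only. The fields of VadPolicy and
// their defaults are assumptions, not scribble's real definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct VadPolicy {
    /// Samples per VAD window (assumed field; 8000 = 500 ms at 16 kHz).
    pub window_samples: usize,
    /// Speech-probability threshold (assumed field).
    pub speech_threshold: f32,
}

impl Default for VadPolicy {
    fn default() -> Self {
        Self {
            window_samples: 8_000,
            speech_threshold: 0.5,
        }
    }
}

/// Reduced version of the options struct: only the fields relevant here.
#[derive(Debug, Default)]
pub struct Opts {
    pub enable_voice_activity_detection: bool,
    pub vad_policy: VadPolicy,
}

pub struct VadProcessor {
    policy: VadPolicy,
}

impl VadProcessor {
    /// Existing-style constructor: falls back to the default policy.
    pub fn new() -> Self {
        Self::with_policy(VadPolicy::default())
    }

    /// Constructor that accepts a caller-supplied policy.
    pub fn with_policy(policy: VadPolicy) -> Self {
        Self { policy }
    }

    /// Hypothetical accessor, added so the example can be inspected.
    pub fn policy(&self) -> &VadPolicy {
        &self.policy
    }
}
```

Under these assumptions, a caller would set `vad_policy` on `Opts` (for example with struct-update syntax over `VadPolicy::default()`), and the backend would pass `opts.vad_policy.clone()` to `VadProcessor::with_policy` instead of constructing a fixed policy.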

Dependencies

Depends on feat(vad): expose last_vad_speech_instant on BackendStream trait #110 (feat/expose-vad-speech-instant)

Context

This feature is used by the Friday project.

This PR was created with the assistance of an AI assistant (Claude).

Add pass-through features for GPU backends:

- cuda: NVIDIA CUDA
- metal: Apple Metal
- hipblas: AMD ROCm
- vulkan: Cross-platform Vulkan
- coreml: Apple CoreML

This allows consumers to enable GPU acceleration by adding the appropriate feature to their Cargo.toml, e.g.:

    scribble = { version = "0.5", features = ["cuda"] }
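
As a rough illustration of what a Cargo feature flag looks like from the Rust side, the sketch below reports which of the listed feature names were enabled at compile time. The function `enabled_gpu_backends` is hypothetical and is not part of scribble; in the actual crate the features are forwarded to the underlying backend rather than inspected like this.

```rust
/// Hypothetical helper: lists the GPU feature names enabled for this build.
/// `cfg!(feature = "...")` evaluates to a compile-time boolean, so a build
/// without any of these features returns an empty list.
pub fn enabled_gpu_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "cuda") {
        backends.push("cuda");
    }
    if cfg!(feature = "metal") {
        backends.push("metal");
    }
    if cfg!(feature = "hipblas") {
        backends.push("hipblas");
    }
    if cfg!(feature = "vulkan") {
        backends.push("vulkan");
    }
    if cfg!(feature = "coreml") {
        backends.push("coreml");
    }
    backends
}
```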

…cases

By default, the incremental transcriber waits for 2+ segments before emitting, treating the last segment as potentially incomplete. This adds latency for short utterances like voice assistant commands.

The new `emit_single_segments` option (default: false) allows emitting single segments immediately when detected. This is useful for:

- Voice assistants
- Real-time transcription
- Any application where low latency is more important than waiting for natural sentence boundaries

When enabled, single segments are emitted as soon as Whisper produces them, rather than waiting for a second segment or the 30-second force-flush timeout.
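
The emission rule described in this commit can be sketched as a small decision function. The function name `emittable_count` and its parameters are hypothetical, and the sketch assumes that with two or more segments the last one is still held back even when `emit_single_segments` is enabled; the real transcriber may differ.

```rust
use std::time::Duration;

/// Force-flush timeout mentioned in the commit message.
pub const FORCE_FLUSH_AFTER: Duration = Duration::from_secs(30);

/// Hypothetical helper: how many leading segments may be emitted now.
///
/// - After the force-flush timeout, everything buffered is emitted.
/// - With 2+ segments, all but the last (possibly incomplete) one are emitted.
/// - With exactly 1 segment, it is emitted only if `emit_single_segments` is set.
pub fn emittable_count(
    segments: usize,
    buffered_for: Duration,
    emit_single_segments: bool,
) -> usize {
    if buffered_for >= FORCE_FLUSH_AFTER {
        return segments;
    }
    match segments {
        0 => 0,
        1 if emit_single_segments => 1,
        1 => 0,
        n => n - 1,
    }
}
```

With the default (`false`), a one-segment utterance such as a short voice command stays buffered until a second segment arrives or the timeout expires; with the option enabled it is emitted right away.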

When VAD detects no speech in an audio window, skip forwarding it to Whisper entirely. This prevents hallucinations like "Merci" or "Thank you for watching" that Whisper produces from silence with high confidence.

Changes:

- process_ready_windows(): skip windows where VAD returns false
- flush(): only forward final buffer if VAD detects speech

Also fixes pre-existing test compilation (missing emit_single_segments field) and formatting issues.
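
A minimal sketch of this gating idea follows. It is not the PR's implementation: `detects_speech` is a crude energy-based stand-in for the real VAD model, and `speech_windows` is a hypothetical helper that plays the role of both `process_ready_windows()` (full windows) and `flush()` (the shorter final chunk).

```rust
/// Stand-in for a real VAD model (assumption): treats a window as speech
/// when its mean absolute amplitude exceeds `threshold`.
pub fn detects_speech(window: &[f32], threshold: f32) -> bool {
    if window.is_empty() {
        return false;
    }
    let mean_abs = window.iter().map(|s| s.abs()).sum::<f32>() / window.len() as f32;
    mean_abs > threshold
}

/// Splits `samples` into windows of `window_len` and keeps only those the
/// VAD stand-in classifies as speech. Silent windows are dropped instead of
/// being forwarded to the transcriber. The final, possibly shorter chunk is
/// checked the same way, mirroring the flush() change.
///
/// Panics if `window_len` is zero.
pub fn speech_windows(samples: &[f32], window_len: usize, threshold: f32) -> Vec<&[f32]> {
    samples
        .chunks(window_len)
        .filter(|w| detects_speech(w, threshold))
        .collect()
}
```

Only the windows returned by `speech_windows` would be handed to Whisper, so silence never reaches the model and cannot be transcribed into spurious phrases.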

Move VAD filtering from the high-level Scribble API into the backend stream. This ensures VAD works regardless of which API consumers use (direct backend access or high-level Scribble::transcribe).

Changes:

- WhisperStream now optionally wraps audio with VadStream when enable_voice_activity_detection is true
- Remove VAD wrapping from Scribble::transcribe_with_encoder() to avoid double-filtering
- Export VadProcessor, VadStream, VadStreamReceiver publicly
- Make VadStream methods public for use in backend

This fixes the issue where friday-daemon's direct backend usage bypassed VAD filtering entirely.

…etection

VAD windows at 2 seconds meant last_speech_instant() only updated every 2 seconds during continuous speech. With typical silence thresholds of 1 second, this caused premature end-of-utterance detection.

Reducing to 500ms (8000 samples at 16kHz) means speech instant updates 4x more frequently, enabling accurate silence gap measurement. Silero VAD works reliably with windows down to ~250ms.
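
The arithmetic behind this change can be checked with a short sketch. The helper names are hypothetical, and the model is deliberately simplified: it assumes the last-speech timestamp is refreshed exactly once per VAD window and ignores processing latency.

```rust
use std::time::Duration;

/// Sample rate assumed by the commit message (16 kHz).
pub const SAMPLE_RATE_HZ: usize = 16_000;

/// Number of samples in one VAD window of the given length.
pub fn window_samples(window: Duration) -> usize {
    window.as_millis() as usize * SAMPLE_RATE_HZ / 1_000
}

/// Simplified model: the timestamp can be up to one window old even during
/// continuous speech, so a silence threshold shorter than the window can
/// fire while the speaker is still talking.
pub fn may_end_utterance_prematurely(window: Duration, silence_threshold: Duration) -> bool {
    window > silence_threshold
}

/// How many times more often the timestamp updates after shrinking the window.
pub fn update_speedup(old_window: Duration, new_window: Duration) -> u128 {
    old_window.as_millis() / new_window.as_millis()
}
```

With a 1-second silence threshold, a 2-second window fails this check while a 500 ms window passes it, which matches the reasoning in the commit message.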

JMLX42 added 14 commits January 12, 2026 21:16

Merge branch 'fix/emit-single-segments' into friday

afbb572

feat(vad): track last_speech_instant in VadStream

057bad3

feat(backend): expose last_vad_speech_instant on BackendStream trait

2993f1c

feat(vad): implement Default for VadPolicy

713438b

feat(vad): export VadPolicy from vad module

a4d9800

feat: export VadPolicy from public API

421e28e

feat(opts): add vad_policy field for custom VAD configuration

2fc004b

feat(vad): add VadProcessor::with_policy constructor

59f492f

feat(whisper): use vad_policy from Opts when creating VAD processor

02f1ecb

JMLX42 wants to merge 14 commits into itsmontoya:main from lx-industries:feat/configurable-vad-policy
