feat: add TTS voice response messages via Mistral Voxtral API by vkavun · Pull Request #167 · RichardAtCT/claude-code-telegram

vkavun · 2026-03-28T12:16:08Z

Summary

Add text-to-speech capability so the bot can send Claude's responses as Telegram voice messages using Mistral's Voxtral TTS API
Per-user /voice on|off toggle persisted in SQLite, gated behind admin-level ENABLE_VOICE_RESPONSES env var
Short responses: sent as voice message + brief label; long responses (>threshold): Claude summarizes for spoken delivery, audio of summary + full text sent
Graceful fallback to text with "(Audio unavailable, sent as text)" note on TTS failure
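The short/long/fallback behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `plan_voice_response`, the `synthesize` and `summarize` callables, and the message tuples are hypothetical names, and the label text is assumed from the later "Voice response" label commit.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

FALLBACK_NOTE = "(Audio unavailable, sent as text)"
SHORT_LABEL = "Voice response"  # assumed label text

Message = Tuple[str, object]  # ("voice", bytes) or ("text", str)


async def plan_voice_response(
    text: str,
    max_length: int,
    synthesize: Callable[[str], Awaitable[bytes]],
    summarize: Callable[[str], Awaitable[str]],
) -> List[Message]:
    """Return the ordered messages to send for one Claude response."""
    try:
        if len(text) <= max_length:
            # Short response: speak it as-is and add a brief label.
            audio = await synthesize(text)
            return [("voice", audio), ("text", SHORT_LABEL)]
        # Long response: speak a summary, then send the full text.
        summary = await summarize(text)
        audio = await synthesize(summary)
        return [("voice", audio), ("text", text)]
    except Exception:
        # Any TTS or summarization failure degrades to plain text plus a note.
        return [("text", text), ("text", FALLBACK_NOTE)]
```

Returning a plan instead of sending directly keeps the decision logic testable without a Telegram client; the caller would map each tuple to the matching send call.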

Changes

Config: 5 new env vars (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
Feature flag: voice_responses_enabled in FeatureFlags
Storage: Migration 5 adds voice_responses_enabled column to users table + repository get/set methods
VoiceHandler: New synthesize_speech() method calling client.audio.speech.complete_async()
Orchestrator: /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow
CLAUDE.md: Updated with new command and settings docs

Test plan

533 tests pass, 0 failures
Enable ENABLE_VOICE_RESPONSES=true in production env
Verify /voice on persists preference and /voice off clears it
Send a short message and confirm voice message is received
Send a long message (>2000 chars) and confirm summary audio + full text
Verify TTS failure gracefully falls back to text with note

🤖 Generated with Claude Code

TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add five new Pydantic Settings fields for text-to-speech voice responses: enable_voice_responses, voice_response_model, voice_response_voice, voice_response_format, and voice_response_max_length. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
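For readers who want to see the shape of these five settings, here is a rough stdlib-only sketch. It deliberately uses a dataclass rather than the project's Pydantic Settings class, and the defaults are assumptions: the model and voice ID are taken from the later fix commit in this PR, 2000 comes from the test plan, and `"opus"` is only a placeholder format.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(raw: str) -> bool:
    """Interpret common truthy strings from the environment."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceResponseSettings:
    enable_voice_responses: bool = False
    voice_response_model: str = "voxtral-mini-tts-2603"
    voice_response_voice: str = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"
    voice_response_format: str = "opus"  # placeholder, not confirmed by the PR
    voice_response_max_length: int = 2000

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VoiceResponseSettings":
        """Build settings from env vars, falling back to the defaults above."""
        d = cls()
        return cls(
            enable_voice_responses=_as_bool(env.get("ENABLE_VOICE_RESPONSES", "false")),
            voice_response_model=env.get("VOICE_RESPONSE_MODEL", d.voice_response_model),
            voice_response_voice=env.get("VOICE_RESPONSE_VOICE", d.voice_response_voice),
            voice_response_format=env.get("VOICE_RESPONSE_FORMAT", d.voice_response_format),
            voice_response_max_length=int(
                env.get("VOICE_RESPONSE_MAX_LENGTH", d.voice_response_max_length)
            ),
        )
```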

Add voice_responses_enabled property to FeatureFlags that gates TTS on both enable_voice_responses setting and mistral_api_key being set. Register it in is_feature_enabled() and get_enabled_features(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
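The double gate described in this commit (setting on AND API key present) might look roughly like the sketch below. The `Settings` stand-in and the `"voice_responses"` feature key are assumptions made for illustration; only the property and method names come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Settings:
    """Hypothetical stand-in for the project's settings object."""
    enable_voice_responses: bool = False
    mistral_api_key: Optional[str] = None


class FeatureFlags:
    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS needs both the admin switch and a usable Mistral API key.
        return bool(
            self.settings.enable_voice_responses and self.settings.mistral_api_key
        )

    def is_feature_enabled(self, name: str) -> bool:
        # "voice_responses" is an assumed key name.
        return {"voice_responses": self.voice_responses_enabled}.get(name, False)

    def get_enabled_features(self) -> List[str]:
        return [n for n in ("voice_responses",) if self.is_feature_enabled(n)]
```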

Adds migration 5 to extend the users table with a voice_responses_enabled boolean column, updates UserModel with the new field, and adds get/set repository methods to UserRepository with full test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
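As a rough picture of what such a migration and its get/set helpers involve, here is a sketch using the stdlib `sqlite3` module. The `users.user_id` key column, the function names, and the idempotency check are assumptions; the project's repository layer may be structured quite differently.

```python
import sqlite3

MIGRATION_5 = (
    "ALTER TABLE users ADD COLUMN voice_responses_enabled BOOLEAN DEFAULT 0"
)


def apply_migration_5(conn: sqlite3.Connection) -> None:
    """Add the column once; safe to call again on an already-migrated DB."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "voice_responses_enabled" not in columns:
        conn.execute(MIGRATION_5)
        conn.commit()


def set_voice_responses_enabled(
    conn: sqlite3.Connection, user_id: int, enabled: bool
) -> None:
    conn.execute(
        "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
        (int(enabled), user_id),
    )
    conn.commit()


def get_voice_responses_enabled(conn: sqlite3.Connection, user_id: int) -> bool:
    row = conn.execute(
        "SELECT voice_responses_enabled FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    # Unknown users are treated as "voice off".
    return bool(row[0]) if row else False
```

Existing rows pick up the default `0`, so users stay on text responses until they run `/voice on`.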

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements TDD-verified method that checks feature flag, user toggle, synthesizes speech via voice_handler, and falls back to text on failure. Short responses get a label; long responses get summarized via Claude + full text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Insert _maybe_send_voice_response() call between image-caption logic and text-sending loop; skip text messages when voice is successfully sent. Initialize response_content=None before try block to prevent UnboundLocalError on error paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add test for long-response summarization path (Task 8) and update TTS failure test to assert fallback note; send "(Audio unavailable, sent as text)" message in the except block when TTS fails (Task 9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The TTS voice response was only wired into agentic_text() but voice messages go through agentic_voice() -> _handle_agentic_media_message() which had its own separate response-sending path without TTS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
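The request and response handling this commit describes could be sketched as two small pure functions around the HTTP call. Only `voice_id`, `audio_data`, the model name, and the default voice UUID come from the commit message; the `input` and `response_format` field names and the `"opus"` value are assumptions and should be checked against Mistral's API documentation.

```python
import base64
from typing import Any, Dict

DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"


def build_speech_request(
    text: str,
    model: str = DEFAULT_MODEL,
    voice_id: str = DEFAULT_VOICE_ID,
    response_format: str = "opus",  # assumed value
) -> Dict[str, Any]:
    """JSON body for POST /v1/audio/speech (field names partly assumed)."""
    return {
        "model": model,
        "input": text,  # assumed field name
        "voice_id": voice_id,
        "response_format": response_format,  # assumed field name
    }


def decode_speech_response(payload: Dict[str, Any]) -> bytes:
    """Extract raw audio bytes from the base64 `audio_data` field."""
    audio_b64 = payload.get("audio_data")
    if not audio_b64:
        raise ValueError("response has no audio_data field")
    return base64.b64decode(audio_b64)
```

In the handler, an async HTTP client would post the dict from `build_speech_request()` and pass the parsed JSON response to `decode_speech_response()`.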

Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to Claude's system prompt so it knows about TTS capabilities and stops telling users it cannot send voice messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

RichardAtCT · 2026-03-30T06:58:34Z

Thanks for this comprehensive TTS implementation! A few things needed before we can merge:

Size concern: At 2K+ lines, this is a large PR. Consider whether any parts can be split out (e.g., the database migration as a separate PR).
Test coverage: Please add automated tests for the core TTS logic (Mistral API client, summarization for long responses, fallback behavior).
Coordination with #158: PR #158 ("Add make run-watch for auto-restart during development"), which we're merging, also modifies voice-related code (whisper.cpp support). You'll likely need to rebase after #158 lands.
orchestrator.py conflicts: Several other PRs touching orchestrator.py are being merged — please rebase once the current batch completes.

The feature design is solid — looking forward to getting this in after the above items are addressed.

Handles the check_match callback from escalation messages by running claude -p with web search to evaluate current match state, then editing the original message in-place with a verdict (winning/losing/won/lost). Button remains available for repeated checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Handles the new [🔎 Investigate] button on trade fill notifications. Sends a placeholder reply, runs claude -p with DB queries, log analysis, and web search, then edits the placeholder with structured investigation results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The check_match_button markup only included the Check Match button, so clicking it would replace both buttons with just the one. Now both buttons are preserved after the message edit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of spawning Claude with WebSearch/WebFetch tools to find match scores (slow, expensive, leaks source URLs), now:

1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment

Result: faster response, cleaner 3-line output (status/score/reason), no web search sources in the verdict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
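The difference is easy to reproduce with the stdlib `re` module. The patterns and the opponent name below are made up for illustration (the project's actual regex is not shown in this PR); the point is only that `\s` also matches `\n`, while `[ ]` does not.

```python
import re

# Hypothetical notification text; "Jane Doe" is a made-up opponent.
MESSAGE = "Trade Filled\nPanna Udvardy vs Jane Doe"

# \s matches newlines, so the capture can start on the previous line.
LOOSE = re.compile(r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+) vs")

# [ ] matches only a literal space, so the capture stays on one line.
STRICT = re.compile(r"([A-Z][a-z]+(?:[ ][A-Z][a-z]+)+) vs")


def first_player(pattern: "re.Pattern[str]", text: str) -> str:
    """Return the captured player name, or an empty string if none."""
    match = pattern.search(text)
    return match.group(1) if match else ""


print(repr(first_player(LOOSE, MESSAGE)))   # 'Trade Filled\nPanna Udvardy'
print(repr(first_player(STRICT, MESSAGE)))  # 'Panna Udvardy'
```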

SofaScore API returns 403 from the prod server IP. Instead of calling the API directly (which would need proxy routing), read the latest match snapshot from the poly_dashboard DB — the collector already stores live scores there via proxied SofaScore polling. Also adds current set games to the score summary for Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of showing "unexpected error" when users click buttons from before a bot restart, catch the Telegram "query too old" error and continue processing the action normally. Also prevent these benign errors from being logged as security violations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

americodias · 2026-04-27T14:29:59Z

👋 Came across this while extracting upstream-candidate features from a downstream fork. Wanted to share an alternative TTS shape in case it's useful as input — happy to defer to your approach (or open a follow-up PR layering provider fallback on top) depending on which direction @RichardAtCT prefers.

Different scope: rather than Mistral-only, a provider fallback chain with three providers (OpenAI / ElevenLabs / Piper-via-Wyoming) that retries automatically on per-provider failure with a 5-min cooldown, plus a /tts command for runtime status + provider switching. Designed for "self-hosted Piper on the homelab as primary, OpenAI as fallback when Piper is down" rather than choosing one vendor at deploy time.
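To make the fallback idea concrete, here is a minimal synchronous sketch of a provider chain with a per-provider cooldown. It is not the code from the linked fork: `FallbackChain`, the injected `clock`, and the provider callables are hypothetical, and the real handler may well be async.

```python
import time
from typing import Callable, Dict, List, Tuple

COOLDOWN_SECONDS = 5 * 60

Provider = Tuple[str, Callable[[str], bytes]]


class FallbackChain:
    """Try providers in order, skipping any that failed within the cooldown."""

    def __init__(
        self,
        providers: List[Provider],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = providers
        self.clock = clock
        self._failed_at: Dict[str, float] = {}

    def synthesize(self, text: str) -> Tuple[str, bytes]:
        now = self.clock()
        for name, synth in self.providers:
            failed = self._failed_at.get(name)
            if failed is not None and now - failed < COOLDOWN_SECONDS:
                continue  # still cooling down after a recent failure
            try:
                audio = synth(text)
            except Exception:
                self._failed_at[name] = now
                continue
            self._failed_at.pop(name, None)
            return name, audio
        raise RuntimeError("all TTS providers failed or are cooling down")
```

Injecting the clock keeps the cooldown testable without sleeping for five minutes.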

Code lives at: https://github.com/americodias/claude-code-telegram/blob/upstream-switch/src/bot/features/tts_handler.py (400 LOC, lazy provider client init, sentence-boundary chunking for OpenAI's 4096 / ElevenLabs' 5000 limits, ffmpeg PCM→OGG-Opus conversion for Piper).
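Sentence-boundary chunking under a character limit can be sketched as below. This is an illustrative version, not the fork's implementation: `chunk_by_sentence` is a hypothetical name, sentences are split naively on `.`, `!`, or `?` followed by whitespace, and a single sentence longer than the limit is hard-split.

```python
import re
from typing import List


def chunk_by_sentence(text: str, limit: int) -> List[str]:
    """Pack whole sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is split mid-sentence.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

With limits such as the 4096 and 5000 characters mentioned above, each chunk would be synthesized separately and the audio concatenated.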

The two designs aren't strictly incompatible — your voice_responses_enabled per-user toggle + summarize-on-long are orthogonal to the provider-selection layer. If you'd prefer to land #167 first then layer fallback on top, I'm happy to rebase mine onto whatever lands here.

No pressure either way — your PR's been quietly waiting for 3+ weeks and I'd rather not stack a competing one without checking.

Migrated prod from AWS (~/poly_dashboard, /home/ubuntu) to a consolidated Hetzner box (~/tennis, user deploy). Repoint the match-check + investigate handlers' DB path, cwd, and prompt query/log paths accordingly so the Telegram '🔍 Check Match' / '🔎 Investigate' buttons work again.

…tic mode

The check-match / investigate-trade escalation buttons (from trade notifications and escalate.sh) were only wired into the classic callback router. In agentic mode (the default) tapping them did nothing. Register handle_callback_query for those two callback_data patterns in agentic mode too.

…g_X.Y_under/over

The check-match prompt had no explanation of what market types mean, causing Claude to produce confused reasoning (e.g. counting all-match games instead of only first-set games for fsg_8.5_under, or not clearly resolving fs_p1).

Add explicit market type definitions to the prompt:

- fs_p1/fs_p2: ONLY set 1 matters; ignore sets 2+ entirely
- fsg_X.Y_under: sum home+away games in SET 1 ONLY, win if < X.Y
- fsg_X.Y_over: sum home+away games in SET 1 ONLY, win if > X.Y
- match_winner/p1_win/p2_win/sets_over/sets_under: full match context

Add worked examples (6-2 → 8 games < 8.5 → WON) and a HOW TO READ THE SCORE section so Claude maps home/away to Player 1/Player 2 correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
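The fsg_ rule and its worked example can be checked with a few lines of arithmetic. This sketch is only an illustration of the rule as stated in the commit message (the PR itself puts the rule in a prompt, not in code); `settle_first_set_games` is a hypothetical helper and assumes set 1 has already finished.

```python
def settle_first_set_games(market: str, set1_home: int, set1_away: int) -> str:
    """Settle an fsg_X.Y_under / fsg_X.Y_over market from a finished first set."""
    prefix, line, side = market.split("_")
    if prefix != "fsg" or side not in {"under", "over"}:
        raise ValueError(f"not a first-set-games market: {market}")
    total = set1_home + set1_away  # games in SET 1 only
    threshold = float(line)
    won = total < threshold if side == "under" else total > threshold
    return "WON" if won else "LOST"


# Worked example from the commit message: 6-2 is 8 games, and 8 < 8.5.
print(settle_first_set_games("fsg_8.5_under", 6, 2))  # WON
```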

…asoning

Two root-cause fixes for corrupted Check Match output:

1. Parser collected EVERY line starting with a status/Score/Reason prefix, so when the model emitted two contradictory blocks (a WON block then a LOST block) both were concatenated and shown. Extract a SINGLE block via new _parse_verdict_block(): take the first status line + its Score/Reason and stop at the next status line. Shared by check_match and investigate.
2. Prompt forced a final WON/LOST even for in-progress matches and had no total-games examples. Rewrite with a status-first decision procedure:
   - fs_/fsg_ settle once SET 1 is finished, else UNDETERMINED.
   - full-match tg_/sets_/match_winner settle only when finished OR the result is mathematically locked (min_final = current_total + games the unfinished set still owes); else UNDETERMINED.

Adds the ⏳ UNDETERMINED status and worked examples incl. the real failure case (tg_23.5_over, 5-7|6-1|0-4 -> min_final >=25 -> WON, locked). Regression test covers the exact double-verdict corruption.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
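The single-block extraction described in the first fix might look roughly like this. It is a sketch under assumptions, not the PR's `_parse_verdict_block()`: the status vocabulary, the optional emoji prefix, and the `Score:` / `Reason:` line shapes are guessed from the commit messages.

```python
import re
from typing import List

# Assumed shapes: an optional emoji/punctuation prefix, then a status word.
_STATUS = re.compile(r"^\W*(WON|LOST|WINNING|LOSING|UNDETERMINED)\b")
_DETAIL = re.compile(r"^(Score|Reason):")


def parse_verdict_block(stdout: str) -> List[str]:
    """Keep the first status line plus its Score/Reason lines, nothing else."""
    block: List[str] = []
    for raw in stdout.splitlines():
        line = raw.strip()
        if _STATUS.match(line):
            if block:
                break  # a second status line starts a new block: stop here
            block.append(line)
        elif block and _DETAIL.match(line):
            block.append(line)
    return block


sample = (
    "Let me think about this match.\n"
    "✅ WON\nScore: 6-2\nReason: 8 games < 8.5\n"
    "❌ LOST\nScore: 6-2\nReason: contradictory second block\n"
)
print(parse_verdict_block(sample))
```

Preamble lines and anything after the second status line are dropped, which is the behaviour the regression test in this commit is said to cover.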

vkavun and others added 16 commits March 28, 2026 13:33

docs: add design spec for audio response messages feature

54522d7


docs: add implementation plan for audio response messages

38679a5


feat: add synthesize_speech() TTS method to VoiceHandler

24f3a33

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add /voice on|off toggle command

d5cec76

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: update orchestrator tests for /voice command (7 commands)

f31da0d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: update CLAUDE.md with /voice command and TTS settings

c27851e


fix: remove redundant 'Voice response' text label for short responses

cd91f62


Ubuntu and others added 8 commits April 4, 2026 12:32

fix: player name regex matching across newlines

2e1cbbd


vkavun and others added 4 commits June 17, 2026 14:02

deps: add requests (callback.py needs it; now imported in agentic mode)

3ff82f6

feat: add TTS voice response messages via Mistral Voxtral API #167
vkavun wants to merge 29 commits into RichardAtCT:main from vkavun:feature/audio-response-messages

3 participants
