feat: add TTS voice response messages via Mistral Voxtral API #167
vkavun wants to merge 29 commits into
TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add five new Pydantic Settings fields for text-to-speech voice responses: enable_voice_responses, voice_response_model, voice_response_voice, voice_response_format, and voice_response_max_length. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add voice_responses_enabled property to FeatureFlags that gates TTS on both enable_voice_responses setting and mistral_api_key being set. Register it in is_feature_enabled() and get_enabled_features(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
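The settings and flag commits above amount to a two-condition gate. The following is a minimal sketch that uses a plain dataclass as a stand-in for the project's Pydantic Settings class so it runs without extra packages. The field and env-var names come from the PR, and the model and voice defaults come from the later Voxtral fix commit. The format and max-length defaults and the "voice_responses" registry key are placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Model/voice defaults are taken from the PR's fix commit; format and
# max-length defaults are placeholders for this sketch (assumptions).
DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"
DEFAULT_FORMAT = "mp3"  # assumption
DEFAULT_MAX_LENGTH = 2000  # assumption
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceSettings:
    """Stand-in for the five Pydantic Settings fields (plus the API key)."""

    enable_voice_responses: bool = False
    voice_response_model: str = DEFAULT_MODEL
    voice_response_voice: str = DEFAULT_VOICE
    voice_response_format: str = DEFAULT_FORMAT
    voice_response_max_length: int = DEFAULT_MAX_LENGTH
    mistral_api_key: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VoiceSettings":
        # Pydantic Settings would do this parsing and validation for you.
        return cls(
            enable_voice_responses=env.get("ENABLE_VOICE_RESPONSES", "").lower() in _TRUTHY,
            voice_response_model=env.get("VOICE_RESPONSE_MODEL", DEFAULT_MODEL),
            voice_response_voice=env.get("VOICE_RESPONSE_VOICE", DEFAULT_VOICE),
            voice_response_format=env.get("VOICE_RESPONSE_FORMAT", DEFAULT_FORMAT),
            voice_response_max_length=int(
                env.get("VOICE_RESPONSE_MAX_LENGTH", DEFAULT_MAX_LENGTH)
            ),
            mistral_api_key=env.get("MISTRAL_API_KEY") or None,
        )


class FeatureFlags:
    def __init__(self, settings: VoiceSettings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS is on only when the admin switch is set AND a Mistral key exists.
        return self.settings.enable_voice_responses and bool(self.settings.mistral_api_key)

    def is_feature_enabled(self, name: str) -> bool:
        # "voice_responses" is a hypothetical registry key for this sketch.
        return {"voice_responses": self.voice_responses_enabled}.get(name, False)
```

Keeping the key check inside the property means a missing MISTRAL_API_KEY silently disables TTS instead of failing at synthesis time.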
Adds migration 5 to extend the users table with a voice_responses_enabled boolean column, updates UserModel with the new field, and adds get/set repository methods to UserRepository with full test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
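A minimal SQLite sketch of the migration and the two repository methods. It assumes a users table keyed by user_id. The method names are hypothetical, and the real UserRepository may differ.

```python
import sqlite3

# Migration 5: existing rows pick up the DEFAULT, so every user starts opted out.
MIGRATION_5 = (
    "ALTER TABLE users ADD COLUMN voice_responses_enabled BOOLEAN NOT NULL DEFAULT 0"
)


def apply_migration_5(conn: sqlite3.Connection) -> None:
    conn.execute(MIGRATION_5)
    conn.commit()


class UserRepository:
    """Only the two voice-toggle methods; method names are hypothetical."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def get_voice_responses_enabled(self, user_id: int) -> bool:
        row = self.conn.execute(
            "SELECT voice_responses_enabled FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        # Unknown users are treated as opted out.
        return bool(row[0]) if row else False

    def set_voice_responses_enabled(self, user_id: int, enabled: bool) -> None:
        self.conn.execute(
            "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
            (int(enabled), user_id),
        )
        self.conn.commit()
```

SQLite requires a non-NULL default when adding a NOT NULL column, which is why DEFAULT 0 is part of the migration.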
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements TDD-verified method that checks feature flag, user toggle, synthesizes speech via voice_handler, and falls back to text on failure. Short responses get a label; long responses get summarized via Claude + full text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Insert _maybe_send_voice_response() call between image-caption logic and text-sending loop; skip text messages when voice is successfully sent. Initialize response_content=None before try block to prevent UnboundLocalError on error paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
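Why initializing response_content matters: if the Claude call raises, a variable that is first assigned inside the try block is never bound, and any later read raises UnboundLocalError. The following is a synchronous sketch with hypothetical callables (the real handler is async). It shows the bind-before-try pattern and the voice-first, text-otherwise wiring.

```python
from typing import Callable, Optional


def handle_text_message(
    run_claude: Callable[[], str],
    try_voice: Callable[[str], bool],
    send_text: Callable[[str], None],
) -> Optional[str]:
    # Bound before the try block, so every path below can read it safely.
    response_content: Optional[str] = None
    try:
        response_content = run_claude()
    except RuntimeError as exc:
        send_text(f"Error: {exc}")

    if response_content is None:
        return None
    # Voice first; the text-sending loop runs only if no voice message went out.
    if not try_voice(response_content):
        send_text(response_content)
    return response_content
```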
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for long-response summarization path (Task 8) and update TTS failure test to assert fallback note; send "(Audio unavailable, sent as text)" message in the except block when TTS fails (Task 9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TTS voice response was only wired into agentic_text() but voice messages go through agentic_voice() -> _handle_agentic_media_message() which had its own separate response-sending path without TTS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
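This is a sketch of the request and response handling that this commit describes. The commit supplies the /v1/audio/speech path, voice_id, audio_data, the model and the voice UUID. The sketch additionally assumes https://api.mistral.ai as the host and "input"/"response_format" as the text and format field names; check these against the API reference. The pure helpers are kept separate from the HTTP call so they can be tested without network access, and httpx is imported only inside synthesize_speech.

```python
import base64
from typing import Any, Dict

# Assumption: the commit names only the path; the host is Mistral's API base.
SPEECH_URL = "https://api.mistral.ai/v1/audio/speech"
DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul Neutral


def build_speech_payload(
    text: str,
    model: str = DEFAULT_MODEL,
    voice_id: str = DEFAULT_VOICE_ID,
    response_format: str = "mp3",  # placeholder default (assumption)
) -> Dict[str, Any]:
    # "voice_id" is from the commit; "input"/"response_format" are assumed names.
    return {"model": model, "input": text, "voice_id": voice_id, "response_format": response_format}


def decode_speech_response(body: Dict[str, Any]) -> bytes:
    # The endpoint returns JSON with base64-encoded audio under "audio_data".
    audio_b64 = body.get("audio_data")
    if not audio_b64:
        raise ValueError("speech response has no audio_data")
    return base64.b64decode(audio_b64)


async def synthesize_speech(text: str, api_key: str) -> bytes:
    import httpx  # lazy import: the helpers above do not need httpx

    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            SPEECH_URL,
            json=build_speech_payload(text),
            headers={"Authorization": f"Bearer {api_key}"},
        )
        resp.raise_for_status()
        return decode_speech_response(resp.json())
```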
Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
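Taken together, the voice-path commits describe a small decision procedure:

1. Check the feature flag and the user toggle.
2. Send audio only for short replies.
3. For long replies, speak a Claude-written summary while the full text still goes out.
4. On any TTS failure, send the fallback note and fall back to text.

The following is a minimal async sketch under those assumptions. The injected callables (synthesize, summarize, send_voice, send_text) and the length threshold are hypothetical stand-ins for the bot's voice_handler, the Claude call and the Telegram send methods. The boolean return tells the caller whether to skip its text-sending loop.

```python
import asyncio
from typing import Awaitable, Callable

FALLBACK_NOTE = "(Audio unavailable, sent as text)"


async def maybe_send_voice_response(
    text: str,
    *,
    feature_enabled: bool,
    user_enabled: bool,
    max_length: int,
    synthesize: Callable[[str], Awaitable[bytes]],
    summarize: Callable[[str], Awaitable[str]],
    send_voice: Callable[[bytes], Awaitable[None]],
    send_text: Callable[[str], Awaitable[None]],
) -> bool:
    """Return True when the caller should skip its normal text-sending loop."""
    if not (feature_enabled and user_enabled):
        return False
    try:
        if len(text) <= max_length:
            # Short reply: audio only.
            await send_voice(await synthesize(text))
            return True
        # Long reply: speak a summary; the caller still sends the full text.
        await send_voice(await synthesize(await summarize(text)))
        return False
    except Exception:
        # Any TTS failure degrades to text, with a short note explaining why.
        await send_text(FALLBACK_NOTE)
        return False


if __name__ == "__main__":
    async def _fake_tts(text: str) -> bytes:
        return text.encode()

    async def _echo(item: object) -> None:
        print("sent:", item)

    async def _first_words(text: str) -> str:
        return text[:20]

    asyncio.run(maybe_send_voice_response(
        "Hello!", feature_enabled=True, user_enabled=True, max_length=100,
        synthesize=_fake_tts, summarize=_first_words, send_voice=_echo, send_text=_echo,
    ))
```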
When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to Claude's system prompt so it knows about TTS capabilities and stops telling users it cannot send voice messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
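A minimal sketch of the conditional prompt addendum. The PR's actual prompt text is not shown, so the wording of VOICE_PROMPT_ADDENDUM and the helper name are hypothetical.

```python
# Hypothetical wording; the PR's actual prompt text is not shown.
VOICE_PROMPT_ADDENDUM = (
    "Your replies may be delivered to the user as Telegram voice messages "
    "via text-to-speech. Do not tell the user you cannot send voice messages."
)


def build_system_prompt(base_prompt: str, enable_voice_responses: bool) -> str:
    # Append the TTS note only when the admin setting is switched on.
    if not enable_voice_responses:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{VOICE_PROMPT_ADDENDUM}"
```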
Thanks for this comprehensive TTS implementation! A few things needed before we can merge:
The feature design is solid; looking forward to getting this in after the above items are addressed.
Handles the check_match callback from escalation messages by running claude -p with web search to evaluate current match state, then editing the original message in-place with a verdict (winning/losing/won/lost). Button remains available for repeated checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles the new [🔎 Investigate] button on trade fill notifications. Sends a placeholder reply, runs claude -p with DB queries/log analysis/web search, then edits the placeholder with structured investigation results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The check_match_button markup only included the Check Match button, so clicking it would replace both buttons with just the one. Now both buttons are preserved after the message edit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
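One way to avoid this class of bug is to build the escalation keyboard in a single helper and pass the same markup to both the original send and every edit. The following sketch uses the Telegram Bot API's raw InlineKeyboardMarkup shape (inline_keyboard rows of buttons with text and callback_data). The callback_data prefixes and the trade-id suffix are hypothetical.

```python
from typing import Dict, List

Button = Dict[str, str]


def escalation_keyboard(trade_id: str) -> Dict[str, List[List[Button]]]:
    """Both escalation buttons in one row, as a Bot API InlineKeyboardMarkup dict."""
    row: List[Button] = [
        # Hypothetical callback_data formats.
        {"text": "🔍 Check Match", "callback_data": f"check_match:{trade_id}"},
        {"text": "🔎 Investigate", "callback_data": f"investigate:{trade_id}"},
    ]
    for button in row:
        # Telegram rejects callback_data longer than 64 bytes.
        if len(button["callback_data"].encode("utf-8")) > 64:
            raise ValueError(f"callback_data too long: {button['callback_data']!r}")
    return {"inline_keyboard": [row]}
```

Passing escalation_keyboard(trade_id) as reply_markup on every edit keeps both buttons in place after each check.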
Instead of spawning Claude with WebSearch/WebFetch tools to find match scores (slow, expensive, leaks source URLs), now:
1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment
Result: faster response, cleaner 3-line output (status/score/reason), no web search sources in the verdict.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
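This sketch invokes the CLI the way this commit describes and assumes the claude executable is on PATH. The -p (print mode) and --max-turns flags are Claude Code CLI options. The 120-second timeout and the helper names are hypothetical.

```python
import asyncio
from typing import List


def build_claude_argv(prompt: str, max_turns: int = 1) -> List[str]:
    # -p runs Claude Code non-interactively and prints the result;
    # --max-turns caps the agent loop (1 = answer directly, no tool loop).
    return ["claude", "-p", "--max-turns", str(max_turns), prompt]


async def run_claude(prompt: str, timeout: float = 120.0) -> str:
    proc = await asyncio.create_subprocess_exec(
        *build_claude_argv(prompt),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Do not leave a stuck CLI process behind.
        proc.kill()
        await proc.wait()
        raise
    return stdout.decode(errors="replace")
```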
Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
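The following is a small reproduction of the bug. The commit does not show the real patterns, so the notification layout and opponent name below are hypothetical. Because \s also matches the newline, the lazy name group starting at the first capital letter swallows the previous line.

```python
import re

# Hypothetical notification text; the real message layout may differ.
MESSAGE = "Trade Filled\nPanna Udvardy vs Jane Doe\nMarket: fs_p1"

# \s also matches "\n", so the name group can run across lines.
LOOSE = re.compile(r"([A-Z][\w\s]+?) vs ")
# A literal space keeps the group on a single line.
STRICT = re.compile(r"([A-Z][\w ]+?) vs ")

print(repr(LOOSE.search(MESSAGE).group(1)))   # 'Trade Filled\nPanna Udvardy'
print(repr(STRICT.search(MESSAGE).group(1)))  # 'Panna Udvardy'
```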
SofaScore API returns 403 from the prod server IP. Instead of calling the API directly (which would need proxy routing), read the latest match snapshot from the poly_dashboard DB — the collector already stores live scores there via proxied SofaScore polling. Also adds current set games to the score summary for Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
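Combining the sofa_id lookup from the earlier commit with this snapshot read takes two small queries. The following SQLite sketch makes several assumptions: the DB engine, hypothetical matches and match_snapshots tables, and a "5-7|6-1|0-4" set-score string (the format used in a later commit's example). The real poly_dashboard schema is not shown in the PR.

```python
import sqlite3
from typing import Optional


def find_sofa_id(conn: sqlite3.Connection, player_a: str, player_b: str) -> Optional[int]:
    # Players may be stored in either home/away order; prefer the newest match.
    row = conn.execute(
        """
        SELECT sofa_id FROM matches
        WHERE (home_player = ? AND away_player = ?)
           OR (home_player = ? AND away_player = ?)
        ORDER BY start_time DESC
        LIMIT 1
        """,
        (player_a, player_b, player_b, player_a),
    ).fetchone()
    return row[0] if row else None


def latest_score_summary(conn: sqlite3.Connection, sofa_id: int) -> Optional[str]:
    # The collector keeps appending snapshots; only the newest one matters.
    row = conn.execute(
        "SELECT status, sets FROM match_snapshots WHERE sofa_id = ? "
        "ORDER BY captured_at DESC LIMIT 1",
        (sofa_id,),
    ).fetchone()
    if row is None:
        return None
    status, sets = row
    set_scores = sets.split("|")
    # Last entry = current (or final) set; surfaced separately for Claude.
    return (
        f"status={status}; sets={' | '.join(set_scores)}; "
        f"current set games={set_scores[-1]}"
    )
```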
Instead of showing "unexpected error" when users click buttons from before a bot restart, catch the Telegram "query too old" error and continue processing the action normally. Also prevent these benign errors from being logged as security violations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
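This sketch shows the tolerant answer step. It matches on the error text the Bot API returns for expired callback queries ("Query is too old and response timeout expired or query id is invalid"). In python-telegram-bot that error surfaces as telegram.error.BadRequest, but here any exception is inspected so the sketch runs without the library. The helper names are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Substrings of the Bot API error for callback queries answered too late
# (e.g. a button pressed before a bot restart).
_STALE_QUERY_MARKERS = ("query is too old", "query id is invalid")


def is_stale_query_error(exc: Exception) -> bool:
    message = str(exc).lower()
    return any(marker in message for marker in _STALE_QUERY_MARKERS)


async def answer_callback_safely(answer: Callable[[], Awaitable[object]]) -> None:
    """Answer a callback query, tolerating stale-query errors.

    With python-telegram-bot, `answer` would be `query.answer`.
    """
    try:
        await answer()
    except Exception as exc:
        if not is_stale_query_error(exc):
            raise
        # Benign: log quietly (not as a security event) and let the caller
        # carry on processing the button's action.
        logger.debug("Ignoring stale callback query: %s", exc)


if __name__ == "__main__":
    async def _stale() -> None:
        raise RuntimeError(
            "Query is too old and response timeout expired or query id is invalid"
        )

    asyncio.run(answer_callback_safely(_stale))  # returns quietly
```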
👋 Came across this while extracting upstream-candidate features from a downstream fork. Wanted to share an alternative TTS shape in case it's useful as input. Happy to defer to your approach (or open a follow-up PR layering provider fallback on top), depending on which direction @RichardAtCT prefers.
Different scope: rather than Mistral-only, a provider fallback chain with three providers (OpenAI / ElevenLabs / Piper-via-Wyoming) that retries automatically on per-provider failure with a 5-min cooldown, plus a …
Code lives at: https://github.com/americodias/claude-code-telegram/blob/upstream-switch/src/bot/features/tts_handler.py (400 LOC, lazy provider client init, sentence-boundary chunking for OpenAI's 4096 / ElevenLabs' 5000 limits, ffmpeg PCM→OGG-Opus conversion for Piper).
The two designs aren't strictly incompatible; your …
No pressure either way: your PR's been quietly waiting for 3+ weeks, and I'd rather not stack a competing one without checking.
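For comparison, the two mechanisms the comment mentions (per-provider cooldown and sentence-boundary chunking) fit in a few dozen lines. This is an independent illustration, not the linked tts_handler.py. Providers are modeled as hypothetical (name, char limit, synth function) tuples, audio chunks are returned as a list rather than merged, and ffmpeg conversion is out of scope.

```python
import re
import time
from typing import Callable, Dict, List, Sequence, Tuple

# (name, max characters per request, synth function): hypothetical shape.
Provider = Tuple[str, int, Callable[[str], bytes]]
COOLDOWN_SECONDS = 300.0  # the 5-minute cooldown from the comment


def chunk_by_sentence(text: str, limit: int) -> List[str]:
    """Split text into chunks of at most `limit` chars, breaking after . ! or ?."""
    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # A single over-long sentence is hard-split at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


class FallbackTTS:
    def __init__(
        self,
        providers: Sequence[Provider],
        cooldown: float = COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = list(providers)
        self.cooldown = cooldown
        self.clock = clock
        self._failed_at: Dict[str, float] = {}

    def synthesize(self, text: str) -> List[bytes]:
        now = self.clock()
        errors: List[str] = []
        for name, limit, synth in self.providers:
            failed_at = self._failed_at.get(name)
            if failed_at is not None and now - failed_at < self.cooldown:
                continue  # provider is cooling down; try the next one
            try:
                audio = [synth(chunk) for chunk in chunk_by_sentence(text, limit)]
            except Exception as exc:
                self._failed_at[name] = now
                errors.append(f"{name}: {exc}")
                continue
            self._failed_at.pop(name, None)
            return audio
        raise RuntimeError("no TTS provider available: " + "; ".join(errors))
```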
Migrated prod from AWS (~/poly_dashboard, /home/ubuntu) to a consolidated Hetzner box (~/tennis, user deploy). Repoint the match-check + investigate handlers' DB path, cwd, and prompt query/log paths accordingly so the Telegram '🔍 Check Match' / '🔎 Investigate' buttons work again.
…tic mode
The check-match / investigate-trade escalation buttons (from trade notifications and escalate.sh) were only wired into the classic callback router. In agentic mode (the default), tapping them did nothing. Register handle_callback_query for those two callback_data patterns in agentic mode too.
…g_X.Y_under/over
The check-match prompt had no explanation of what market types mean, causing Claude to produce confused reasoning (e.g. counting all-match games instead of only first-set games for fsg_8.5_under, or not clearly resolving fs_p1). Add explicit market type definitions to the prompt:
- fs_p1/fs_p2: ONLY set 1 matters; ignore sets 2+ entirely
- fsg_X.Y_under: sum home+away games in SET 1 ONLY, win if < X.Y
- fsg_X.Y_over: sum home+away games in SET 1 ONLY, win if > X.Y
- match_winner/p1_win/p2_win/sets_over/sets_under: full match context
Add worked examples (6-2 → 8 games < 8.5 → WON) and a HOW TO READ THE SCORE section so Claude maps home/away to Player 1/Player 2 correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
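These market definitions translate directly into a settlement rule for a finished first set. The sketch assumes home = Player 1 and away = Player 2, since the prompt's HOW TO READ THE SCORE section is not reproduced here. The function name is hypothetical, and later sets are ignored by construction.

```python
import re
from typing import Tuple


def settle_first_set_market(market: str, set1: Tuple[int, int]) -> str:
    """Settle a first-set market from a finished set 1 score (home, away).

    Assumes home = Player 1 and away = Player 2; sets 2+ never enter here.
    """
    home, away = set1
    if market == "fs_p1":
        return "WON" if home > away else "LOST"
    if market == "fs_p2":
        return "WON" if away > home else "LOST"
    m = re.fullmatch(r"fsg_(\d+\.\d+)_(under|over)", market)
    if m is None:
        raise ValueError(f"not a first-set market: {market}")
    line, side = float(m.group(1)), m.group(2)
    games = home + away  # SET 1 ONLY
    won = games < line if side == "under" else games > line
    return "WON" if won else "LOST"
```

The 6-2 example from the commit gives 8 games, which is under 8.5, so fsg_8.5_under settles as WON.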
…asoning
Two root-cause fixes for corrupted Check Match output:
1. Parser collected EVERY line starting with a status/Score/Reason prefix, so
when the model emitted two contradictory blocks (a WON block then a LOST
block) both were concatenated and shown. Extract a SINGLE block via new
_parse_verdict_block(): take the first status line + its Score/Reason and
stop at the next status line. Shared by check_match and investigate.
2. Prompt forced a final WON/LOST even for in-progress matches and had no
total-games examples. Rewrite with a status-first decision procedure:
- fs_/fsg_ settle once SET 1 is finished, else UNDETERMINED.
- full-match tg_/sets_/match_winner settle only when finished OR the result
is mathematically locked (min_final = current_total + games the unfinished
set still owes); else UNDETERMINED.
Adds the ⏳ UNDETERMINED status and worked examples incl. the real failure
case (tg_23.5_over, 5-7|6-1|0-4 -> min_final >=25 -> WON, locked).
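The lock rule in item 2 is plain arithmetic on the set scores. The sketch assumes standard sets: first to 6 with a 2-game lead, a tiebreak at 6-6 counted as one game, and no super-tiebreak or advantage final set. The function names are hypothetical. To find the fewest games an unfinished set still owes, let one player win every remaining game. The real failure case reproduces the commit's numbers: 23 games played, and 0-4 owes at least 2 more, so min_final = 25 > 23.5.

```python
from typing import Sequence, Tuple


def set_finished(a: int, b: int) -> bool:
    hi, lo = max(a, b), min(a, b)
    return (hi >= 6 and hi - lo >= 2) or (hi == 7 and lo == 6)


def games_owed(a: int, b: int) -> int:
    """Fewest games still needed to finish a set currently at a-b."""
    if set_finished(a, b):
        return 0
    options = []
    for x, y in ((a, b), (b, a)):
        n = 0
        # The quickest finish is one player winning every remaining game.
        while not set_finished(x, y):
            x += 1
            n += 1
        options.append(n)
    return min(options)


def total_games_verdict(
    sets: Sequence[Tuple[int, int]], line: float, side: str, match_finished: bool
) -> str:
    current_total = sum(h + a for h, a in sets)
    if match_finished:
        won = current_total > line if side == "over" else current_total < line
        return "WON" if won else "LOST"
    min_final = current_total + games_owed(*sets[-1])
    if min_final > line:
        # The total can only grow from here, so the result is already locked.
        return "WON" if side == "over" else "LOST"
    return "UNDETERMINED"
```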
Regression test covers the exact double-verdict corruption.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
- /voice on|off toggle persisted in SQLite, gated behind admin-level ENABLE_VOICE_RESPONSES env var
Changes
- Settings (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
- voice_responses_enabled in FeatureFlags
- voice_responses_enabled column to users table + repository get/set methods
- synthesize_speech() method calling client.audio.speech.complete_async()
- /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow
Test plan
- ENABLE_VOICE_RESPONSES=true in production env
- /voice on persists preference and /voice off clears it
🤖 Generated with Claude Code