This repository now includes a standalone LAN voice server for ESP32 clients:
- Entry point: `ws_server.py`
- Transport: WebSocket
- Input audio: PCM16, 16kHz, mono, chunked binary frames
- Output audio: PCM16, mono (sample rate returned in the `done` message)
- Scope: single active device (stability-first)
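
To make the input format concrete, here is a minimal sketch that generates a PCM16, 16kHz, mono test tone and splits it into binary-frame-sized chunks. The 20 ms chunk duration and little-endian byte order are assumptions (the server documentation above does not state either), and `make_test_tone` and `split_into_chunks` are hypothetical helper names.

```python
import math
import struct

SAMPLE_RATE = 16000      # Hz, from the input format above
SAMPLE_WIDTH_BYTES = 2   # PCM16
CHANNELS = 1             # mono
CHUNK_MS = 20            # assumption: chunk duration is not specified above


def make_test_tone(duration_sec: float, freq_hz: float = 440.0) -> bytes:
    """Generate a PCM16 mono sine tone (little-endian is an assumption)."""
    n = int(SAMPLE_RATE * duration_sec)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def split_into_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> list:
    """Split raw PCM bytes into fixed-duration chunks; the last may be shorter."""
    samples_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * SAMPLE_WIDTH_BYTES * CHANNELS
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

With these values, one second of audio is 32000 bytes and splits into 50 chunks of 640 bytes each.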
Install dependencies if needed:

    pip install websockets
    pip install aiohttp

Start the server:

    python ws_server.py

The server prints a LAN URL such as `ws://192.168.x.x:8765` for ESP32 configuration.
It also prints an admin dashboard URL such as `http://192.168.x.x:8766/admin`.
Run the local protocol tester in another terminal:

    python tests/test_ws_loopback.py --url ws://127.0.0.1:8765

Client to server:
- Binary frame: append PCM16 audio chunk to current utterance buffer.
- JSON text frame, one of:
  - `{"type":"start_utterance"}`
  - `{"type":"end_utterance"}`
  - `{"type":"ping"}`
  - `{"type":"reset"}`
Server to client:
- JSON text frames: `hello`, `state`, `text`, `done`, `error`, `ignored`, `pong`, `ok`
- Binary frame: synthesized reply audio bytes (PCM16)
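
The frame exchange above can be sketched as a small Python client. This is an illustration, not the project's tester: it assumes server JSON frames carry a `type` field like the client frames do, and that the reply ends with a `done` or `error` frame. `control_frame`, `classify_frame`, and `send_utterance` are hypothetical names.

```python
import json

CONTROL_TYPES = {"start_utterance", "end_utterance", "ping", "reset"}


def control_frame(kind: str) -> str:
    """Build one of the client-to-server JSON control frames."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control frame type: {kind}")
    return json.dumps({"type": kind})


def classify_frame(frame):
    """Return ("audio", bytes) for binary frames, else (type, payload dict)."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    payload = json.loads(frame)
    # Assumption: server JSON frames identify themselves via a "type" key.
    return payload.get("type", "unknown"), payload


async def send_utterance(url: str, chunks):
    """Send one utterance, then collect reply audio until `done` or `error`."""
    import websockets  # third-party: pip install websockets

    audio = bytearray()
    async with websockets.connect(url) as ws:
        await ws.send(control_frame("start_utterance"))
        for chunk in chunks:
            await ws.send(chunk)  # binary frame: appended to the utterance buffer
        await ws.send(control_frame("end_utterance"))
        async for frame in ws:
            kind, payload = classify_frame(frame)
            if kind == "audio":
                audio.extend(payload)
            elif kind in ("done", "error"):
                return kind, payload, bytes(audio)
    return "closed", {}, bytes(audio)
```

With the server running, `asyncio.run(send_utterance("ws://127.0.0.1:8765", chunks))` would drive one round trip; other frame types such as `hello` and `state` are simply skipped in this sketch.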
WebSocket options are in `config.py`:
- `WS_HOST`, `WS_PORT`
- `WS_MAX_MESSAGE_BYTES`, `WS_IDLE_TIMEOUT_SEC`
- `WS_MAX_UTTERANCE_SEC`
- `WS_INPUT_SAMPLE_RATE`, `WS_INPUT_CHANNELS`, `WS_INPUT_SAMPLE_WIDTH_BYTES`
- `WS_LOG_VERBOSE`
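
As an illustration of how these options might look together, here is a hypothetical fragment. Only the option names, the port in the example URL, and the input audio format come from this document; every other value is a placeholder assumption, and `input_format_problems` is a hypothetical sanity check, not part of the project.

```python
# Hypothetical values; check config.py for the real defaults.
WS_HOST = "0.0.0.0"              # assumption: listen on all interfaces for LAN access
WS_PORT = 8765                   # matches the example URL ws://192.168.x.x:8765
WS_MAX_MESSAGE_BYTES = 65536     # assumption
WS_IDLE_TIMEOUT_SEC = 60         # assumption
WS_MAX_UTTERANCE_SEC = 15        # assumption
WS_INPUT_SAMPLE_RATE = 16000     # 16kHz, per the input audio format
WS_INPUT_CHANNELS = 1            # mono
WS_INPUT_SAMPLE_WIDTH_BYTES = 2  # PCM16
WS_LOG_VERBOSE = False           # assumption


def input_format_problems() -> list:
    """Return mismatches against the documented PCM16, 16kHz, mono input."""
    problems = []
    if WS_INPUT_SAMPLE_RATE != 16000:
        problems.append("sample rate is not 16000 Hz")
    if WS_INPUT_CHANNELS != 1:
        problems.append("input is not mono")
    if WS_INPUT_SAMPLE_WIDTH_BYTES != 2:
        problems.append("sample width is not 2 bytes (PCM16)")
    return problems
```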
Admin dashboard options in `config.py`:
- `ADMIN_HTTP_HOST`, `ADMIN_HTTP_PORT`
- `ADMIN_POLL_INTERVAL_MS`
- `ADMIN_ENABLE_VERBOSE_EVENTS`
- `MODULE_ASR_ENABLED`, `MODULE_LLM_ENABLED`, `MODULE_TTS_ENABLED`
Admin endpoints:
- `GET /api/admin/status`
- `GET /api/admin/events`
- `GET /api/admin/connections`
- `POST /api/admin/modules`
- `POST /api/admin/send-text`
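
The read-only endpoints can be polled from a script using only the standard library. The sketch below assumes the responses are JSON, which this document does not state explicitly, and uses the dashboard port 8766 from the example URL above. `admin_url` and `fetch_admin_json` are hypothetical helper names.

```python
import json
import urllib.request

ADMIN_BASE = "http://127.0.0.1:8766"  # same port as the dashboard URL above
GET_ENDPOINTS = ("status", "events", "connections")


def admin_url(name: str, base: str = ADMIN_BASE) -> str:
    """Build the full URL for one of the read-only admin endpoints."""
    if name not in GET_ENDPOINTS:
        raise ValueError(f"not a GET endpoint: {name}")
    return f"{base.rstrip('/')}/api/admin/{name}"


def fetch_admin_json(name: str, base: str = ADMIN_BASE, timeout: float = 3.0):
    """GET an admin endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(admin_url(name, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `fetch_admin_json("status")` would return whatever the status endpoint reports. The request body for `POST /api/admin/modules` is not documented here, so it is left out of this sketch.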
Dashboard behavior (simplified default view):
- Shows key live summary fields only (active device, duration, latency, traffic, last error).
- Detailed raw JSON is available in a collapsible "Detailed JSON" section.
- Event panel defaults to `error + warning` and supports severity filtering.
Module switch behavior:
- ASR disabled: server returns `ASR_DISABLED` and skips utterance processing.
- LLM disabled: server returns a controlled maintenance text response.
- TTS disabled: server returns text response without audio bytes.
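
The three switches can be read as a simple gate around each pipeline stage. The following is a minimal sketch of that reading, not the server's actual code: the `Reply` type, the maintenance wording, and the choice to still synthesize the maintenance text when TTS is enabled are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAINTENANCE_TEXT = "The assistant is under maintenance."  # hypothetical wording


@dataclass
class Reply:
    text: Optional[str] = None
    audio: Optional[bytes] = None
    error: Optional[str] = None


def handle_utterance(
    pcm: bytes,
    *,
    asr_enabled: bool,
    llm_enabled: bool,
    tts_enabled: bool,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Reply:
    """Apply the module switches in pipeline order: ASR, then LLM, then TTS."""
    if not asr_enabled:
        # ASR off: report the error code and skip all further processing.
        return Reply(error="ASR_DISABLED")
    transcript = asr(pcm)
    # LLM off: substitute a fixed maintenance message for the model reply.
    text = llm(transcript) if llm_enabled else MAINTENANCE_TEXT
    if not tts_enabled:
        # TTS off: text-only reply, no audio bytes.
        return Reply(text=text)
    return Reply(text=text, audio=tts(text))
```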
Admin text fallback behavior:
- `POST /api/admin/send-text` synthesizes the provided text and pushes the audio to the active ESP32 client.
- The pushed text is recorded into the active session history as an admin broadcast.
- Common error codes: `NO_ACTIVE_CLIENT`, `EMPTY_TEXT`, `TTS_DISABLED`, `CLIENT_BUSY`.
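
A script can call the fallback endpoint roughly as follows. The request body shape is an assumption (a JSON object with a `text` field), as is the idea that failures come back as an HTTP error with a JSON body carrying one of the codes above; consult `tests/test_admin_fallback.py` for the real contract. `build_send_text_request` and `send_text` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def build_send_text_request(text: str, base: str = "http://127.0.0.1:8766"):
    """Build the POST request; the {"text": ...} body shape is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base.rstrip('/')}/api/admin/send-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_text(text: str, base: str = "http://127.0.0.1:8766", timeout: float = 10.0):
    """Return (ok, payload); payload is the decoded JSON body when there is one."""
    req = build_send_text_request(text, base)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read()
            return True, json.loads(raw) if raw else {}
    except urllib.error.HTTPError as exc:
        raw = exc.read().decode("utf-8", errors="replace")
        try:
            return False, json.loads(raw)  # may carry a code such as CLIENT_BUSY
        except json.JSONDecodeError:
            return False, {"raw": raw}
```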
Common integration issues:
- `ADMIN_TTS_ERROR: Model is multi-speaker but no speaker is provided`: The server now uses the configured `TTS_SPEAKER` when synthesizing admin fallback text.
- `UTTERANCE_TOO_LONG`: The buffered input audio exceeded the `WS_MAX_UTTERANCE_SEC` limit. The error message includes current/max bytes to help MCU-side tuning.
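
For MCU-side tuning around `UTTERANCE_TOO_LONG`, it helps to translate the time limit into bytes. The sketch below assumes the byte limit is simply seconds times bytes per second for the configured input format; the 15-second limit and 640-byte chunk size in the usage note are hypothetical, as are the function names.

```python
def max_utterance_bytes(
    max_sec: float,
    sample_rate: int = 16000,
    channels: int = 1,
    sample_width: int = 2,
) -> int:
    """Bytes of PCM audio that fit in max_sec at the given input format."""
    return int(max_sec * sample_rate * channels * sample_width)


def chunks_until_limit(max_sec: float, chunk_bytes: int, **fmt) -> int:
    """How many full chunks of chunk_bytes fit before the limit is reached."""
    return max_utterance_bytes(max_sec, **fmt) // chunk_bytes
```

For example, with a hypothetical `WS_MAX_UTTERANCE_SEC` of 15, the limit is 480000 bytes, or 750 chunks of 640 bytes (20 ms each at 16kHz mono PCM16).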
Manual fallback test script:

    python tests/test_admin_fallback.py --ws-url ws://127.0.0.1:8765 --admin-base http://127.0.0.1:8766

This project supports optional Web Search with automatic routing.
- Set API key (PowerShell): `$env:SERPER_API_KEY="your_serper_api_key"`
- Enable in `config.py`:
  - `ENABLE_WEB_SEARCH = True`
  - `WEB_SEARCH_PROVIDER = "serper"`
- Optional tuning in `config.py`:
  - `WEB_SEARCH_MAX_RESULTS`
  - `WEB_SEARCH_TIMEOUT_SEC`
  - `WEB_SEARCH_MIN_CONFIDENCE`
  - `WEB_SEARCH_SOURCES_MAX`
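
Put together, the Web Search settings might look like the fragment below. The two enable lines come from this document; the tuning values are placeholder assumptions, and `web_search_ready` is a hypothetical helper showing one way to confirm the API key is visible to the process.

```python
import os

ENABLE_WEB_SEARCH = True
WEB_SEARCH_PROVIDER = "serper"
WEB_SEARCH_MAX_RESULTS = 5       # assumption
WEB_SEARCH_TIMEOUT_SEC = 8       # assumption
WEB_SEARCH_MIN_CONFIDENCE = 0.6  # assumption
WEB_SEARCH_SOURCES_MAX = 3       # assumption


def web_search_ready(env=None) -> bool:
    """True when search is enabled and SERPER_API_KEY is set and non-empty."""
    env = os.environ if env is None else env
    has_key = bool(env.get("SERPER_API_KEY", "").strip())
    return ENABLE_WEB_SEARCH and WEB_SEARCH_PROVIDER == "serper" and has_key
```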
Behavior:
- Campus questions still prefer local KB context.
- Deterministic questions (for example, current date/time) are answered locally first.
- Time-sensitive / low-KB-coverage questions can auto-trigger Web search.
- Social sources are down-ranked; higher-quality sources are preferred.
- Replies remain natural, with a short sources list when web evidence passes quality gates.
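
The routing rules above can be sketched as a small decision function plus a source ranker. This is one possible reading, not the project's router: the coverage threshold, the social-domain list, and the names `choose_route`, `is_social`, and `rank_sources` are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical examples of social domains to down-rank.
SOCIAL_DOMAINS = {"twitter.com", "reddit.com", "facebook.com"}


def choose_route(
    is_deterministic: bool,
    is_campus: bool,
    is_time_sensitive: bool,
    kb_coverage: float,
    min_coverage: float = 0.5,  # assumption: threshold for "low KB coverage"
) -> str:
    """Return "local", "kb", or "web" following the priority order above."""
    if is_deterministic:
        return "local"  # e.g. current date/time is answered locally first
    if is_campus and kb_coverage >= min_coverage:
        return "kb"     # campus questions prefer local KB context
    if is_time_sensitive or kb_coverage < min_coverage:
        return "web"    # time-sensitive or poorly covered questions
    return "kb"


def is_social(url: str) -> bool:
    """True when the URL's host is a listed social domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def rank_sources(urls, limit: int = 3):
    """Stable sort that moves social sources last, then keeps the first `limit`."""
    return sorted(urls, key=is_social)[:limit]
```

Because the sort is stable, non-social sources keep their original relative order, so a search provider's own ranking is preserved within each group.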