Alice voice: wire Kokoro TTS + voice identity field (Phase 1) #80

@koad

Assignment: Vulcan

Muse, Iris, Cacula, and Veritas converged on Alice's voice design (juno#80 closed today). Your implementation scope is Phase 1 — ship a working Alice voice in the browser, sovereign stack, minimal identity-schema extension.

Reference material

  • Muse design v1: /home/koad/.muse/designs/alice-voice-design-v1.md (koad/muse@fca7e7e)
  • Muse parameter deltas: /home/koad/.muse/designs/alice-voice-parameter-deltas-v1.md (koad/muse@1168ea5)
  • Iris brand review (APPROVED-with-notes): /home/koad/.iris/reviews/2026-04-15-alice-voice-brand-review.md
  • Cacula mechanic: /home/koad/.cacula/designs/2026-04-15-alice-voice-mechanics.md
  • Veritas Piper ruling: /home/koad/.veritas/rulings/2026-04-15-piper-tts-gpl3.md

Phase 1 scope

  1. Install Kokoro TTS on thinker (or appropriate serving host). Apache 2.0; confirmed sovereign-safe per Veritas.
  2. Add a voice identity field to the passenger.json schema, adjacent to outfit. Fields per Muse's integration sketch: provider, voice_id, speed, custom_model (nullable). Coordinate with Vesta if this turns out to be a protocol-level schema change (likely yes).
  3. Server-side TTS endpoint — accepts text + mode label, returns streaming audio. Mode label maps to Muse's parameter delta table (speed + SSML <break> durations; Iris's Sample-C flat-delivery constraint encoded for error:streak:3 mode).
  4. Browser playback — WebSocket to Web Audio API on the Alice harness. Phase 1 target: phone browser (mobile-first per Muse's brief).
  5. Event wiring — accept Cacula's five event labels (level:complete, belt:milestone, course:complete, error:streak:3, session:start:new_day) and route to the correct parameter set.
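
To make items 2, 3 and 5 concrete, here is a minimal Python sketch of the draft voice field and the event-to-parameter routing. The field names (provider, voice_id, speed, custom_model) and the five event labels come from this ticket. Everything else is an assumption for illustration: the class and function names, the `flat_delivery` flag, and every number in the table are placeholders, not Muse's actual deltas, which live in alice-voice-parameter-deltas-v1.md.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class VoiceIdentity:
    """Draft `voice` field for passenger.json (field names from the ticket)."""
    provider: str
    voice_id: str
    speed: float
    custom_model: Optional[str] = None  # nullable per the ticket


@dataclass(frozen=True)
class ModeParams:
    """Hypothetical per-mode parameter set; the shape is an assumption."""
    speed_delta: float           # added to VoiceIdentity.speed
    break_ms: int                # SSML <break> duration in milliseconds
    flat_delivery: bool = False  # stand-in for Iris's Sample-C constraint


# PLACEHOLDER values only: replace with Muse's parameter delta table.
MODE_TABLE = {
    "level:complete": ModeParams(0.0, 200),
    "belt:milestone": ModeParams(0.0, 300),
    "course:complete": ModeParams(-0.05, 400),
    "error:streak:3": ModeParams(-0.05, 300, flat_delivery=True),
    "session:start:new_day": ModeParams(0.0, 200),
}


def resolve(voice: VoiceIdentity, event: str) -> dict:
    """Map an event label to the concrete synthesis parameters."""
    try:
        params = MODE_TABLE[event]
    except KeyError:
        raise ValueError(f"unknown event label: {event!r}") from None
    return {
        "provider": voice.provider,
        "voice_id": voice.voice_id,
        "speed": round(voice.speed + params.speed_delta, 3),
        "break_ms": params.break_ms,
        "flat_delivery": params.flat_delivery,
    }
```

Rejecting unknown labels with an error, rather than falling back to a default mode, is a design choice in this sketch: it keeps a mistyped event label from silently speaking in the wrong tone. `asdict(voice)` gives the JSON-ready form of the draft field for the schema assessment.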

Phase 1 gates

  • Functional first: Alice speaks in browser before distinctive voice training. XTTS-v2 is Phase 2, not scope here.
  • Do not merge the voice identity-schema change without koad signoff. Draft the schema, file an assessment if protocol-level, and flag back. Juno will escalate.
  • Iris gate before any voice actor brief is commissioned for Phase 2 (not your ticket yet).

Acceptance

  • Kokoro serving on thinker with benchmarked latency
  • voice field schema drafted (protocol assessment if applicable)
  • TTS endpoint live + mode labels → parameter deltas wired
  • Browser plays Alice at four tonal modes against test fixtures
  • STT path (Web Speech API) confirmed working on Safari/iOS for learner input (Muse's open Phase 1 item)
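
For the "benchmarked latency" item, one possible way to collect numbers is sketched below. It assumes the Kokoro call is wrapped in a plain `synthesize(text) -> bytes` callable; that wrapper and the `benchmark` helper are hypothetical names, and no Kokoro API is implied. It measures total synthesis time per request, not time-to-first-audio, so a streaming endpoint would need a separate measurement.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def benchmark(synthesize: Callable[[str], bytes],
              texts: Iterable[str],
              runs: int = 5) -> dict:
    """Time `synthesize` over each text `runs` times; report latency in ms."""
    samples = []
    for text in texts:
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(text)
            samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("no samples collected: empty texts or runs < 1")
    samples.sort()
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "n": len(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Running it with the test-fixture phrases for each tonal mode, on thinker itself, would give figures that can go straight into the assessment if host resources are in question.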

Flags

  • juno#80 closed — all design artifacts listed there
  • Kokoro TTS ARM/RPi benchmark still unverified per Sibyl's brief — if thinker resources need confirmation, file an assessment before committing architecture

Metadata

Assignees

No one assigned

    Labels

    infrastructure (Machines, keys, credentials, SSH), next-action (Ready to execute now)
