v4.1.0 — audio quality + read cache #20
Draft
MKS-01 wants to merge 13 commits into
Three audio-quality and performance improvements:
- Degenerate-chunk guard: when CSM produces all-silence audio, retry synthesis once before dropping the chunk
- Light crossfade: 100ms linear fade-out at chunk tails smooths the transition into inter-chunk silence gaps
- Read cache: skip the entire pipeline (fetch/summarize/synthesize) when re-reading the same (url, mode, voice, llm_model); the library lookup returns the existing WAV instantly. Adds an llm_model column to the reads table (auto-migrated for existing DBs).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
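The degenerate-chunk guard and the tail fade-out can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code. The `synth` callable, the `1e-3` silence threshold, and the 24 kHz default sample rate are all assumptions. Audio is modeled as a plain list of floats rather than an MLX array.

```python
from typing import Callable, List, Optional

Samples = List[float]

# Assumption: a chunk whose peak amplitude stays below this counts as silence.
SILENCE_PEAK = 1e-3


def is_degenerate(samples: Samples, peak: float = SILENCE_PEAK) -> bool:
    """True if the chunk is empty or every sample is effectively silent."""
    return not samples or max(abs(s) for s in samples) < peak


def synthesize_with_retry(
    synth: Callable[[str], Samples], text: str
) -> Optional[Samples]:
    """Call the (hypothetical) synth once, retry once on all-silence output,
    and return None so the caller can drop the chunk if both attempts fail."""
    for _ in range(2):  # first attempt + one retry
        samples = synth(text)
        if not is_degenerate(samples):
            return samples
    return None


def apply_fade_out(
    samples: Samples, sample_rate: int = 24_000, fade_ms: int = 100
) -> Samples:
    """Linearly ramp the last fade_ms of the chunk down to zero.
    The 24 kHz default is an assumption about the TTS output rate."""
    n = min(len(samples), sample_rate * fade_ms // 1000)
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        # Gain falls linearly from just under 1.0 to exactly 0.0 on the last sample.
        out[start + i] *= 1.0 - (i + 1) / n
    return out
```

Returning `None` rather than raising keeps the one-retry policy inside the helper, so the chunk loop only has to skip empty results.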
Covers find_cached(source_url, mode, voice, llm_model) so the cache check is an index seek instead of a table scan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
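A rough sketch of the composite index and cache lookup with `sqlite3`. It also follows the migration ordering described in a later commit: add the `llm_model` column first, then create the index that references it. This is only an illustration. The table columns beyond the four cache keys, the index name `idx_reads_cache`, the `wav_path` column, and the extra `conn` parameter on `find_cached` are hypothetical.

```python
import os
import sqlite3
from typing import Optional


def migrate(conn: sqlite3.Connection) -> None:
    # Hypothetical legacy schema; llm_model is added below if it is missing.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        " id INTEGER PRIMARY KEY,"
        " source_url TEXT NOT NULL,"
        " mode TEXT NOT NULL,"
        " voice TEXT NOT NULL,"
        " wav_path TEXT NOT NULL)"
    )
    cols = {row[1] for row in conn.execute("PRAGMA table_info(reads)")}
    # Step 1: add the column on databases created before it existed...
    if "llm_model" not in cols:
        conn.execute("ALTER TABLE reads ADD COLUMN llm_model TEXT")
    # Step 2: ...and only then build the index that references it.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_reads_cache "
        "ON reads (source_url, mode, voice, llm_model)"
    )
    conn.commit()


def find_cached(
    conn: sqlite3.Connection, source_url: str, mode: str, voice: str, llm_model: str
) -> Optional[str]:
    # Equality on all four indexed columns lets SQLite seek the index.
    row = conn.execute(
        "SELECT wav_path FROM reads "
        "WHERE source_url = ? AND mode = ? AND voice = ? AND llm_model = ? "
        "LIMIT 1",
        (source_url, mode, voice, llm_model),
    ).fetchone()
    # Count it as a hit only if the WAV file is still on disk.
    if row and os.path.exists(row[0]):
        return row[0]
    return None
```

Running `EXPLAIN QUERY PLAN` on the lookup should report a search using `idx_reads_cache` rather than a full scan of `reads`.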
Update project tree, server pipeline, chunk+synth, and library sections in CLAUDE.md to document the new features. Add PLAN.md entry for the 2026-06-24 optimisation work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Audio quality + performance release: read cache, degenerate-chunk guard, crossfade joins. Version anchors and docs updated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove redundant/defensive tests (59 → 38). Add docs/TESTS.md cataloguing every test case by module. CI now publishes a JUnit test summary on PRs and runs only Python 3.10 + 3.12. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update Python version refs from 3.10–3.12 → 3.10 + 3.12 in CLAUDE.md, ROADMAP.md, SETUP.md. Add TESTS.md to project tree. Update test count and CI description (JUnit summary). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Synthesis speed tuning (3 knobs, documented in config.yaml):
- Default precision fp32 → bf16 (~6% faster, no audible quality loss)
- Chunk cap 280 → 400 chars (fewer CSM prefills, ~30% fewer chunks)
- Sampler cached per (temperature, top_k), avoiding recreation per chunk
CLI: player metadata line now shows "Xs to generate" for live reads (timings field added to DoneMsg; hidden for library replays).
Server spawn: readbackBin() now prefers `.venv/bin/python3 -m readback` (works without venv activation); stderr captured so startup crashes show the actual error.
Fixed library migration ordering: ALTER TABLE for llm_model now runs before the cache index that references it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
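Caching the sampler per `(temperature, top_k)` can be as simple as memoizing its factory on those two arguments. The sketch below assumes a hypothetical pure-Python `make_sampler`, which does top-k filtering followed by a temperature-scaled softmax draw. It is a stand-in, not the csm-mlx or mlx-lm sampler API.

```python
import math
import random
from functools import lru_cache
from typing import Callable, Sequence

Sampler = Callable[[Sequence[float], random.Random], int]


def make_sampler(temperature: float, top_k: int) -> Sampler:
    """Hypothetical stand-in for a comparatively costly sampler factory."""
    if temperature <= 0 or top_k < 1:
        raise ValueError("temperature must be > 0 and top_k >= 1")

    def sample(logits: Sequence[float], rng: random.Random) -> int:
        # Keep the top_k highest-scoring token ids.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        # Temperature-scaled softmax weights (shifted by the max for stability).
        m = max(logits[i] for i in top)
        weights = [math.exp((logits[i] - m) / temperature) for i in top]
        return rng.choices(top, weights=weights, k=1)[0]

    return sample


@lru_cache(maxsize=None)
def get_sampler(temperature: float, top_k: int) -> Sampler:
    """Build each (temperature, top_k) sampler once and reuse it for every chunk."""
    return make_sampler(temperature, top_k)
```

Because the cache key is the argument tuple, changing either knob produces a fresh sampler instead of mutating a shared one. An unbounded cache is presumably fine, since only a handful of combinations occur in practice.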
Update CLAUDE.md (chunk cap 400, bf16 default, sampler cache, venv spawn, generation timer in PlayerView), README.md (bf16 default in stack table + config table), ARCHITECTURE.md (400-char chunks, fade-out, retry), ROADMAP.md (faster synthesis checked off), PLAN.md (new entry). Fix all stale fp32/280 references. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clarify that all three models (summary LLM, vision OCR, TTS) run in-process on Metal via mlx-lm/mlx-vlm/csm-mlx — no external daemons, no API keys. Added cache-hit note to the pipeline steps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add v3.2.0 (Pi), v4.0.0 (MLX in-process), v4.1.0 (cache + speed) to the timeline. Fix stack snapshot: Ollama → mlx-lm (in-process), fp32 → bf16, add vision OCR and read cache. Update afplay/uvicorn gotchas to match current behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the TODO-scaffold devlog with complete prose. Timeline through v4.1.0, narrative arc with the two pivots explained, honest workflow section, six technical decisions with reasoning, clean gotcha list, and a closing takeaway. Stack snapshot tagged v4.1.0 with Pi layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrote src/finetune/README.md with a why-LoRA-vs-clone table, quick-start section, detailed steps, tuning knobs, and revert instructions. Added data/.gitkeep so the training data directory is tracked (clips are gitignored). Pipeline is end-to-end ready — just needs audio clips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Library view now shows mode · duration · words · date on every row (active row in accent blue). Space toggles inline audio preview (plays without leaving the library; ♫ + elapsed shown on the row). Enter still opens the full player. Back/esc stops any preview. Descoped paste-raw-text and local-docs to Later in the roadmap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
v4.1.0 — audio quality, synthesis performance, and developer experience.
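Audio quality
- Degenerate-chunk guard: all-silence CSM output is retried once before the chunk is dropped
- 100ms linear fade-out at chunk tails smooths the join into inter-chunk silence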
Performance
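- Read cache keyed on `(url, mode, voice, llm_model)` with composite index; only hits when the WAV still exists on disk
- Synthesis speed tuning (bf16 default, 400-char chunk cap, cached sampler), documented in `config.yaml`
- New `llm_model` column on the `reads` table (auto-migrated for existing DBs)
CLI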
- Library rows show `mode · duration · words · date` inline (accent blue on active); space previews audio inline without leaving the library; enter opens the full player
- Server spawn runs `.venv/bin/python3 -m readback` directly (no activation needed); stderr captured so startup crashes show the actual error
Tests & CI
- `docs/TESTS.md`: full catalogue of every test case by module
Docs
Before merging
- `docs/media/cli-player.png` and `docs/media/cli-home.png`, then run the `refresh-screenshots` skill.
Test plan
- `/model` switch → re-read should NOT cache-hit
- `readback-cli` from cold: server spawns via venv auto-detect
- `/lib` → space previews inline, enter opens full player

🤖 Generated with Claude Code