v4.1.0 — audio quality + read cache by MKS-01 · Pull Request #20 · MKS-01/readback

MKS-01 · 2026-06-23T19:07:07Z

Summary

v4.1.0 — audio quality, synthesis performance, and developer experience.

Audio quality

Degenerate-chunk guard: when CSM returns an all-silence chunk, synthesis is retried once before the chunk is dropped
Crossfade joins: a 100 ms linear fade-out at each chunk tail smooths the transition from voiced audio into the inter-chunk silence gap
Peak normalization was already in place; these two changes complete the audio post-processing chain
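A minimal sketch of the two post-processing steps above, in plain Python over a list of float samples. The silence threshold, the 24 kHz sample rate and the names `is_degenerate`, `synthesize_with_retry` and `fade_out_tail` are assumptions for illustration, not readback's actual code. As described, the "crossfade" only fades each chunk's tail into the silence gap; it does not overlap adjacent chunks.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 24_000   # assumption: model output rate; use the real model's value
SILENCE_PEAK = 1e-3    # hypothetical threshold: a peak below this counts as all-silence
FADE_MS = 100          # fade-out length from this PR


def is_degenerate(samples: List[float], threshold: float = SILENCE_PEAK) -> bool:
    """True if the chunk is empty or its absolute peak never reaches the threshold."""
    return not samples or max(abs(s) for s in samples) < threshold


def synthesize_with_retry(synth: Callable[[str], List[float]], text: str) -> Optional[List[float]]:
    """Synthesize once; on an all-silence result retry exactly once, then give up (None = drop)."""
    for _attempt in range(2):
        audio = synth(text)
        if not is_degenerate(audio):
            return audio
    return None


def fade_out_tail(samples: List[float], sample_rate: int = SAMPLE_RATE,
                  fade_ms: int = FADE_MS) -> List[float]:
    """Return a copy with a linear 1.0 -> 0.0 gain ramp over the last fade_ms of audio."""
    out = list(samples)
    n = min(len(out), sample_rate * fade_ms // 1000)  # number of samples in the fade
    start = len(out) - n
    for i in range(n):
        # First faded sample keeps gain 1.0, the very last sample reaches 0.0.
        gain = 1.0 - i / (n - 1) if n > 1 else 0.0
        out[start + i] *= gain
    return out
```

Here `synth` stands in for whatever callable wraps CSM; returning `None` lets the caller skip the chunk instead of inserting dead air.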

Performance

Read cache: re-reads skip the entire pipeline (fetch → summarize → synthesize). The cache key is (url, mode, voice, llm_model), backed by a composite index; a lookup only counts as a hit if the WAV still exists on disk
Faster synthesis: default precision fp32 → bf16 (~6% faster); chunk cap 280 → 400 chars (~30% fewer chunks, hence fewer CSM prefills); sampler cached per (temperature, top_k). A speed/quality preset guide is in config.yaml
New llm_model column on the reads table (auto-migrated for existing DBs)
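One way the cache lookup could look, sketched with Python's built-in `sqlite3`. Only the `reads` table, the `llm_model` column and the `(source_url, mode, voice, llm_model)` key come from this PR; the `id` and `wav_path` columns, the index name `idx_reads_cache` and the "newest row wins" rule are hypothetical.

```python
import os
import sqlite3
from typing import Optional

# Hypothetical schema: `id` and `wav_path` are assumptions, the rest is named in the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reads (
    id         INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    mode       TEXT NOT NULL,
    voice      TEXT NOT NULL,
    llm_model  TEXT,
    wav_path   TEXT NOT NULL
);
-- Composite index matching the cache key, so the lookup is an index seek.
CREATE INDEX IF NOT EXISTS idx_reads_cache
    ON reads (source_url, mode, voice, llm_model);
"""


def find_cached(conn: sqlite3.Connection, source_url: str, mode: str,
                voice: str, llm_model: str) -> Optional[str]:
    """Return the WAV path of the newest matching read, or None on a miss.

    A row only counts as a hit if its WAV file still exists on disk.
    """
    row = conn.execute(
        "SELECT wav_path FROM reads "
        "WHERE source_url = ? AND mode = ? AND voice = ? AND llm_model = ? "
        "ORDER BY id DESC LIMIT 1",
        (source_url, mode, voice, llm_model),
    ).fetchone()
    if row is None or not os.path.exists(row[0]):
        return None  # miss: run fetch -> summarize -> synthesize as usual
    return row[0]
```

Because `llm_model` is part of the key, switching models yields a miss. In this sketch, rows migrated from older DBs have `llm_model` set to NULL, and `llm_model = ?` never matches NULL, so those rows are never served from cache.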

CLI

Generation timer: the player metadata line shows "Xs to generate" for live reads (hidden for library replays)
Library UI revamp: every row shows mode · duration · words · date inline, with the active row in accent blue; space previews audio inline without leaving the library, and enter opens the full player
Venv auto-detect: the server spawn prefers .venv/bin/python3 -m readback, so no venv activation is needed; stderr is captured so startup crashes show the actual error
Fixed library migration ordering: the ALTER TABLE that adds llm_model now runs before the cache index that references it
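A sketch of why the migration order matters, assuming an SQLite `reads` table: `CREATE INDEX` on a column that does not exist yet fails, so the column has to be added first. The helper name `migrate` and the index name `idx_reads_cache` are hypothetical.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: add llm_model if missing, THEN create the index on it."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(reads)")}
    if "llm_model" not in columns:
        # Step 1: the column must exist before any index can reference it.
        conn.execute("ALTER TABLE reads ADD COLUMN llm_model TEXT")
    # Step 2: safe now; IF NOT EXISTS keeps the migration idempotent.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_reads_cache "
        "ON reads (source_url, mode, voice, llm_model)"
    )
    conn.commit()
```

Running the two steps in the opposite order on a pre-4.1 database raises `sqlite3.OperationalError` (no such column), which is the failure the reordering avoids.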

Tests & CI

Trimmed test suite: 59 → 38 (removed redundant/defensive tests)
New docs/TESTS.md — full catalogue of every test case by module
CI publishes JUnit test summary on PRs; Python matrix trimmed to 3.10 + 3.12

Docs

All doc surfaces synced (CLAUDE.md, README, ARCHITECTURE, ROADMAP, PLAN, SETUP, TESTS)
README updated: MLX in-process references throughout, bf16 default
JOURNEY.md fully rewritten — complete prose devlog, no scaffold prompts
Finetune README rewritten with clone-vs-LoRA table and quick-start
Version bumped across all 4 anchors

Before merging

Update screenshots: the CLI player (generation timer) and the library view (inline metadata + preview indicator) have changed. Recapture docs/media/cli-player.png and docs/media/cli-home.png, then run the refresh-screenshots skill.

Test plan

All 38 pytest tests pass
Smoke test: read an article, quit, re-read; the second read is instant (cache hit)
After a /model switch, a re-read should NOT hit the cache
Listen for smoother chunk joins (crossfade)
Run readback-cli from cold; the server spawns via venv auto-detect
Generation timer visible on the player screen
/lib: space previews inline, enter opens the full player

🤖 Generated with Claude Code

Three audio-quality and performance improvements:
- Degenerate-chunk guard: when CSM produces all-silence audio, retry synthesis once before dropping the chunk
- Light crossfade: 100 ms linear fade-out at chunk tails smooths the transition into inter-chunk silence gaps
- Read cache: skip the entire pipeline (fetch/summarize/synthesize) when re-reading the same (url, mode, voice, llm_model); the library lookup returns the existing WAV instantly.

Adds llm_model column to the reads table (auto-migrated for existing DBs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers find_cached(source_url, mode, voice, llm_model) so the cache check is an index seek instead of a table scan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
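To check that the cache query really is an index seek, SQLite's `EXPLAIN QUERY PLAN` can be compared before and after creating the index. This is a self-contained sketch against an in-memory table; the schema, the `wav_path` column and the index name are illustrative assumptions, and the exact plan wording varies between SQLite versions (roughly "SCAN reads" versus "SEARCH reads USING INDEX ...").

```python
import sqlite3

# Illustrative cache query; `wav_path` and the index name are assumptions.
CACHE_QUERY = (
    "SELECT wav_path FROM reads "
    "WHERE source_url = ? AND mode = ? AND voice = ? AND llm_model = ?"
)


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Join the 'detail' column (last column) of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reads (id INTEGER PRIMARY KEY, source_url TEXT, mode TEXT, "
    "voice TEXT, llm_model TEXT, wav_path TEXT)"
)
params = ("https://example.com/post", "summary", "default", "some-model")

before = query_plan(conn, CACHE_QUERY, params)  # no index yet: full table scan
conn.execute(
    "CREATE INDEX idx_reads_cache ON reads (source_url, mode, voice, llm_model)"
)
after = query_plan(conn, CACHE_QUERY, params)   # composite index: index seek

print("before:", before)
print("after: ", after)
```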

Update project tree, server pipeline, chunk+synth, and library sections in CLAUDE.md to document the new features. Add PLAN.md entry for the 2026-06-24 optimisation work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Audio quality + performance release: read cache, degenerate-chunk guard, crossfade joins. Version anchors and docs updated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove redundant/defensive tests (59 → 38). Add docs/TESTS.md cataloguing every test case by module. CI now publishes a JUnit test summary on PRs and runs only Python 3.10 + 3.12. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update Python version refs from 3.10–3.12 to 3.10 + 3.12 in CLAUDE.md, ROADMAP.md, SETUP.md. Add TESTS.md to project tree. Update test count and CI description (JUnit summary). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Synthesis speed tuning (3 knobs, documented in config.yaml):
- Default precision fp32 → bf16 (~6% faster, no audible quality loss)
- Chunk cap 280 → 400 chars (fewer CSM prefills, ~30% fewer chunks)
- Sampler cached per (temperature, top_k), which avoids recreating it for every chunk

CLI: the player metadata line now shows "Xs to generate" for live reads (timings field added to DoneMsg; hidden for library replays).

Server spawn: readbackBin() now prefers `.venv/bin/python3 -m readback` (works without venv activation); stderr is captured so startup crashes show the actual error.

Fixed library migration ordering: the ALTER TABLE for llm_model now runs before the cache index that references it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
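A sketch of two of the three knobs: a greedy sentence packer with a 400-character cap, and a sampler factory memoized per `(temperature, top_k)` with `functools.lru_cache`. Both are illustrative; `chunk_text`, `get_sampler` and the pure-Python top-k sampler are hypothetical stand-ins, not readback's or csm-mlx's API.

```python
import math
import random
import re
from functools import lru_cache
from typing import Callable, List, Optional

MAX_CHUNK_CHARS = 400  # this PR raised the cap from 280 to 400


def _split_words(sentence: str, max_chars: int) -> List[str]:
    """Split an over-long sentence at word boundaries (a single huge word stays whole)."""
    parts: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text: str, max_chars: int = MAX_CHUNK_CHARS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


@lru_cache(maxsize=None)
def get_sampler(temperature: float, top_k: int) -> Callable[..., int]:
    """Build a top-k sampler once per (temperature, top_k); later calls reuse it."""

    def sample(logits: List[float], rng: Optional[random.Random] = None) -> int:
        # Keep the top_k highest logits, scale by temperature, softmax, then draw one index.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        scaled = [logits[i] / temperature for i in top]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        chooser = rng if rng is not None else random
        return chooser.choices(top, weights=weights, k=1)[0]

    return sample
```

With a larger cap, the same text packs into fewer chunks, and each chunk costs one model prefill. `lru_cache` keys on the exact argument values, so every chunk synthesized with the same settings gets the same sampler object back.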

Update CLAUDE.md (chunk cap 400, bf16 default, sampler cache, venv spawn, generation timer in PlayerView), README.md (bf16 default in stack table + config table), ARCHITECTURE.md (400-char chunks, fade-out, retry), ROADMAP.md (faster synthesis checked off), PLAN.md (new entry). Fix all stale fp32/280 references. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Clarify that all three models (summary LLM, vision OCR, TTS) run in-process on Metal via mlx-lm/mlx-vlm/csm-mlx — no external daemons, no API keys. Added cache-hit note to the pipeline steps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add v3.2.0 (Pi), v4.0.0 (MLX in-process), v4.1.0 (cache + speed) to the timeline. Fix stack snapshot: Ollama → mlx-lm (in-process), fp32 → bf16, add vision OCR and read cache. Update afplay/uvicorn gotchas to match current behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the TODO-scaffold devlog with complete prose. Timeline through v4.1.0, narrative arc with the two pivots explained, honest workflow section, six technical decisions with reasoning, clean gotcha list, and a closing takeaway. Stack snapshot tagged v4.1.0 with Pi layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rewrote src/finetune/README.md with a why-LoRA-vs-clone table, quick-start section, detailed steps, tuning knobs, and revert instructions. Added data/.gitkeep so the training data directory is tracked (clips are gitignored). Pipeline is end-to-end ready — just needs audio clips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Library view now shows mode · duration · words · date on every row (active row in accent blue). Space toggles inline audio preview (plays without leaving the library; ♫ + elapsed shown on the row). Enter still opens the full player. Back/esc stops any preview. Descoped paste-raw-text and local-docs to Later in the roadmap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MKS-01 and others added 13 commits June 24, 2026 00:26

perf: add composite index for cache lookups

0bd2031


chore: bump version to v4.1.0

ffa0a36


MKS-01 wants to merge 13 commits into main from optimisation
