perf(pz2d): cap frozen-dict chain walks at 16 links — encode −44% for +0.06pp by ChrisLundquist · Pull Request #152 · ChrisLundquist/libpz

ChrisLundquist · 2026-06-10T19:39:19Z

What

Task #12 part 2 (encode cache tuning). The pz2d encode inflation (§11b: 2.9× concurrent CPU blowup) comes from the frozen-dict walk itself: every chain link is a random read into a cold multi-MiB prev array, multiplied across 7 concurrent segments. The weak-local-match gate was already measured dead (−8% for +0.018pp), so the remaining §11b lever was a dict-specific budget. Frozen walks in find_best now cap at lz77::DICT_CHAIN_CAP = 16 links, applied in addition to the live walk's leftover max_chain: a frozen walk gets the smaller of the two, never more than the leftover.
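The budget rule can be sketched as below. This is a minimal illustration, not the code in find_best: `frozen_walk_budget`, `walk_chain`, `NIL`, and the `prev` layout are hypothetical names and assumptions, and only the constant's value (16) and the "never exceeding the leftover" rule come from this PR.

```rust
/// Stand-in for `lz77::DICT_CHAIN_CAP`; the PR ships 16.
pub const DICT_CHAIN_CAP: usize = 16;

/// Assumed end-of-chain sentinel for this sketch.
pub const NIL: u32 = u32::MAX;

/// Links the frozen-dict walk may follow: the dict cap, but never more
/// than what the live walk left unspent from `max_chain`.
pub fn frozen_walk_budget(max_chain: usize, live_links_used: usize) -> usize {
    let leftover = max_chain.saturating_sub(live_links_used);
    leftover.min(DICT_CHAIN_CAP)
}

/// Follow a hash chain through `prev`, visiting at most `budget` links.
/// `prev[i]` is assumed to hold the previous position with the same hash,
/// or `NIL` when the chain ends. Each step is one read into `prev`,
/// which is the cold random access the cap is meant to limit.
pub fn walk_chain(prev: &[u32], head: u32, budget: usize) -> Vec<u32> {
    let mut visited = Vec::new();
    let mut pos = head;
    while pos != NIL && visited.len() < budget {
        visited.push(pos);
        pos = prev[pos as usize];
    }
    visited
}
```

With an illustrative max_chain of 64, a live walk that used 10 links leaves the frozen walk 16 links (the cap binds), while one that used 60 leaves it 4 (the leftover binds).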

Measured (202 MB Silesia blob, e2e CLI encode)

cap (links)	encode wall	user CPU	ratio (lower is better)
64 (old shared budget)	25.0 s	115 s	30.475%
32	18.6 s	91 s	30.500%
24	16.4 s	82 s	30.515%
16 (shipped)	14.1 s	69 s	30.535%
8	12.65 s	61 s	30.570%

No sharp knee; 16 is the judgment call: −44% wall and −40% CPU for +0.06pp, keeping pz2d 0.87pp under pzstd-3. Below 16 the returns invert: 24 → 16 saves 2.3 s for +0.020pp, while 16 → 8 saves only 1.45 s for +0.035pp. Decode is neutral-to-better on the capped wire (39.0 vs 40.5 ms, same-session hyperfine), since fewer far-dict matches mean fewer random dict reads.
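The headline deltas and the marginal trade-off can be recomputed from the sweep table. The sketch below hard-codes the five measured rows; the struct and helper names are illustrative and not part of libpz.

```rust
/// One row of the sweep table from this PR.
#[derive(Clone, Copy, Debug)]
pub struct SweepRow {
    pub cap: u32,
    pub wall_s: f64,
    pub cpu_s: f64,
    pub ratio_pct: f64,
}

/// Measured rows, ordered from the old shared budget (64) down to 8.
pub const SWEEP: [SweepRow; 5] = [
    SweepRow { cap: 64, wall_s: 25.0, cpu_s: 115.0, ratio_pct: 30.475 },
    SweepRow { cap: 32, wall_s: 18.6, cpu_s: 91.0, ratio_pct: 30.500 },
    SweepRow { cap: 24, wall_s: 16.4, cpu_s: 82.0, ratio_pct: 30.515 },
    SweepRow { cap: 16, wall_s: 14.1, cpu_s: 69.0, ratio_pct: 30.535 },
    SweepRow { cap: 8, wall_s: 12.65, cpu_s: 61.0, ratio_pct: 30.570 },
];

/// Relative change from `base` to `new`, in percent (negative means less time).
pub fn pct_change(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

/// Encode wall seconds saved per 0.01pp of ratio given up when moving
/// from row `a` to the lower-cap row `b`.
pub fn secs_per_centi_pp(a: &SweepRow, b: &SweepRow) -> f64 {
    (a.wall_s - b.wall_s) / ((b.ratio_pct - a.ratio_pct) * 100.0)
}
```

Comparing `SWEEP[0]` with `SWEEP[3]` gives about −43.6% wall and −40% CPU for +0.06pp. `secs_per_centi_pp` gives roughly 1.15 s per 0.01pp for 24 → 16 and roughly 0.41 s per 0.01pp for 16 → 8, which is the drop-off that makes 16 the stopping point.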

Also re-anchored §11c's decode headline in one hyperfine invocation: master v1 44.9 ms vs arena 39.1 ms (1.15×), so PR #150's −13% holds. Absolute walls drift ±10% with machine state; the ledger now notes this (compare binaries within a single hyperfine run only).

Validation

Round-trip verified on the blob at cap 16. Suite: 745 + 598 unit tests green, doctests pass, fmt/clippy clean. Only pz2d's encode path consults the frozen dict, so no other pipeline is affected; decode is untouched. Ledger: clean-slate-codec.md §11d.

🤖 Generated with Claude Code

…14.1 s (-44%) for +0.06pp

The pz2d encode inflation (§11b: 2.9x concurrent CPU blowup) is the frozen-dict walk itself: every link is a random read into a cold multi-MiB prev array, multiplied across 7 concurrent segments. Frozen walks in find_best now get their own budget (lz77::DICT_CHAIN_CAP), applied on top of the live walk's leftover max_chain, so walking less attacks the inflation at its source.

Blob e2e sweep (cap -> encode wall / ratio):
64 -> 25.0 s / 30.475%
32 -> 18.6 s / 30.500%
24 -> 16.4 s / 30.515%
16 -> 14.1 s / 30.535% (shipped)
8 -> 12.65 s / 30.570%

No sharp knee; 16 trades +0.06pp (still 0.87pp under pzstd-3) for -44% encode wall and -40% CPU; below 16 the returns invert. Decode is neutral-to-better on the capped wire (39.0 vs 40.5 ms same-session: fewer far-dict matches, fewer random dict reads).

Also re-anchored §11c's decode headline same-session: master v1 44.9 ms vs arena 39.1 ms (1.15x), so the -13% holds; absolute walls drift ±10% with machine state, so compare binaries within one hyperfine invocation only.

Ledger: clean-slate-codec.md §11d. Suite: 745 + 598 green, fmt/clippy clean, blob round-trip verified.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ChrisLundquist merged commit a7d5649 into master Jun 10, 2026

ChrisLundquist deleted the claude/pz2d-dict-chain-cap branch June 10, 2026 19:39

ChrisLundquist merged 1 commit into master from claude/pz2d-dict-chain-cap
