feat(pz2): CODES_FSE — per-block tANS seq-code lanes, pz2 30.86% / pz2d 30.34% at decode parity#155

Merged: ChrisLundquist merged 2 commits into master from claude/pz2-fse-seq-lanes on Jun 10, 2026

@ChrisLundquist (Owner)

Summary

Ships the surviving candidate from the #153 entropy probe. The three small-alphabet sequence-code lanes (lit-run, offset, match-len) gain a CODES_FSE wire mode: per-block tANS over a 32-symbol alphabet with FSE_LOG=10, i.e. 1024 states, so each lane's decode table is 4 KB and stays L1-resident. Literals stay on 8-lane Huffman, since the probe measured only −0.06pp of headroom there.
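The following minimal Rust sketch shows where the 4 KB/lane figure comes from. It assumes a 4-byte textbook-tANS decode entry holding the symbol, the number of bits to read, and the next-state base. `DecodeEntry`, its field layout, and the `normalize` heuristic are hypothetical illustrations, not the PR's code. The sketch checks that 2^10 entries of 4 bytes come to 4 KB, and shows one simple way to scale a 32-symbol histogram onto 1024 table slots.

```rust
use std::mem::size_of;

/// FSE_LOG from the PR: 2^10 = 1024 tANS states per lane.
const FSE_LOG: u32 = 10;
const TABLE_SIZE: usize = 1 << FSE_LOG;
/// Sequence-code alphabet size from the PR.
const ALPHABET: usize = 32;

/// Hypothetical 4-byte decode entry (the codec's real layout may differ).
#[derive(Clone, Copy)]
#[repr(C)]
struct DecodeEntry {
    symbol: u8,  // symbol emitted in this state
    nb_bits: u8, // bits to read for the next state
    base: u16,   // next state = base + read_bits(nb_bits)
}

// 1024 entries x 4 bytes = 4 KB per lane, checked at compile time.
const _: () = assert!(size_of::<DecodeEntry>() * TABLE_SIZE == 4096);

/// Scale raw counts so they sum to TABLE_SIZE with every present symbol >= 1.
/// Simple heuristic (an assumption, not the PR's rounding): floor-scale, clamp
/// to 1, then give the rounding error to the most frequent symbol. With <= 32
/// symbols that symbol starts at >= 32 slots and absorbs at most 31 extra.
fn normalize(counts: &[u32; ALPHABET]) -> [u16; ALPHABET] {
    let total: u64 = counts.iter().map(|&c| c as u64).sum();
    assert!(total > 0, "an empty stream would not use FSE");
    let mut norm = [0u16; ALPHABET];
    let (mut sum, mut top) = (0i64, 0usize);
    for s in 0..ALPHABET {
        if counts[s] > counts[top] {
            top = s;
        }
        if counts[s] > 0 {
            let scaled = (counts[s] as u64 * TABLE_SIZE as u64 / total).max(1);
            norm[s] = scaled as u16;
            sum += scaled as i64;
        }
    }
    norm[top] = (norm[top] as i64 + TABLE_SIZE as i64 - sum) as u16;
    norm
}
```

Keeping every present symbol at one or more slots is what lets the encoder represent any symbol that actually occurs in the block.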

The encoder builds CONST, Huffman, and FSE candidates for each stream and ships the byte-smallest. The wire output is therefore never worse than before, and old streams decode unchanged.
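The per-stream selection can be sketched as follows. It assumes each candidate encoder returns its serialized bytes, or `None` when the mode does not apply to the stream. `CodesMode`, its CONST/Huffman discriminants, and `pick_smallest` are hypothetical names; only FSE being mode 2 comes from the commit message.

```rust
/// Hypothetical wire-mode tags. FSE = 2 matches "mode 2" in the commit message;
/// the CONST and Huffman values are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodesMode {
    Const = 0,
    Huffman = 1,
    Fse = 2,
}

/// Ship the byte-smallest candidate. `None` marks a mode that cannot encode
/// this stream (e.g. CONST when more than one symbol occurs). `min_by_key`
/// returns the first minimum, so listing older modes first means FSE only
/// wins when it is strictly smaller (this tie-break is an assumption).
fn pick_smallest(candidates: Vec<(CodesMode, Option<Vec<u8>>)>) -> (CodesMode, Vec<u8>) {
    candidates
        .into_iter()
        .filter_map(|(mode, bytes)| bytes.map(|b| (mode, b)))
        .min_by_key(|(_, b)| b.len())
        .expect("at least one mode (e.g. Huffman) must be available")
}
```

Because the choice is a pure size comparison over candidates that all decode correctly, adding a new mode can only shrink or preserve each stream's size.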

Two tricks bought decode parity:

  • Forward-readable tANS payload: encode must walk symbols in reverse (tANS is last-in, first-out), but it re-reverses the emitted bit groups. Decode therefore reads forward with the same LSB-first, whole-byte-refill reader as the Huffman lanes, and no backward bitstream is needed.
  • Entry prefetch: the lane caches table[state], and each next() issues the following symbol's table load at its end, overlapping that load's latency with the splice copies. Without it, single-threaded (ST) decode regressed 2.4%. With it, ST decode is neutral to slightly faster.
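Both tricks can be sketched together as a small, self-contained Rust round trip, assuming textbook tANS. The sketch has three parts:

- a symbol spread using the step zstd's FSE uses,
- an encoder that walks symbols in reverse and then writes its bit groups in reverse,
- a forward LSB-first reader that caches `table[state]`.

`Entry`, `Tables`, `build`, `encode`, `FseLane`, starting the encoder at slot 0, and returning `None` on truncation are all assumptions for illustration. The PR's real lane is `CodeLane::Fse`, which is not reproduced here.

```rust
/// Hypothetical decode-table entry: symbol, bits to read, next-state base.
#[derive(Clone, Copy)]
struct Entry { symbol: u8, nb_bits: u8, base: u16 }

/// Decode table plus encoder lookup: enc[s][x - norm[s]] = slot for (s, x).
struct Tables { log: u32, dec: Vec<Entry>, enc: Vec<Vec<u16>>, norm: Vec<u16> }

fn build(norm: &[u16], log: u32) -> Tables {
    assert!((4..=15).contains(&log));
    let size = 1usize << log;
    assert_eq!(norm.iter().map(|&n| n as usize).sum::<usize>(), size);
    // zstd-style spread: the step is odd for log >= 4, so it visits every slot once.
    let (step, mut pos) = ((size >> 1) + (size >> 3) + 3, 0usize);
    let mut slot_sym = vec![0u8; size];
    for (s, &n) in norm.iter().enumerate() {
        for _ in 0..n {
            slot_sym[pos] = s as u8;
            pos = (pos + step) & (size - 1);
        }
    }
    let mut next: Vec<u32> = norm.iter().map(|&n| n as u32).collect();
    let mut enc: Vec<Vec<u16>> = norm.iter().map(|&n| Vec::with_capacity(n as usize)).collect();
    let mut dec = Vec::with_capacity(size);
    for (u, &s) in slot_sym.iter().enumerate() {
        let x = next[s as usize]; // runs over [norm[s], 2 * norm[s])
        next[s as usize] += 1;
        let nb = log - (31 - x.leading_zeros()); // log - floor(log2(x))
        let base = ((x << nb) - size as u32) as u16;
        dec.push(Entry { symbol: s, nb_bits: nb as u8, base });
        enc[s as usize].push(u as u16);
    }
    Tables { log, dec, enc, norm: norm.to_vec() }
}

/// Encode in reverse (tANS is LIFO), then write the bit groups in reverse so
/// the decoder reads them front to back. Returns (first decode state, payload).
fn encode(t: &Tables, syms: &[u8]) -> (u16, Vec<u8>) {
    let size = 1u32 << t.log;
    let mut x = size; // encoder state lives in [size, 2 * size); start at slot 0
    let mut groups: Vec<(u32, u32)> = Vec::with_capacity(syms.len());
    for &s in syms.iter().rev() {
        let n = t.norm[s as usize] as u32;
        assert!(n > 0, "symbol missing from the normalized table");
        let mut nb = 0u32;
        while (x >> nb) >= 2 * n {
            nb += 1;
        }
        groups.push((x & ((1u32 << nb) - 1), nb));
        x = size + t.enc[s as usize][((x >> nb) - n) as usize] as u32;
    }
    // syms[0]'s group was produced last; emit it first, LSB-first.
    let (mut out, mut acc, mut fill) = (Vec::new(), 0u64, 0u32);
    for &(v, nb) in groups.iter().rev() {
        acc |= (v as u64) << fill;
        fill += nb;
        while fill >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            fill -= 8;
        }
    }
    if fill > 0 {
        out.push(acc as u8);
    }
    ((x - size) as u16, out)
}

/// Forward LSB-first reader with whole-byte refill and a cached table entry.
struct FseLane<'a> { dec: &'a [Entry], entry: Entry, buf: &'a [u8], pos: usize, acc: u64, fill: u32 }

impl<'a> FseLane<'a> {
    fn new(t: &'a Tables, state: u16, buf: &'a [u8]) -> Self {
        FseLane { dec: &t.dec, entry: t.dec[state as usize], buf, pos: 0, acc: 0, fill: 0 }
    }

    /// Returns None if the payload runs out (truncated input).
    fn next(&mut self) -> Option<u8> {
        let e = self.entry; // loaded at the end of the previous call
        while self.fill <= 56 && self.pos < self.buf.len() {
            self.acc |= (self.buf[self.pos] as u64) << self.fill;
            self.pos += 1;
            self.fill += 8;
        }
        let nb = e.nb_bits as u32;
        if nb > self.fill {
            return None;
        }
        let bits = (self.acc & ((1u64 << nb) - 1)) as usize;
        self.acc >>= nb;
        self.fill -= nb;
        // Issue the following symbol's table load now so it can overlap later work.
        self.entry = self.dec[e.base as usize + bits];
        Some(e.symbol)
    }
}
```

In this sketch `next()` performs the following symbol's table load before returning, which is the ordering the PR describes. Whether that load actually overlaps the caller's copies depends on the compiler and the CPU.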

Measurements (same-session hyperfine pairs, M5 Max, silesia blob)

| metric | master | this PR | Δ |
|---|---|---|---|
| pz2 ratio | 31.044% | 30.863% | −0.181pp |
| pz2d ratio | 30.535% | 30.341% | −0.194pp |
| pz2 decode ST | 137.9 ms | 137.0 ms | neutral |
| pz2 decode MT | 17.1–17.4 ms | 17.5–17.7 ms | +2–3% (E-core side, ≪ kill line) |
| pz2d decode MT | 33.1 ms | 33.6 ms | neutral (within σ) |
| pz2 encode | 3.08 s | 3.14 s | +1.9% |

pz2 is now at 30.86% vs pzstd-3's 31.4% at ~1.3× its parallel decode. pz2d, at 30.34%, is 1.06pp under pzstd-3, the widest the Pareto edge has been. The realized ratio matches the probe's quantization-adjusted prediction (~−0.20pp).

Validation

  • Fresh soak after the wire change: release 180 s (19,176 round-trips + 767k mutated + 153k garbage decodes) and debug 45 s, with zero panics
  • Blob round-trips verified for pz2 and pz2d
  • New unit tests: FSE mode selection, all-32-symbol normalization stress, per-byte corruption + truncation hardening
  • Full suite via tester agent: 756 + 609 green; clippy -D warnings clean
  • Ledger: clean-slate-codec.md §13

🤖 Generated with Claude Code

Chris Lundquist and others added 2 commits June 10, 2026 15:07
Adds mode 2 to the sequence-code streams: textbook tANS (FSE_LOG=10,
4 KB/lane decode tables) over the 32-symbol code alphabet. Encoder
walks symbols in reverse and re-reverses the emitted bit groups, so
the decoder reads FORWARD with the same LSB whole-byte-refill
discipline as the Huffman lanes — no backward bitstream. Both
candidates are built and the byte-smaller ships (CONST/HUFF stay).

Blob ratio 31.04% -> 30.86% (-0.18pp), as the #153 probe predicted.
Decode-speed gate measurement next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…parity at -0.18pp

- CodeLane::Fse caches table[state]: each next() issues the FOLLOWING
  symbol's table load at its end, overlapping the load latency with the
  splice copies. This bought back the prototype's +2.4% ST regression
  (ST now 137.0 ms vs master 137.9 in the same hyperfine run).
- Gate results (same-session hyperfine pairs, M5 Max, silesia blob):
  pz2 31.044% -> 30.863% (-0.181pp), pz2d 30.535% -> 30.341% (-0.194pp);
  decode ST neutral-to-faster, MT +2-3% (~0.4 ms, E-core side, under the
  kill line), pz2d decode neutral, encode +1.9%.
- Soaked fresh after the wire change: release 180s (19,176 round-trips,
  767k mutated, 153k garbage decodes) + debug 45s, zero panics.
- Ledger: clean-slate-codec.md §13; CLAUDE.md rows updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ChrisLundquist merged commit 23ef01a into master on Jun 10, 2026
4 checks passed
ChrisLundquist deleted the claude/pz2-fse-seq-lanes branch on June 10, 2026 22:20