feat(pz2): CODES_FSE — per-block tANS seq-code lanes, pz2 30.86% / pz2d 30.34% at decode parity #155
Merged
Adds mode 2 to the sequence-code streams: textbook tANS (FSE_LOG=10, 4 KB/lane decode tables) over the 32-symbol code alphabet.

The encoder walks symbols in reverse and re-reverses the emitted bit groups, so the decoder reads FORWARD with the same LSB whole-byte-refill discipline as the Huffman lanes; there is no backward bitstream. Both candidates are built and the byte-smaller ships (CONST/HUFF stay).

Blob ratio 31.04% -> 30.86% (-0.18pp), as the #153 probe predicted. Decode-speed gate measurement next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
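For readers who want the reverse-encode / forward-decode idea spelled out, here is a minimal tANS round-trip sketch. It is not the PR's code: the toy `LOG = 4`, the `Entry` layout, the odd-step spread, and the function names `spread`, `build_decode`, `encode` and `decode` are all assumptions made for illustration, and the initial state and symbol count are simply returned or passed alongside the bytes rather than serialized.

```rust
const LOG: u32 = 4; // toy table log (assumption); the PR uses FSE_LOG=10
const L: usize = 1 << LOG;

#[derive(Clone, Copy)]
struct Entry { sym: u8, nbits: u8, base: u16 }

/// Spread symbols over the table with an odd step, so every slot is hit once.
fn spread(freq: &[u16]) -> Vec<u8> {
    assert_eq!(freq.iter().map(|&f| f as usize).sum::<usize>(), L);
    let step = (L >> 1) + (L >> 3) + 3;
    let (mut table, mut pos) = (vec![0u8; L], 0usize);
    for (sym, &f) in freq.iter().enumerate() {
        for _ in 0..f {
            table[pos] = sym as u8;
            pos = (pos + step) & (L - 1);
        }
    }
    table
}

fn build_decode(freq: &[u16]) -> Vec<Entry> {
    let mut next: Vec<u32> = freq.iter().map(|&f| f as u32).collect();
    let mut out = Vec::with_capacity(L);
    for &sym in spread(freq).iter() {
        let x = next[sym as usize]; // occurrence number in [f, 2f)
        next[sym as usize] += 1;
        let nbits = LOG - (31 - x.leading_zeros()); // LOG - floor(log2 x)
        out.push(Entry { sym, nbits: nbits as u8, base: ((x << nbits) - L as u32) as u16 });
    }
    out
}

/// Returns (bytes, initial decoder state). Every symbol needs freq > 0.
fn encode(freq: &[u16], syms: &[u8]) -> (Vec<u8>, u16) {
    // slot_of[s][k] = table index of the k-th occurrence of symbol s
    let mut slot_of: Vec<Vec<u16>> = vec![Vec::new(); freq.len()];
    for (i, &s) in spread(freq).iter().enumerate() {
        slot_of[s as usize].push(i as u16);
    }
    let mut x = L as u32; // encoder state, kept in [L, 2L)
    let mut groups: Vec<(u32, u32)> = Vec::new(); // (value, nbits), emission order
    for &s in syms.iter().rev() {
        let f = freq[s as usize] as u32;
        assert!(f > 0, "symbol has zero frequency");
        let mut nbits = 0u32;
        while (x >> nbits) >= 2 * f { nbits += 1; }
        groups.push((x & ((1u32 << nbits) - 1), nbits));
        x = L as u32 + slot_of[s as usize][((x >> nbits) - f) as usize] as u32;
    }
    // Re-reverse the groups so the decoder consumes them front to back, LSB first.
    let (mut out, mut acc, mut fill) = (Vec::new(), 0u64, 0u32);
    for &(v, n) in groups.iter().rev() {
        acc |= (v as u64) << fill;
        fill += n;
        while fill >= 8 { out.push(acc as u8); acc >>= 8; fill -= 8; }
    }
    if fill > 0 { out.push(acc as u8); }
    (out, (x as usize - L) as u16)
}

fn decode(freq: &[u16], bytes: &[u8], init: u16, count: usize) -> Vec<u8> {
    let table = build_decode(freq);
    let (mut state, mut acc, mut fill, mut pos) = (init as usize, 0u64, 0u32, 0usize);
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        while fill <= 56 && pos < bytes.len() { // whole-byte refill
            acc |= (bytes[pos] as u64) << fill;
            pos += 1;
            fill += 8;
        }
        let e = table[state];
        out.push(e.sym);
        let bits = (acc & ((1u64 << e.nbits) - 1)) as usize;
        acc >>= e.nbits;
        fill = fill.saturating_sub(e.nbits as u32);
        state = e.base as usize + bits;
    }
    out
}
```

With normalized frequencies such as `[8, 4, 2, 2]` (they must sum to `1 << LOG`), `decode(&freq, &bytes, init, syms.len())` should return the original symbols for any `(bytes, init)` produced by `encode(&freq, &syms)`. The point of the re-reversal is that `decode` needs nothing beyond a forward LSB-first reader.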
…parity at -0.18pp

- CodeLane::Fse caches table[state]: each next() issues the FOLLOWING symbol's table load at its end, overlapping the load latency with the splice copies. This bought back the prototype's +2.4% ST regression (ST now 137.0 ms vs master 137.9 in the same hyperfine run).
- Gate results (same-session hyperfine pairs, M5 Max, silesia blob): pz2 31.044% -> 30.863% (-0.181pp), pz2d 30.535% -> 30.341% (-0.194pp); decode ST neutral-to-faster, MT +2-3% (~0.4 ms, E-core side, under the kill line), pz2d decode neutral, encode +1.9%.
- Soaked fresh after the wire change: release 180s (19,176 round-trips, 767k mutated, 153k garbage decodes) + debug 45s, zero panics.
- Ledger: clean-slate-codec.md §13; CLAUDE.md rows updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
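The cached-entry idea in the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the actual `CodeLane::Fse`: the `FseLane` struct, its fields, the `Entry` layout, and the power-of-two index mask are all hypothetical. What it shows is only the ordering: `next()` returns the symbol from the entry loaded by the previous call, and finishes by loading the entry for the following symbol.

```rust
#[derive(Clone, Copy)]
struct Entry { sym: u8, nbits: u8, base: u16 }

struct FseLane<'a> {
    table: &'a [Entry], // length assumed to be a power of two
    bytes: &'a [u8],
    pos: usize,
    acc: u64,
    fill: u32,
    cur: Entry, // cached table[state] for the symbol the next call returns
}

impl<'a> FseLane<'a> {
    fn new(table: &'a [Entry], bytes: &'a [u8], init_state: usize) -> Self {
        FseLane { table, bytes, pos: 0, acc: 0, fill: 0, cur: table[init_state] }
    }

    fn next(&mut self) -> u8 {
        let e = self.cur; // already loaded by the previous call
        while self.fill <= 56 && self.pos < self.bytes.len() {
            // LSB-first whole-byte refill
            self.acc |= (self.bytes[self.pos] as u64) << self.fill;
            self.pos += 1;
            self.fill += 8;
        }
        let bits = (self.acc & ((1u64 << e.nbits) - 1)) as usize;
        self.acc >>= e.nbits;
        self.fill = self.fill.saturating_sub(e.nbits as u32);
        // Issue the FOLLOWING symbol's table load here, so its latency can
        // overlap whatever the caller does with e.sym before calling again.
        // The mask keeps garbage input in bounds.
        self.cur = self.table[(e.base as usize + bits) & (self.table.len() - 1)];
        e.sym
    }
}
```

In this sketch the caller's work between two `next()` calls (the splice copies, in the PR's terms) runs while the dependent load is already in flight, instead of the load sitting at the top of the next call.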
Summary
Ships the surviving candidate from the #153 entropy probe: the three small-alphabet sequence-code lanes (lit-run/offset/match-len) gain a CODES_FSE wire mode — per-block tANS, 32-symbol alphabet, FSE_LOG=10 (4 KB/lane L1-resident tables). Literals stay 8-lane Huffman (probe measured only −0.06pp headroom there).
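As a quick check on the 4 KB/lane figure: a table of `1 << FSE_LOG` entries at 4 bytes each comes to 4096 bytes. The `Entry` layout below (symbol, bit count, 16-bit base) is an assumed layout chosen to match that size, not necessarily the one the PR uses.

```rust
use std::mem::size_of;

const FSE_LOG: u32 = 10;

// Hypothetical 4-byte decode entry.
#[repr(C)]
#[derive(Clone, Copy)]
struct Entry { sym: u8, nbits: u8, base: u16 }

// 1024 entries * 4 bytes = 4096 bytes per lane.
const TABLE_BYTES: usize = (1usize << FSE_LOG) * size_of::<Entry>();
```

Under that assumption the three sequence-code lanes together need 12 KB of decode tables.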
The encoder builds CONST/Huffman/FSE candidates per stream and ships the byte-smallest, so the wire is bit-exact never-worse and old streams decode unchanged.
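The per-stream selection can be pictured with a small sketch like this one. The `CodeMode` enum, its numeric values other than mode 2 for FSE, and `pick_smallest` are hypothetical names for illustration; a candidate is `None` when that mode cannot represent the stream (for example CONST on a stream with more than one distinct symbol).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodeMode {
    Const = 0, // value assumed
    Huff = 1,  // value assumed
    Fse = 2,   // the PR adds FSE as mode 2
}

/// Pick the byte-smallest available candidate.
/// On a tie, `min_by_key` keeps the first one, i.e. the older mode.
fn pick_smallest(candidates: &[(CodeMode, Option<Vec<u8>>)]) -> Option<(CodeMode, &[u8])> {
    candidates
        .iter()
        .filter_map(|(mode, cand)| cand.as_deref().map(|bytes| (*mode, bytes)))
        .min_by_key(|(_, bytes)| bytes.len())
}
```

Because the older modes stay in the candidate list, adding FSE can only shrink a stream or leave it at the size it would have had before.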
Two tricks bought decode parity:
- `CodeLane::Fse` caches `table[state]`; each `next()` issues the following symbol's table load at its end, overlapping the latency with the splice copies. Without it: +2.4% ST. With it: neutral-to-faster.

Measurements (same-session hyperfine pairs, M5 Max, silesia blob)
pz2 is now 30.86% vs pzstd-3's 31.4% at ~1.3× its parallel decode; pz2d at 30.34% is 1.06pp under pzstd-3 — the widest the Pareto edge has been. Realized ratio matches the probe's quantization-adjusted prediction (~−0.20pp).
Validation
`-D warnings` clean

🤖 Generated with Claude Code