probe(pz2): tANS entropy accounting, per-block tANS passes (−0.25pp), global tables dead #153
Merged
…, global tables and order-1 are DEAD

Answers the dict-tier-era question "would tANS beat the 8-lane Huffman if we had global state tables?" with bit-exact histogram pricing (examples/pz2_entropy_probe.rs + two doc(hidden) hooks exposing the exact lane streams and the shipped package-merge lengths).

Scenarios:
A: shipped per-block Huffman
B: per-block tANS Shannon ideal with A's own headers
C: segment-global tables (C' = per-block choice)
D: order-1 segment-global on the seq-code lanes

Blob verdict:
B -0.255pp (gate-passing)
C +0.894pp (WORSE: 2 MiB per-block histograms are already saturated and heterogeneous segments poison shared tables, the global-head-dict lesson again)
C' -0.066pp
D -0.055pp

Per file, B ranges from -0.13 (xml) to -0.55 (dickens). Lane split: ll -0.114 / lit -0.059 / ml -0.054 / of -0.027. The win is the Huffman 1-bit floor on the small-alphabet sequence-code lanes (greedy parses make lit_run=0 dominate far past p=0.5); the 8-lane Huffman literals are near-optimal.

So the hypothesis inverted: tANS yes, global tables no. The live follow-up is per-block tANS on the three seq-code lanes (~-0.20pp realized after table quantization), gated on a decode-speed-neutral prototype of the fused splice with FSE states.

Ledger: clean-slate-codec.md §12; CLAUDE.md dead-end entry added. Suite: 745 + 598 green, fmt/clippy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
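The scenario C result in the commit message (shared tables poisoned by heterogeneous data) can be illustrated with a minimal sketch. It is not the probe's code: the function names are hypothetical, the histograms are plain symbol-count vectors, and table headers are ignored entirely, whereas the probe mirrors the shipped headers.

```rust
/// Bits needed to code `hist` with an ideal entropy coder whose table was
/// built from `model` (cross-entropy). Returns None when a symbol that occurs
/// in `hist` has no slot in `model`, since such a table cannot code it.
fn cross_entropy_bits(hist: &[u64], model: &[u64]) -> Option<f64> {
    let model_total: u64 = model.iter().sum();
    let mut bits = 0.0;
    for (i, &n) in hist.iter().enumerate() {
        if n == 0 {
            continue;
        }
        let m = model.get(i).copied().unwrap_or(0);
        if m == 0 {
            return None;
        }
        bits += n as f64 * (model_total as f64 / m as f64).log2();
    }
    Some(bits)
}

/// Sum of all per-block histograms: the model a single shared table would use.
fn pooled_histogram(blocks: &[Vec<u64>]) -> Vec<u64> {
    let width = blocks.iter().map(|b| b.len()).max().unwrap_or(0);
    let mut pooled = vec![0u64; width];
    for block in blocks {
        for (slot, &n) in pooled.iter_mut().zip(block) {
            *slot += n;
        }
    }
    pooled
}

/// Returns (payload bits with per-block tables, payload bits with one shared table).
/// Header cost is deliberately left out (an assumption of this sketch).
fn per_block_vs_global(blocks: &[Vec<u64>]) -> Option<(f64, f64)> {
    let pooled = pooled_histogram(blocks);
    let mut local = 0.0;
    let mut global = 0.0;
    for block in blocks {
        local += cross_entropy_bits(block, block)?;
        global += cross_entropy_bits(block, &pooled)?;
    }
    Some((local, global))
}
```

For two mirrored blocks such as `vec![vec![90, 10], vec![10, 90]]`, the pooled model is uniform and the shared table spends a full bit per symbol, while each block's own table spends far less. By Gibbs' inequality the shared-table payload can never be smaller than the per-block payload, so the only possible gain from global tables is saved header bytes.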
What
Task #13: the entropy-accounting probe answering "would tANS or another entropy coder beat the 8-lane Huffman if we had global state tables?" Pure histogram math — no codec or wire changes.
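As a rough illustration of what "pure histogram math" means here, the sketch below prices one lane's symbol histogram two ways: exactly under a set of prefix-code lengths, and at the Shannon ideal. All three function names are hypothetical and the Δpp definition (difference in coded bits as a percentage of input size) is an assumption; the actual probe obtains its streams and lengths through the hooks and also accounts for headers and fallbacks.

```rust
/// Shannon-ideal cost in bits of coding a stream with the given symbol counts
/// under its own empirical distribution: sum of n_s * log2(N / n_s).
fn shannon_bits(hist: &[u64]) -> f64 {
    let total: u64 = hist.iter().sum();
    if total == 0 {
        return 0.0;
    }
    hist.iter()
        .filter(|&&n| n > 0)
        .map(|&n| n as f64 * (total as f64 / n as f64).log2())
        .sum()
}

/// Exact cost in bits of a prefix code with the given per-symbol lengths:
/// sum of n_s * len_s.
fn prefix_code_bits(hist: &[u64], lengths: &[u8]) -> u64 {
    hist.iter()
        .zip(lengths)
        .map(|(&n, &len)| n * len as u64)
        .sum()
}

/// Difference between two coded sizes, in percentage points of the input size.
/// Negative means the candidate is smaller than the baseline.
fn delta_pp(candidate_bits: f64, baseline_bits: f64, input_bytes: u64) -> f64 {
    100.0 * (candidate_bits - baseline_bits) / (8.0 * input_bytes as f64)
}
```

On a dyadic histogram such as `[4, 2, 1, 1]` with lengths `[1, 2, 3, 3]` both prices agree, which is the case where Huffman is already optimal; any gap between them on real lanes is what a tANS-style coder could recover before its own overheads.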
examples/pz2_entropy_probe.rs prices four scenarios bit-exactly on the exact lane streams the encoder feeds its Huffman lanes (via two #[doc(hidden)] hooks: pz2::probe_lane_streams, pz2::probe_huffman_lengths), mirroring shipped headers and CONST/RAW fallbacks.
Results (Δpp of input vs shipped; 2 MiB blocks, greedy, 32 MiB segments)
Verdicts
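The win is the Huffman 1-bit floor on the small-alphabet sequence-code lanes (greedy parses make lit_run = 0 dominate far past p = 0.5); the 8-lane Huffman literals are near-optimal (huff0's classic result reconfirmed).
Follow-up surfaced (not in this PR)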
Per-block tANS/FSE on the three seq-code lanes, ~−0.20pp realized after table quantization — gated on a decode-speed-neutral prototype of the fused splice with interleaved FSE states (zstd's exact design is the existence proof, but pz2's 12 GiB/s all-cores / 1.4 GB/s ST must hold).
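The gap between the 1-bit floor and the "realized after table quantization" figure can be sketched on a toy histogram. The code below is an illustration under stated assumptions, not pz2's or zstd's normalization: `normalize` is a deliberately simple scheme (floor, at least one slot per present symbol, rounding drift absorbed by the most frequent symbol), and a tANS coder's cost is approximated by the cross-entropy against the quantized table, ignoring state flush and headers. All function names are hypothetical.

```rust
/// Shannon-ideal bits for a histogram under its own distribution.
fn entropy_bits(hist: &[u64]) -> f64 {
    let total: u64 = hist.iter().sum();
    let mut bits = 0.0;
    for &n in hist {
        if n > 0 {
            bits += n as f64 * (total as f64 / n as f64).log2();
        }
    }
    bits
}

/// Lower bound for any prefix code on an alphabet of two or more symbols:
/// every coded symbol costs at least one whole bit.
fn prefix_floor_bits(hist: &[u64]) -> u64 {
    hist.iter().sum()
}

/// Quantize a histogram to 2^table_log slots (simplified scheme, see lead-in).
fn normalize(hist: &[u64], table_log: u32) -> Vec<u64> {
    let size = 1u64 << table_log;
    let total: u64 = hist.iter().sum();
    assert!(total > 0, "empty histogram");
    let mut norm: Vec<u64> = hist
        .iter()
        .map(|&n| if n == 0 { 0 } else { (n * size / total).max(1) })
        .collect();
    let assigned: u64 = norm.iter().sum();
    // Index of the most frequent symbol, which absorbs the rounding drift.
    let big = (0..hist.len()).max_by_key(|&i| hist[i]).unwrap();
    if assigned <= size {
        norm[big] += size - assigned;
    } else {
        let over = assigned - size;
        assert!(norm[big] > over, "table too small for this alphabet");
        norm[big] -= over;
    }
    norm
}

/// Approximate coded bits when probabilities are the quantized slot counts.
fn quantized_bits(hist: &[u64], norm: &[u64]) -> f64 {
    let size: u64 = norm.iter().sum();
    let mut bits = 0.0;
    for (&n, &q) in hist.iter().zip(norm) {
        if n > 0 {
            bits += n as f64 * (size as f64 / q as f64).log2();
        }
    }
    bits
}
```

For a lane shaped like `[90, 5, 5]` (one dominant code, as with lit_run = 0), a 32-slot table (`table_log = 5`) normalizes to `[30, 1, 1]`. The ideal price is about 56.9 bits, the quantized table costs about 58.4 bits, and any prefix code pays at least 100 bits. Quantization gives back a little of the ideal gain, which is the direction of the gap between the −0.255pp ideal and the roughly −0.20pp realized estimate.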
Ledger: clean-slate-codec.md §12; CLAUDE.md dead-end entry added. Suite: 745 + 598 green, fmt/clippy clean.
🤖 Generated with Claude Code