docs: GPU-path research report + num-G kill and LDM census evidence #149
Merged
ndzip-style entropy-stage candidate for the Num pipeline: 32-value x 8-bitplane transpose + zero-word elimination with a per-group presence bitmap, exact inverse, plus probe_block() instrumentation that runs the shipping front-end (stride sweep + FSE-gated transforms) and counts bytes per plane under both entropy stages. Probe-only; nothing on the wire changes.

Result: KILL on all four target files (sao +18.3pp, x-ray +9.7pp, mr +2.1pp, nci +47.1pp vs per-plane FSE). Findings doc follows.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
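The transpose and zero-word elimination described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in numeric::bitpack: the function names (`transpose`, `encode_group`, `decode_group`), the byte layout (one presence byte followed by little-endian 32-bit plane words), and the assumption that each value is a single byte are all hypothetical.

```rust
/// Transpose 32 byte values into 8 bitplane words.
/// Bit `i` of `planes[p]` is bit `p` of `vals[i]`.
fn transpose(vals: &[u8; 32]) -> [u32; 8] {
    let mut planes = [0u32; 8];
    for (i, &v) in vals.iter().enumerate() {
        for p in 0..8usize {
            planes[p] |= (((v >> p) & 1) as u32) << i;
        }
    }
    planes
}

/// Encode one group: a presence byte (bit `p` set if plane `p` is nonzero),
/// then only the nonzero plane words. Layout is an assumption for illustration.
fn encode_group(vals: &[u8; 32], out: &mut Vec<u8>) {
    let planes = transpose(vals);
    let mut presence = 0u8;
    for (p, &w) in planes.iter().enumerate() {
        if w != 0 {
            presence |= 1u8 << p;
        }
    }
    out.push(presence);
    for &w in planes.iter().filter(|&&w| w != 0) {
        out.extend_from_slice(&w.to_le_bytes());
    }
}

/// Exact inverse of `encode_group`. Returns the 32 values and the number of
/// input bytes consumed, or `None` if the input is truncated.
fn decode_group(input: &[u8]) -> Option<([u8; 32], usize)> {
    let presence = *input.first()?;
    let mut pos = 1usize;
    let mut vals = [0u8; 32];
    for p in 0..8u32 {
        if presence & (1u8 << p) == 0 {
            continue; // plane was all zero, so nothing was stored for it
        }
        let b = input.get(pos..pos + 4)?;
        let w = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        pos += 4;
        for i in 0..32usize {
            vals[i] |= (((w >> i) & 1) as u8) << p;
        }
    }
    Some((vals, pos))
}
```

Under this layout a group costs between 1 byte (all planes zero) and 33 bytes (all planes present), so the scheme only pays off when whole planes are empty across 32 consecutive values.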
Per-plane byte tables for sao/x-ray/mr/nci, oracle analysis (FSE wins every plane on 3/4 files; mr low plane is the lone 0.983x bitpack win), mechanism writeup, and the verdict that the CPU-SIMD next gate is not reached.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…on tarball-class corpora

CPU-only census probe (examples/ldm_census.rs): exact 8-byte fingerprints at stride 32, sort, nearest >1MiB same-fp pairing, byte-verified greedy extension, bitmap-deduped coverage.

Results (min_match=64): silesia.blob 6.18%, rustup-one.tar 14.15%, rustup-two.tar 56.08%; the kill line was <3% on tarball-class. zstd -3 --long=30 confirms realizability: -3.78pp on single-toolchain tar, -16.85pp (2.30x) on dual-toolchain tar vs zstd -3; --long=27's 128MiB window captures <25% of the win.

Recommendation: immutable depth-1 front-loaded-dictionary form, gated on heavy-tailed match lengths.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
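The census steps listed in this commit can be sketched as below. This is a simplified illustration under stated assumptions, not the contents of examples/ldm_census.rs: the function name `census` and its parameters are hypothetical, "nearest" is read as the nearest earlier position with the same fingerprint that lies more than `min_dist` bytes back, extension runs both backward and forward, and a `Vec<bool>` stands in for the coverage bitmap.

```rust
/// Count bytes covered by byte-verified long-distance matches.
/// `stride`, `min_dist`, and `min_match` correspond to the 32-byte stride,
/// the >1 MiB distance, and min_match=64 mentioned above (assumed mapping).
fn census(data: &[u8], stride: usize, min_dist: usize, min_match: usize) -> usize {
    if data.len() < 8 || stride == 0 {
        return 0;
    }
    // 1. Exact 8-byte fingerprints sampled every `stride` bytes.
    let mut fps: Vec<(u64, usize)> = (0..=data.len() - 8)
        .step_by(stride)
        .map(|pos| {
            let mut b = [0u8; 8];
            b.copy_from_slice(&data[pos..pos + 8]);
            (u64::from_le_bytes(b), pos)
        })
        .collect();
    // 2. Sort so equal fingerprints are adjacent and ordered by position.
    fps.sort_unstable();

    let mut covered = vec![false; data.len()];
    let mut run_start = 0;
    while run_start < fps.len() {
        let fp = fps[run_start].0;
        let mut run_end = run_start;
        while run_end < fps.len() && fps[run_end].0 == fp {
            run_end += 1;
        }
        // 3. Within a run, pair each position with the nearest earlier
        //    position that is more than `min_dist` bytes away.
        let mut cand = run_start; // entries in run_start..cand are far enough
        for j in run_start..run_end {
            let p = fps[j].1;
            while cand < j && p - fps[cand].1 > min_dist {
                cand += 1;
            }
            if cand == run_start {
                continue; // no earlier occurrence is far enough back
            }
            let q = fps[cand - 1].1;
            // 4. Byte-verified greedy extension, backward then forward.
            let mut back = 0;
            while back < q && data[q - back - 1] == data[p - back - 1] {
                back += 1;
            }
            let mut fwd = 0;
            while p + fwd < data.len() && data[q + fwd] == data[p + fwd] {
                fwd += 1;
            }
            // 5. Mark the later copy; overlapping matches are counted once.
            if back + fwd >= min_match {
                covered[p - back..p + fwd].fill(true);
            }
        }
        run_start = run_end;
    }
    covered.iter().filter(|&&c| c).count()
}
```

Dividing the returned count by `data.len()` gives a coverage fraction comparable in spirit to the percentages reported above, though the real probe may pair, extend, or count differently.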
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…chitecture, not the 256-540 MiB offsets

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Consolidates the 2026-06-10 GPU-path research into master.

The multi-agent research run (4 repo-history readers, 5 web researchers, 3 adversarial verification lenses per candidate, 37 agents total) concluded that libpz's GPU-entropy dead end measured a specific low-parallelism stream topology rather than GPU entropy coding itself. It also concluded that on M5 unified memory the GPU has no bandwidth moat over 18 CPU cores already delivering 12.1 GiB/s pz2 decode. The encode side is therefore settled (hybrid GPU-match + CPU parse stands), and only one decode path is credible: a GDeflate-shaped evolution of pz2's wire (pz2-G32). Everything else is spike-gated or pareto-rejected.

This PR adds the final report (docs/design-docs/gpu-path-research.md); the stage-0 num-bitpack kill evidence (findings doc, numeric::bitpack probe code, examples/num_bitpack_probe.rs; deterministic, exits 1/KILL: +18.34/+9.67/+2.09/+47.06pp vs per-plane FSE on sao/x-ray/mr/nci); and the LDM census evidence (docs/design-docs/ldm-census-findings.md, examples/ldm_census.rs; PASS: 14.15%/56.08% verified >1 MiB-offset coverage on tarball-class corpora, zstd --long=30 bar at 25.26%/12.99%).

It also adds a reconciliation postscript noting that the pz2d dict tier shipped in #146–#148 implements the census's recommended immutable depth-1 architecture but is segment-scoped (<32 MiB reach). The census's key finding, duplicate mass at 256–540 MiB offsets, therefore remains uncaptured, and the whole-input-scoped front-end is still open.

Spike verdicts
claude/spike-num-bitpack
claude/spike-ldm-census
claude/spike-pz2-g32
claude/spike-ibwt-cursors

Test plan
./scripts/test.sh green (fmt, clippy --all-targets incl. the two new examples, build, 744 tests; one webgpu perf-comparison test flaked under shared-machine load and passes in isolation)

🤖 Generated with Claude Code