feat(pz2d): ship the dict tier (-p pz2d) — blob 30.5%, best LZ ratio in pz by ChrisLundquist · Pull Request #148 · ChrisLundquist/libpz

ChrisLundquist · 2026-06-10T19:05:37Z

Summary

Ships the Pz2d dict tier end-to-end: -p pz2d (id 14), one container block = one 32 MiB segment whose first 16 MiB doubles as the shared dictionary for the segment's remaining 2 MiB blocks. Zero changes to the container/scheduler machinery.
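The segment geometry described above can be sketched as a small layout helper. This is only an illustration: the constant names, `SegmentLayout`, `chunk`, and `layout` are hypothetical and not taken from libpz, and it assumes the dict region is cut at the same 2 MiB block size as the blocks that follow it.

```rust
use std::ops::Range;

const MIB: usize = 1 << 20;
// Sizes quoted in the summary; the names are hypothetical.
const SEGMENT_LEN: usize = 32 * MIB;
const DICT_LEN: usize = 16 * MIB;
const BLOCK_LEN: usize = 2 * MIB; // assumed to apply to the dict region too

/// Block ranges of one segment, as byte offsets from the segment start.
struct SegmentLayout {
    /// Blocks inside the dict region (decoded as a sequential chain).
    dict_blocks: Vec<Range<usize>>,
    /// Blocks after the dict region (each decoded against the whole region).
    dicted_blocks: Vec<Range<usize>>,
}

/// Cut `start..end` into consecutive ranges of at most `step` bytes.
fn chunk(start: usize, end: usize, step: usize) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    let mut pos = start;
    while pos < end {
        let next = (pos + step).min(end);
        out.push(pos..next);
        pos = next;
    }
    out
}

/// Lay out a segment of `segment_len` bytes (the last segment may be short).
fn layout(segment_len: usize) -> SegmentLayout {
    // If the segment is shorter than the dict, the whole segment is dict region.
    let dict_end = DICT_LEN.min(segment_len);
    SegmentLayout {
        dict_blocks: chunk(0, dict_end, BLOCK_LEN),
        dicted_blocks: chunk(dict_end, segment_len, BLOCK_LEN),
    }
}

fn main() {
    let full = layout(SEGMENT_LEN);
    println!(
        "full segment: {} dict-region blocks + {} dicted blocks",
        full.dict_blocks.len(),
        full.dicted_blocks.len()
    );
    let short = layout(5 * MIB);
    println!(
        "5 MiB tail segment: {} dict-region blocks + {} dicted blocks",
        short.dict_blocks.len(),
        short.dicted_blocks.len()
    );
}
```

Under these assumptions a full segment splits into 8 dict-region blocks and 8 dicted blocks, and a segment shorter than the dictionary has no dicted blocks at all, which corresponds to the "dict > segment" case named in the test list.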

Pieces

pz2::encode_segment / decode_segment — dict region encoded as ONE parse split at block boundaries (split match halves keep their offsets — still valid against earlier region content); dicted blocks via the frozen finder (feat(pz2): frozen shared match-finder — dict-tier encode at 3x spike speed, same ratio #147) with arena reuse. Blob encode for the 16 MiB tier: 70.5 → 38.6 s ST vs the frozen probe.
Container — inner framing ([num_inner][orig,comp]*N + wires) inside the segment payload; 2-wave decode: dict region is a sequential prefix chain, the remaining blocks fan out across scoped threads against the completed region.
CLI and tests: -p pz2d, -l, --list-pipelines; parse-strategy mapping identical to Pz2 (auto-greedy gate applied at segment granularity); Pz2d added to all four cross-pipeline test matrices, plus a dedicated segment round-trip test (boundary-crossing matches, unaligned blocks, dict > segment, degenerate tiny blocks).
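The match-splitting rule for the dict region (one parse, cut at block boundaries, halves keep their offsets) can be illustrated with a toy token stream. This is a sketch under assumptions, not the libpz code: the `Token` type, `split_at_block_boundaries`, `decode`, and the `MIN_MATCH` value of 3 are all hypothetical. It also applies the commit message's rule that sub-MIN_MATCH remnants become literals.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Literal(u8),
    Match { offset: usize, len: usize },
}

const MIN_MATCH: usize = 3; // assumed value; the PR does not state it

/// Append `tok` to the block that contains byte position `pos`.
fn push(blocks: &mut Vec<Vec<Token>>, pos: usize, block_len: usize, tok: Token) {
    let idx = pos / block_len;
    while blocks.len() <= idx {
        blocks.push(Vec::new());
    }
    blocks[idx].push(tok);
}

/// Cut one parse of `data` into per-block token lists.
/// A match that crosses a boundary is split; each piece keeps the original
/// offset, because the byte it copies from (position - offset) is unchanged.
fn split_at_block_boundaries(data: &[u8], tokens: &[Token], block_len: usize) -> Vec<Vec<Token>> {
    let mut blocks: Vec<Vec<Token>> = vec![Vec::new()];
    let mut pos = 0usize;
    for tok in tokens {
        match *tok {
            Token::Literal(b) => {
                push(&mut blocks, pos, block_len, Token::Literal(b));
                pos += 1;
            }
            Token::Match { offset, len } => {
                let mut remaining = len;
                while remaining > 0 {
                    let room = block_len - pos % block_len; // bytes left in this block
                    let piece = remaining.min(room);
                    if piece >= MIN_MATCH {
                        push(&mut blocks, pos, block_len, Token::Match { offset, len: piece });
                    } else {
                        // Remnant too short to be a match: emit the bytes as literals.
                        for i in 0..piece {
                            push(&mut blocks, pos + i, block_len, Token::Literal(data[pos + i]));
                        }
                    }
                    pos += piece;
                    remaining -= piece;
                }
            }
        }
    }
    blocks
}

/// Sequential decode: every block sees all earlier region content.
fn decode(blocks: &[Vec<Token>]) -> Vec<u8> {
    let mut out = Vec::new();
    for tok in blocks.iter().flatten() {
        match *tok {
            Token::Literal(b) => out.push(b),
            Token::Match { offset, len } => {
                for _ in 0..len {
                    let b = out[out.len() - offset];
                    out.push(b);
                }
            }
        }
    }
    out
}

fn main() {
    let data = b"abcdabcdabcd";
    let mut tokens: Vec<Token> = data[..4].iter().map(|&b| Token::Literal(b)).collect();
    tokens.push(Token::Match { offset: 4, len: 8 });
    let blocks = split_at_block_boundaries(data, &tokens, 6);
    assert_eq!(decode(&blocks), data.to_vec());
    println!("{blocks:?}");
}
```

With 6-byte blocks the 8-byte match leaves a 2-byte remnant in the first block, which is emitted as literals, and a 6-byte match with the same offset in the second block. The sequential decode still reproduces the input because the second block's offset reaches back into content the first block already produced.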

Measured (blob, CLI e2e) — the honest ledger (§11b)

codec           ratio    decode wall   encode wall
pz pz2          31.04%   16.8 ms       3.15 s
pz pz2d v1      30.48%   42.5 ms       23.7 s
pzstd -3 -p18   31.40%   22.9 ms       -

Best LZ-family ratio in pz, 0.9pp under pzstd-3, still 3.3× faster decode than zstd ST — an opt-in max-ratio tier. Two known v1 bottlenecks, both measured and documented with fixes:

Decode is memory-traffic-bound (saturates at 4 threads): wave-2 still pays naive per-block dict priming (~3.3 GB zero+memcpy) — the exact failure mode §11's arithmetic predicted. The arena-decode fix projects ~17–20 ms but touches the proven unsafe splice, so it gets its own session + fresh soak.
Concurrent encode inflation 2.9× vs the ST probe (16 MiB dict + 64 MB frozen prev walks across 7 segments) — same cache physics as the 4 MiB block rejection; tuning levers listed.
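To make the decode bottleneck concrete, here is a minimal sketch of a 2-wave decode with naive per-block dict priming, using `std::thread::scope`. It is not the `decompress_block_pz2d` implementation: the toy `Token` format, `decode_into`, and `decode_segment_two_wave` are hypothetical, and it assumes a dicted block's offsets are measured against the dict region followed directly by that block's own output.

```rust
use std::thread;

#[derive(Clone, Copy)]
enum Token {
    Literal(u8),
    Match { offset: usize, len: usize },
}

/// Decode `tokens` onto the end of `buf`; offsets reach back into `buf`.
fn decode_into(buf: &mut Vec<u8>, tokens: &[Token]) {
    for tok in tokens {
        match *tok {
            Token::Literal(b) => buf.push(b),
            Token::Match { offset, len } => {
                for _ in 0..len {
                    let b = buf[buf.len() - offset];
                    buf.push(b);
                }
            }
        }
    }
}

fn decode_segment_two_wave(dict_blocks: &[Vec<Token>], dicted_blocks: &[Vec<Token>]) -> Vec<u8> {
    // Wave 1: the dict region is a sequential prefix chain, so each block
    // must wait for the one before it.
    let mut region = Vec::new();
    for block in dict_blocks {
        decode_into(&mut region, block);
    }

    // Wave 2: remaining blocks are independent of each other and fan out
    // across scoped threads against the completed region.
    let dict: &[u8] = &region;
    let tails: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = dicted_blocks
            .iter()
            .map(|block| {
                s.spawn(move || {
                    // Naive priming: every block copies the whole dict into a
                    // private buffer before decoding. This per-block copy is
                    // the memory traffic the sketch is meant to highlight.
                    let mut buf = dict.to_vec();
                    decode_into(&mut buf, block);
                    buf.split_off(dict.len()) // keep only this block's bytes
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Stitch the segment back together in block order.
    for tail in &tails {
        region.extend_from_slice(tail);
    }
    region
}

fn main() {
    let dict_blocks = vec![
        vec![Token::Literal(b'a'), Token::Literal(b'b'), Token::Literal(b'c'), Token::Literal(b'd')],
        vec![Token::Match { offset: 4, len: 4 }],
    ];
    let dicted_blocks = vec![
        vec![Token::Match { offset: 8, len: 4 }, Token::Literal(b'!')],
        vec![Token::Literal(b'x'), Token::Match { offset: 5, len: 2 }],
    ];
    let out = decode_segment_two_wave(&dict_blocks, &dicted_blocks);
    println!("{}", String::from_utf8_lossy(&out));
}
```

In this sketch the work per dicted block includes copying the full dictionary, so adding threads mostly adds memory traffic rather than useful decode work, which is consistent with the saturation described above. The sketch spawns one thread per block for brevity; a real scheduler would presumably cap the thread count.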

Test plan

202 MB blob CLI round-trip verified; ratio matches the probe exactly (30.475%)
742 + 595 tests, fmt, clippy clean
Segment codec unit test covers match-splitting and degenerate geometries

🤖 Generated with Claude Code

…in pz

Pz2d (Pipeline id 14): one container block = one 32 MiB segment whose first 16 MiB doubles as a shared dictionary for the segment's remaining 2 MiB blocks. Rides the existing container/scheduler unchanged.

Library: pz2::encode_segment / decode_segment. The dict region is ONE parse split at block boundaries (split match halves keep their offsets, still valid against earlier region content; sub-MIN_MATCH remnants become literals), dicted blocks use the frozen finder with an arena reused across blocks. Took the 16 MiB-dict blob encode 70.5 -> 38.6 s ST vs the frozen probe (no partial-dict rebuilds, region parsed once).

Container: inner framing [num_inner][orig,comp]*N + wires inside the segment payload; 2-wave decode in decompress_block_pz2d (dict region = sequential prefix chain, remaining blocks fan out across scoped threads against the completed region).

Auto/greedy/lazy parse mapping matches Pz2 (auto gate applied at segment granularity). CLI: -p pz2d, -l, list.

Measured (blob, CLI e2e) and documented honestly in section 11b:
- ratio 30.475%: best LZ-family ratio in pz, 0.92pp under pzstd-3, 0.56pp under pz2; round-trip verified
- decode 42.5 ms v1: memory-traffic-bound (saturates at 4 threads) by naive per-block dict priming (~3.3 GB of zero+memcpy), exactly the failure mode section 11's arithmetic predicted; arena decode projected ~17-20 ms, deferred because it touches the proven unsafe splice
- encode 23.7 s: concurrent segment encodes inflate the ST probe 2.9x (16 MiB dict + 64 MB frozen prev walks; the 4 MiB-block cache lesson at segment scale); levers documented

Pz2d added to all four cross-pipeline test matrices + segment round-trip unit test (boundary-crossing matches, unaligned blocks, dict > segment, degenerate tiny blocks). 742 + 595 tests, fmt, clippy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ChrisLundquist merged commit 698193a into master Jun 10, 2026

ChrisLundquist deleted the claude/pz2d-container branch June 10, 2026 19:05

ChrisLundquist mentioned this pull request Jun 10, 2026

docs: GPU-path research report + num-G kill and LDM census evidence #149

Merged

ChrisLundquist merged 1 commit into master from claude/pz2d-container
