feat(batch): first-class e8p codebook in batch_quantization by cnygaard · Pull Request #17 · cnygaard/glq

cnygaard · 2026-06-22T21:53:04Z

Summary

Makes the e8p codebook a first-class option in scripts/batch_quantization.py. Previously the tool forwarded only --codebook-size (a knob for the shell codebook), so e8p was reachable only through the raw extra_args escape hatch.

Changes

QuantJob.codebook field — "e8_shell" (default) | "e8_relaxed" | "e8p"; forwarded as --codebook in _shared_flags only when non-default (existing shell commands stay byte-identical).
Validation (__post_init__): e8p is the fixed-grid RVQ recipe, so it accepts only uniform integer bpw (2, 3, or 4). It rejects mixed precision and codebook_size, and it rejects unknown codebook names, all with clear messages.
output_name tags non-default codebooks: …-GLQ-<n>bpw-e8p (and -relaxed).
Docstring + JOBS example updated with an e8p entry.
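The changes above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/batch_quantization.py: the `model` and `mixed_precision` fields, the error messages, and the exact shape of `_shared_flags` are assumptions made for the example. Only the `codebook` values, the `--codebook` / `--codebook-size` / `--bpw` flags, and the output name pattern come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_CODEBOOKS = ("e8_shell", "e8_relaxed", "e8p")
# Suffix appended to the output name for non-default codebooks.
NAME_TAGS = {"e8_relaxed": "-relaxed", "e8p": "-e8p"}


@dataclass
class QuantJob:
    model: str                            # hypothetical field name
    bpw: float
    codebook: str = "e8_shell"
    codebook_size: Optional[int] = None
    mixed_precision: bool = False         # hypothetical stand-in for a mixed-precision setting

    def __post_init__(self) -> None:
        if self.codebook not in VALID_CODEBOOKS:
            raise ValueError(f"unknown codebook {self.codebook!r}")
        if self.codebook == "e8p":
            # e8p is a fixed-grid recipe: uniform integer bpw only.
            if self.mixed_precision:
                raise ValueError("e8p does not support mixed precision")
            if self.codebook_size is not None:
                raise ValueError("codebook_size does not apply to e8p")
            if self.bpw not in (2, 3, 4):
                raise ValueError("e8p requires bpw 2, 3, or 4")

    def _shared_flags(self) -> List[str]:
        flags = ["--bpw", f"{self.bpw:g}"]
        if self.codebook_size is not None:
            flags += ["--codebook-size", str(self.codebook_size)]
        # Omit --codebook for the default so existing commands do not change.
        if self.codebook != "e8_shell":
            flags += ["--codebook", self.codebook]
        return flags

    @property
    def output_name(self) -> str:
        tag = NAME_TAGS.get(self.codebook, "")
        return f"{self.model}-GLQ-{self.bpw:g}bpw{tag}"
```

With this sketch, `QuantJob("gemma-4-12B-it", 3, codebook="e8p")` yields flags containing `--codebook e8p` and the output name `gemma-4-12B-it-GLQ-3bpw-e8p`, while a default job emits no `--codebook` flag at all.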

Tests

7 new unit tests in tests/test_batch_progress.py (flag forwarding, name tagging, the four validation guards, default-omits-flag). 21/21 pass.
Dry-run verified: a gemma-4-12B-it @ 3bpw e8p job builds glq-quantize … --bpw 3 --codebook e8p --streaming → gemma-4-12B-it-GLQ-3bpw-e8p.

Follows the 0.6.5 e8p kernel work (PR #15). The smoke worker already gates limit_mm_per_prompt on the checkpoint architecture, so e8p (text-only or multimodal) smoke-tests unchanged.

The batch quantizer only forwarded --codebook-size (a shell knob); e8p was reachable only via the raw extra_args escape hatch. Make it first-class:

- QuantJob.codebook ("e8_shell" default | "e8_relaxed" | "e8p") -> forwarded as --codebook in _shared_flags (omitted when default, so existing shell commands are byte-identical).
- Validation: e8p is the fixed-grid RVQ recipe -> uniform integer bpw 2/3/4 only (rejects mixed precision and codebook_size with clear messages); unknown codebook names rejected.
- output_name tags non-default codebooks: ...-GLQ-<n>bpw-e8p / -relaxed.
- Docstring + JOBS example updated with an e8p entry.
- 7 unit tests (flag forwarding, name tag, the four validation guards, default-omits-flag); 21/21 pass.

Verified by dry-run: a gemma-4-12B-it@3bpw e8p job builds `glq-quantize ... --bpw 3 --codebook e8p --streaming` and outputs to gemma-4-12B-it-GLQ-3bpw-e8p.

Block-diagonal E8P: the E8P codebook path no longer forces a full power-of-2 Hadamard. It pads each linear dim to a sum of pow2 blocks (each >= 64 cols / 16 rows) instead of the next power of two, so non-pow2 models stop bloating: gemma-4-31B E8P 3bpw drops from ~27 GB to ~11 GB (qidxs 22.5 -> 10.7 GiB). The tensor-core decode + RVQ are untouched and remain one fused op, so FULL cudagraph still pays off. Serves on vLLM 0.23 FULL cudagraph; 1.5-1.9x faster decode than the shell codebook at matched quality.

Folds in three fixes surfaced by validation:

- vLLM cudagraph buffer sizing
- an HF-eager / large-batch shared-mem OOB (single-block RHT on a non-pow2 m_pad)
- pow2<->block-diag back-compat loading (re-derives dims from the loaded Qidxs_e8p shape, so both legacy pow2 and new block-diag checkpoints load on every path)

A GLQ_E8P_POW2 quantize toggle keeps the legacy full-Hadamard E8P available: measured ~1 PPL better at 2bpw, ~0.26 at 3bpw, so block-diag is near-free at >=3bpw. Also includes first-class E8P support in the batch_quantization tool (#17).
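To make the padding difference concrete, here is a small sketch comparing next-power-of-two padding with a decomposition into power-of-two blocks of at least 64 columns. The function names and the greedy largest-block-first strategy are assumptions for illustration; the actual block selection in glq may differ, and the dimension 5376 is just an example of a non-pow2 size.

```python
from typing import List


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pow2_blocks(n: int, min_block: int = 64) -> List[int]:
    """Split n into power-of-two blocks, each >= min_block.

    n is first rounded up to a multiple of min_block (a power of two),
    so every set bit of the padded size is at least min_block.
    """
    padded = -(-n // min_block) * min_block  # ceil to a multiple of min_block
    blocks = []
    bit = 1 << (padded.bit_length() - 1)     # largest power of two <= padded
    while padded:
        if padded >= bit:
            blocks.append(bit)
            padded -= bit
        bit >>= 1
    return blocks


dim = 5376                       # example non-pow2 dimension
print(next_pow2(dim))            # full-Hadamard padding target
print(pow2_blocks(dim))          # block-diagonal padding
print(sum(pow2_blocks(dim)))     # padded size under block-diag
```

For this example dimension, full padding rounds up to 8192 columns, while the block decomposition is 4096 + 1024 + 256 = 5376, so no padding is added at all. That gap is the kind of bloat the block-diagonal path avoids.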

cnygaard merged commit d3cb612 into main Jun 22, 2026
3 checks passed

cnygaard mentioned this pull request Jun 23, 2026

feat(e8p): block-diagonal RHT — remove pow2-padding bloat, keep TC decode #18

Merged

cnygaard merged 1 commit into main from feat/batch-quant-e8p
