Add AVHuff decompressor for laserdisc CHDs #154
Open
rtissera wants to merge 3 commits into
Ports MAME's A/V Huffman codec to plain C for libchdr, covering both
the CHDv3/v4 CHDCOMPRESSION_AV type and the CHDv5 'avhu' tag. Decode
only (no encoder), matching the existing libchdr scope.
Footprint: +516 LOC new C, +~4 KiB to libchdr.so.0.3. No new
dependencies: reuses libchdr's existing huffman (src/libchdr_huffman.c),
bitstream (src/libchdr_bitstream.c), and dr_flac wrapper
(src/libchdr_flac.c, already solves MAME's 0x2A custom header trick
for the CDFL codec).
Implemented paths:
- Header parse (metasize, channels, samples, width, height, treesize,
  per-channel sizes)
- Metadata raw copy
- Audio:
  - FLAC per channel (treesize == 0xffff), via flac_decoder_reset
    at 48 kHz mono with MAME's stripped header
  - Huffman-coded delta samples with dual hi/lo RLE trees
    (treesize != 0 && != 0xffff)
  - Raw big-endian s16 deltas (treesize == 0)
- Video: lossless YUY2 (3 delta-RLE trees: Y, Cb, Cr; interleaved as
  Cb Y Cr Y per 2-pixel column). The lossy path is rejected with
  CHDERR_DECOMPRESSION_ERROR; no real-world laserdisc CHD uses it.
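A minimal sketch of the header parse and the simplest audio path (treesize == 0), to make the byte layout concrete. The field widths (1-byte metasize and channels, big-endian 16-bit samples/width/height/treesize from offset 2, then one big-endian 16-bit compressed size per channel) are my reading of MAME's avhuff.cpp, not something this PR spells out, and `avh_parse_header`, `avh_audio_mode` and `avh_decode_raw_deltas` are hypothetical names rather than the functions in the patch. The raw path also assumes each channel's running sample starts at zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; field widths are an assumption based on MAME's avhuff.cpp. */
enum avh_audio_mode { AVH_AUDIO_RAW, AVH_AUDIO_HUFFMAN, AVH_AUDIO_FLAC };

struct avh_header {
    uint32_t metasize;      /* byte 0 */
    uint32_t channels;      /* byte 1 */
    uint32_t samples;       /* bytes 2-3, big-endian */
    uint32_t width;         /* bytes 4-5, big-endian */
    uint32_t height;        /* bytes 6-7, big-endian */
    uint32_t treesize;      /* bytes 8-9, big-endian */
    uint16_t chsize[255];   /* bytes 10.., one big-endian u16 per channel */
    size_t payload_offset;  /* where the metadata starts */
};

static uint32_t be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if the buffer is shorter than the sizes it declares. */
static int avh_parse_header(const uint8_t *src, size_t len, struct avh_header *h)
{
    size_t total;
    uint32_t ch;

    if (len < 10)
        return -1;
    h->metasize = src[0];
    h->channels = src[1];
    h->samples = be16(src + 2);
    h->width = be16(src + 4);
    h->height = be16(src + 6);
    h->treesize = be16(src + 8);

    h->payload_offset = 10 + 2 * (size_t)h->channels;
    if (len < h->payload_offset)
        return -1;

    /* metadata plus every channel's compressed audio must fit; video follows */
    total = h->payload_offset + h->metasize;
    for (ch = 0; ch < h->channels; ch++) {
        h->chsize[ch] = (uint16_t)be16(src + 10 + 2 * ch);
        total += h->chsize[ch];
    }
    return total <= len ? 0 : -1;
}

/* treesize selects the audio sub-codec: 0xffff FLAC, 0 raw, anything else Huffman. */
static enum avh_audio_mode avh_audio_mode(uint32_t treesize)
{
    if (treesize == 0xffff)
        return AVH_AUDIO_FLAC;
    if (treesize == 0)
        return AVH_AUDIO_RAW;
    return AVH_AUDIO_HUFFMAN;
}

/* treesize == 0: one channel of big-endian s16 deltas -> big-endian s16 samples.
 * Assumes the running sample starts at 0 for each channel. */
static void avh_decode_raw_deltas(const uint8_t *src, uint32_t samples, uint8_t *dst)
{
    uint16_t prev = 0; /* unsigned so the sum wraps mod 2^16, like s16 */
    uint32_t i;

    for (i = 0; i < samples; i++) {
        prev = (uint16_t)(prev + be16(src + 2 * i));
        dst[2 * i] = (uint8_t)(prev >> 8);
        dst[2 * i + 1] = (uint8_t)(prev & 0xff);
    }
}
```

Accumulating in `uint16_t` is deliberate: the sum wraps modulo 2^16, which gives two's-complement s16 behaviour without relying on signed overflow.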
Destination 'chav' header (12 bytes) + metadata + audio (per-channel
s16be sequential) + video (YUY2 big-endian) matches MAME's
raw_data_size layout. Trailing space in the hunk is zero-padded.
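To illustrate that destination layout, here is a sketch that computes the raw hunk size, writes the 12-byte `chav` header and zero-fills any slack up to the hunk size. It assumes the fields after the tag mirror the compressed header (1-byte metasize and channels, then big-endian 16-bit samples, width and height, which adds up to the stated 12 bytes) and that YUY2 video is 2 bytes per pixel. `avh_raw_size`, `avh_write_chav`, `avh_audio_offset` and `avh_pad_hunk` are illustrative names, not MAME's or libchdr's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decoded size for the given parameters. */
static size_t avh_raw_size(uint32_t metasize, uint32_t channels, uint32_t samples,
                           uint32_t width, uint32_t height)
{
    return 12                               /* 'chav' header */
         + (size_t)metasize                 /* metadata, copied verbatim */
         + (size_t)channels * samples * 2   /* s16be, one channel after another */
         + (size_t)width * height * 2;      /* YUY2: 2 bytes per pixel */
}

/* Writes the 12-byte header (field widths assumed); returns where metadata goes. */
static uint8_t *avh_write_chav(uint8_t *dst, uint8_t metasize, uint8_t channels,
                               uint16_t samples, uint16_t width, uint16_t height)
{
    memcpy(dst, "chav", 4);
    dst[4] = metasize;
    dst[5] = channels;
    dst[6] = (uint8_t)(samples >> 8);
    dst[7] = (uint8_t)samples;
    dst[8] = (uint8_t)(width >> 8);
    dst[9] = (uint8_t)width;
    dst[10] = (uint8_t)(height >> 8);
    dst[11] = (uint8_t)height;
    return dst + 12;
}

/* Offset of channel ch's first sample inside the raw hunk. */
static size_t avh_audio_offset(uint32_t metasize, uint32_t samples, uint32_t ch)
{
    return 12 + (size_t)metasize + (size_t)ch * samples * 2;
}

/* Zero the tail so a hunk larger than the payload decodes deterministically. */
static int avh_pad_hunk(uint8_t *hunk, size_t hunkbytes, size_t rawsize)
{
    if (rawsize > hunkbytes)
        return -1; /* payload would not fit in the hunk */
    memset(hunk + rawsize, 0, hunkbytes - rawsize);
    return 0;
}
```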
Wired into src/libchdr_chd.c in four places: the struct codec_data
slot, the codec_interfaces table (two entries: CHD_CODEC_AVHUFF for v5,
CHDCOMPRESSION_AV for v3/v4), and the init/free/decompress dispatch in
both the v1-4 path and the v5 path.
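The two-entry registration can be pictured as a lookup table in which a CHDv5 map tag and a CHDv3/v4 compression number both resolve to the same decoder. This is an illustrative sketch only: `TAG4`, `struct codec_entry`, `find_codec` and the stub decoder are hypothetical and much simpler than the real codec_interfaces entries; the value 3 for `CHDCOMPRESSION_AV` comes from the PR summary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not libchdr's definitions. */
#define TAG4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))
#define V5_CODEC_AVHU TAG4('a', 'v', 'h', 'u') /* CHDv5 map tag */
#define V34_COMPRESSION_AV 3u                  /* CHDv3/v4 header compression type */

typedef int (*decompress_fn)(void *ctx, const uint8_t *src, uint32_t srclen,
                             uint8_t *dst, uint32_t dstlen);

/* Placeholder for the real AVHuff decoder. */
static int avhuff_decompress_stub(void *ctx, const uint8_t *src, uint32_t srclen,
                                  uint8_t *dst, uint32_t dstlen)
{
    (void)ctx; (void)src; (void)srclen; (void)dst; (void)dstlen;
    return 0;
}

struct codec_entry {
    uint32_t id;              /* v5 tag or v3/v4 compression number */
    const char *name;
    decompress_fn decompress;
};

/* One decoder, reachable from both header generations. */
static const struct codec_entry codec_table[] = {
    { V5_CODEC_AVHU,      "A/V Huffman", avhuff_decompress_stub },
    { V34_COMPRESSION_AV, "A/V Huffman", avhuff_decompress_stub },
};

static const struct codec_entry *find_codec(uint32_t id)
{
    size_t i;
    for (i = 0; i < sizeof codec_table / sizeof codec_table[0]; i++)
        if (codec_table[i].id == id)
            return &codec_table[i];
    return NULL;
}
```

Sharing one table works because a packed four-character tag such as 'avhu' (0x61766875) can never equal a small v3/v4 compression number.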
All 16 samples in the existing fuzz corpus still parse cleanly
(regression check passed). A real laserdisc CHD is still needed for
byte-identical regression against chdman extract; tracked as a
follow-up.
Closes #69.
Two changes to the AVHuff decoder:

1. The swap_endian flag passed to flac_decoder_decode_interleaved was
   inverted. CHD raw hunks are big-endian and drflac writes native
   order, so on a little-endian host (x86_64, aarch64 Linux) the output
   was left in little-endian form and mismatched the per-hunk CRC16
   stored in the CHDv5 map. Every FLAC-audio AVHuff hunk failed with
   CHDERR_DECOMPRESSION_ERROR. detect_native_endian() returns 1 on LE
   and 0 on BE, which is exactly the swap value needed.
2. The audiohi/audiolo Huffman decoders are only used by the legacy
   Huffman audio sub-codec (treesize != 0 and != 0xffff). Modern chdman
   emits FLAC audio almost exclusively, so allocating those 256 KiB up
   front wastes memory in the common case. They are now allocated
   lazily in decode_audio() on the first Huffman-audio hunk and reused
   thereafter.
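The endian fix reduces to one rule: a native-order sample must be byte-swapped exactly when the host is little-endian, so the swap flag is simply the result of an "is little-endian" probe. The helpers below are a hypothetical stand-in for what flac_decoder_decode_interleaved does with its swap_endian argument, not its actual code; `host_is_little_endian` only mirrors the 1-on-LE / 0-on-BE convention described for detect_native_endian().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian host, 0 on big-endian (same convention as described above). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy native-order s16 samples to dst, swapping the two bytes when swap is set. */
static void store_s16(const int16_t *src, size_t count, uint8_t *dst, int swap)
{
    size_t i;
    for (i = 0; i < count; i++) {
        uint8_t b[2];
        memcpy(b, &src[i], 2); /* bytes in host order */
        dst[2 * i] = swap ? b[1] : b[0];
        dst[2 * i + 1] = swap ? b[0] : b[1];
    }
}
```

Calling `store_s16(pcm, n, out, host_is_little_endian())` yields big-endian bytes on any host. Passing the inverted flag reproduces the original bug: on x86_64 a sample of 0x1234 comes out as `34 12`, which is what tripped the CRC16 check.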
tests/avhuff_regression.c: a small harness that opens a CHD via
libchdr, walks every hunk via chd_read, and relies on the built-in
CRC16 verification (VERIFY_BLOCK_CRC=1) to catch any byte-level decode
error. Exits non-zero on any failure.

tests/avhuff_corpus/: fetch.sh pulls the four redistributable AVHuff
CHDs used during the AVHuff bring-up:

* MAME's createld_avi_yuv2_3_frames_no_audio /
  createld_avi_uyvy_3_frames_no_audio regtest output (BSD-3-Clause,
  376 B each, video-only, 6 hunks of 219660 B)
* Synthesized variants via chdman createld with a 440 Hz sine track
  muxed in, exercising both -c avhu (auto-FLAC) and -c flac,avhu
  (dual-codec) paths

The CHD/AVI files themselves are git-ignored; fetch.sh is idempotent
and needs curl, chdman, and ffmpeg.

This catches the FLAC audio endian bug fixed in the previous commit.
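The harness's pass/fail signal is the per-hunk CRC16. As a sketch, assuming the map stores a CRC-16/CCITT-FALSE value (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR) computed over the decompressed hunk, which is how I understand libchdr's internal crc16() to behave, a bitwise version and the per-hunk check look like this. `crc16_ccitt` and `hunk_ok` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE, bit by bit: poly 0x1021, init 0xFFFF, MSB first, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical pass/fail rule for one hunk: recompute and compare with the map. */
static int hunk_ok(const uint8_t *hunk, size_t len, uint16_t expected)
{
    return crc16_ccitt(hunk, len) == expected;
}
```

A degree-16 CRC catches every error confined to 16 consecutive bits, so a single byte-swapped s16 sample always changes the checksum, and a hunk full of swapped samples is caught with overwhelming probability.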
Summary
Ports MAME's A/V Huffman codec to plain C for libchdr, covering both CHDv3/v4 `CHDCOMPRESSION_AV` (type 3) and the CHDv5 `avhu` tag. It is decode-only (no encoder), matching the existing libchdr scope.
Opening this as a draft until byte-identical regression against `chdman extract` on a real laserdisc CHD is confirmed. Review of the algorithmic correctness would be appreciated in the meantime.
Footprint
Implemented paths
Destination `chav` header (12 B) + metadata + audio (per-channel s16be sequential) + video (YUY2 big-endian) matches MAME's `raw_data_size` layout. Any trailing space up to `hunkbytes` is zero-padded.
Integration points (`src/libchdr_chd.c`)
Verified
Motivation
This is a long-standing request in #69 (open since 2022) and unblocks laserdisc consumers (Daphne, Hypseus Singe, MAME LD cores via libchdr). The repository owner's concern in the issue was footprint; roughly 500 LOC and 4 KiB is well under what adding a new dependency would cost.
Closes #69 once real-CHD regression is green.