
Add AVHuff decompressor for laserdisc CHDs #154

Open
rtissera wants to merge 3 commits into master from feat/avhuff-codec


@rtissera rtissera commented Apr 24, 2026

Summary

Ports MAME's A/V Huffman codec to plain C for libchdr, covering both CHDv3/v4 `CHDCOMPRESSION_AV` (type 3) and the CHDv5 `avhu` tag. Decode only — no encoder — matches the existing libchdr scope.

Opening this as a draft until a byte-identical regression against `chdman extract` on a real laserdisc CHD is confirmed. Review of the algorithmic correctness is appreciated in the meantime.

Footprint

  • +516 LOC new C (`src/libchdr_codec_avhuff.c` 464 + `src/codec_avhuff.h` 52)
  • +~4 KiB added to `libchdr.so.0.3` (375 472 B → 379 688 B)
  • Zero new dependencies. Reuses:
    • `src/libchdr_huffman.c` (already Aaron Giles's MAME code, plain C)
    • `src/libchdr_bitstream.c` (MSB-first, MAME-compatible)
    • `src/libchdr_flac.c` (dr_flac wrapper — already solves the MAME 0x2A custom-header trick for the CDFL codec, so FLAC per-channel audio Just Works)

Implemented paths

| Region | Path |
| --- | --- |
| Header | 10-byte fixed + `2 * channels` per-channel sizes (metasize / channels / samples / width / height / treesize) |
| Metadata | raw copy |
| Audio (`treesize == 0xffff`) | FLAC per channel via `flac_decoder_reset(48 kHz, mono, samples, buf, size)` + `decode_interleaved` with `swap_endian` derived from host endianness |
| Audio (`treesize != 0 && != 0xffff`) | dual hi/lo RLE huffman trees (imported once), delta-accumulated s16be per sample |
| Audio (`treesize == 0`) | raw big-endian s16 deltas |
| Video lossless | YUY2: 3 delta-RLE huffman trees (Y, Cb, Cr), decoded as `Cb Y Cr Y` per 2-pixel column, with `flush_rle` between rows |
| Video lossy | rejected with `CHDERR_DECOMPRESSION_ERROR`; no real-world LD CHD uses it, and MAME also treats it as a stub |
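
As a rough illustration of the header row above, here is a minimal parsing sketch. It assumes the fields appear in the order listed (metasize, channels, samples, width, height, treesize) with widths of 1, 1, 2, 2, 2 and 2 bytes, which is what adds up to the 10 fixed bytes. The struct, the function names and the 16-channel cap are hypothetical and are not taken from `src/libchdr_codec_avhuff.c`.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CHANNELS 16 /* illustrative cap, not from the PR */

/* Hypothetical container for the parsed header fields. */
typedef struct {
    uint8_t  metasize;
    uint8_t  channels;
    uint16_t samples;
    uint16_t width;
    uint16_t height;
    uint16_t treesize;
    uint16_t chsize[SKETCH_MAX_CHANNELS]; /* compressed size per channel */
} sketch_avhuff_header;

/* Read a big-endian 16-bit value. */
static uint16_t sketch_get_u16be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the total header length in bytes, or 0 if the input is
 * too short or declares more channels than this sketch supports. */
static size_t sketch_parse_header(const uint8_t *src, size_t len,
                                  sketch_avhuff_header *h)
{
    if (len < 10)
        return 0;

    h->metasize = src[0];
    h->channels = src[1];
    h->samples  = sketch_get_u16be(src + 2);
    h->width    = sketch_get_u16be(src + 4);
    h->height   = sketch_get_u16be(src + 6);
    h->treesize = sketch_get_u16be(src + 8);

    if (h->channels > SKETCH_MAX_CHANNELS)
        return 0;

    size_t total = 10 + 2 * (size_t)h->channels;
    if (len < total)
        return 0;

    for (size_t i = 0; i < h->channels; i++)
        h->chsize[i] = sketch_get_u16be(src + 10 + 2 * i);

    return total;
}
```

Checking the length before each read is the important part: the header comes straight from the compressed hunk, so a truncated or malformed hunk must fail cleanly rather than read out of bounds.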

The destination layout, `chav` header (12 B) + metadata + audio (per-channel s16be, sequential) + video (YUY2 big-endian), matches MAME's `raw_data_size` layout. Trailing space is zero-padded up to `hunkbytes`.
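
The size arithmetic implied by that layout can be sketched as follows. This assumes 2 bytes per audio sample and 2 bytes per YUY2 pixel, as the layout describes; both helper names are hypothetical and are not the functions used in the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes used by one decoded frame: 12-byte 'chav' header, metadata,
 * s16 audio for every channel, then 16-bit-per-pixel YUY2 video. */
static size_t sketch_raw_size(uint8_t metasize, uint8_t channels,
                              uint16_t samples, uint16_t width,
                              uint16_t height)
{
    return 12
         + (size_t)metasize
         + (size_t)channels * samples * 2
         + (size_t)width * height * 2;
}

/* Zero-fill whatever is left between the decoded data and the end
 * of the hunk. Returns -1 if the decoded data would not fit. */
static int sketch_pad_hunk(uint8_t *dest, size_t used, size_t hunkbytes)
{
    if (used > hunkbytes)
        return -1;
    memset(dest + used, 0, hunkbytes - used);
    return 0;
}
```

Comparing the computed size against `hunkbytes` before decoding is a cheap way to reject a header whose dimensions cannot fit in the hunk.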

Integration points (`src/libchdr_chd.c`)

  1. `codec_data` struct: new `avhuff_codec_data avhuff` slot
  2. `codec_interfaces[]`: two new entries — `CHD_CODEC_AVHUFF` (v5) and `CHDCOMPRESSION_AV` (v3/v4)
  3. v1-4 init/free/decompress: dispatch `CHDCOMPRESSION_AV` to `&codec_data.avhuff`
  4. v5 init/free/decompress: dispatch `CHD_CODEC_AVHUFF` to `&codec_data.avhuff`

Verified

  • Clean build on Linux x86_64
  • All 16 samples in `tests/corpus/seeds/` still parse cleanly (no regression)
  • Byte-identical regression vs `chdman extract` on a real laserdisc CHD (pending, see Summary)
  • Fuzz corpus extended with AVHuff samples

Motivation

Long-standing request in #69 (open since 2022). Unblocks laserdisc consumers (Daphne, Hypseus Singe, MAME LD cores via libchdr). The owner's concern in the issue was footprint; roughly 500 LOC and 4 KiB is well under what adding a new dependency would cost.

Closes #69 once real-CHD regression is green.

Ports MAME's A/V Huffman codec to plain C for libchdr, covering both
the CHDv3/v4 CHDCOMPRESSION_AV type and the CHDv5 'avhu' tag. Decode
only — no encoder — matches the existing libchdr scope.

Footprint: +516 LOC new C, +~4 KiB to libchdr.so.0.3. No new
dependencies: reuses libchdr's existing huffman (src/libchdr_huffman.c),
bitstream (src/libchdr_bitstream.c), and dr_flac wrapper
(src/libchdr_flac.c, already solves MAME's 0x2A custom header trick
for the CDFL codec).

Implemented paths:

- Header parse (metasize, channels, samples, width, height, treesize,
  per-channel sizes)
- Metadata raw copy
- Audio:
  - FLAC per channel (treesize == 0xffff), via flac_decoder_reset
    at 48 kHz mono with MAME's stripped header
  - Huffman-coded delta samples with dual hi/lo RLE trees
    (treesize != 0 && != 0xffff)
  - Raw big-endian s16 deltas (treesize == 0)
- Video: lossless YUY2 (3 delta-RLE trees: Y, Cb, Cr; interleaved as
  Cb Y Cr Y per 2-pixel column). Lossy path rejected with
  CHDERR_DECOMPRESSION_ERROR — no real-world laserdisc CHD uses it.
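
The raw-delta audio path (treesize == 0) listed above can be sketched as below. This assumes the accumulator starts at zero for each channel and that each big-endian 16-bit delta is added to the previous sample with wraparound; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn a stream of big-endian s16 deltas into big-endian s16 samples.
 * The running value is kept unsigned so that overflow wraps in a
 * well-defined way; the bit pattern equals two's-complement s16. */
static void sketch_decode_raw_deltas(const uint8_t *src, uint8_t *dest,
                                     size_t samples)
{
    uint16_t prev = 0; /* assumed starting value */

    for (size_t i = 0; i < samples; i++) {
        uint16_t delta = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
        prev = (uint16_t)(prev + delta);
        dest[2 * i]     = (uint8_t)(prev >> 8);
        dest[2 * i + 1] = (uint8_t)(prev & 0xff);
    }
}
```

For example, the deltas +1, +2, -1 decode to the samples 1, 3, 2.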

Destination 'chav' header (12 bytes) + metadata + audio (per-channel
s16be sequential) + video (YUY2 big-endian) matches MAME's
raw_data_size layout. Trailing space in the hunk is zero-padded.

Wires through src/libchdr_chd.c in four places: struct codec_data
slot, codec_interfaces table (two entries — CHD_CODEC_AVHUFF for v5,
CHDCOMPRESSION_AV for v3/v4), init/free/decompress dispatch for both
the v1-4 path and the v5 path.

All 16 samples in the existing fuzz corpus still parse cleanly
(regression check passed). A real laserdisc CHD is still needed for
byte-identical regression against chdman extract; tracked as a
follow-up.

Closes #69.

Two changes to the AVHuff decoder:

1. The swap_endian flag passed to flac_decoder_decode_interleaved was
   inverted. CHD raw hunks are big-endian, while dr_flac writes native
   order; on a little-endian host (x86_64, aarch64 Linux) the output was
   thus left in LE form, mismatching the per-hunk CRC16 stored in the
   CHDv5 map. Every FLAC-audio AVHuff hunk failed with
   CHDERR_DECOMPRESSION_ERROR. detect_native_endian() returns 1 on LE /
   0 on BE, which is exactly the swap value we need.

2. audiohi/audiolo huffman decoders are only used by the legacy huffman
   audio sub-codec (treesize != 0 and != 0xffff). Modern chdman emits
   FLAC audio almost exclusively, so allocating those 256 KiB up front
   wastes memory on the common case. Allocate them lazily in
   decode_audio() on the first huffman-audio hunk and reuse them thereafter.
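
The endian fix in item 1 comes down to swapping exactly when the host is little-endian. A minimal sketch of that rule, with hypothetical helper names (the probe below only mirrors the behaviour described for detect_native_endian(), it is not libchdr's implementation):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian one. */
static int sketch_host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Write native-order s16 samples as big-endian bytes. When
 * swap_endian is set, each sample is byte-swapped before being
 * stored in host order, which yields big-endian output on a
 * little-endian machine. */
static void sketch_store_s16be(const int16_t *in, uint8_t *out,
                               size_t count, int swap_endian)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t v = (uint16_t)in[i];
        if (swap_endian)
            v = (uint16_t)((v >> 8) | (v << 8));
        memcpy(out + 2 * i, &v, sizeof v);
    }
}
```

Passing `sketch_host_is_little_endian()` as `swap_endian` gives big-endian output on either kind of host; passing its negation reproduces the bug described above.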
tests/avhuff_regression.c: a small harness that opens a CHD via libchdr,
walks every hunk via chd_read, and relies on the built-in CRC16 verification
(VERIFY_BLOCK_CRC=1) to catch any byte-level decode error. Exits non-zero
on any failure.
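
To show why a per-hunk CRC16 is enough to catch a byte-order mistake, here is a small sketch. It assumes the checksum is the CCITT variant (polynomial 0x1021, initial value 0xffff); that parameter choice is an assumption and is not stated in this PR, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 and initial value 0xffff
 * (assumed parameters, see the note above). */
static uint16_t sketch_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xffff;

    while (len--) {
        crc ^= (uint16_t)(*data++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The same sample stored as `{0x12, 0x34}` and as `{0x34, 0x12}` produces two different checksums, so a hunk decoded in the wrong byte order cannot match the stored value.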

tests/avhuff_corpus/: fetch.sh pulls the four redistributable AVHuff CHDs
used during the AVHuff bring-up:

* MAME's createld_avi_yuv2_3_frames_no_audio / createld_avi_uyvy_3_frames_no_audio
  regtest output (BSD-3-Clause, 376 B each, video-only, 6 hunks of 219660 B)
* Synthesized variants via chdman createld with a 440 Hz sine track muxed in,
  exercising both -c avhu (auto-FLAC) and -c flac,avhu (dual-codec) paths

The CHD/AVI files themselves are git-ignored — fetch.sh is idempotent and
needs curl, chdman, and ffmpeg.

This catches the FLAC audio endian bug fixed in the previous commit.
@rtissera rtissera marked this pull request as ready for review April 30, 2026 01:08
Development

Successfully merging this pull request may close these issues.

Compression format AVHuff not supported
