Drain sink on stream destroy + destroy run-out test by nclack · Pull Request #1 · nclack/chucky

nclack · 2026-06-18T02:17:06Z

Closes acquire-project#147.
Closes acquire-project#152.

acquire-project#147 — flush error paths can free footer_buf with sink IO still pending (both backends)

writer_flush drains the sink IO queue only after finalize_shards. If the flush body fails after queueing a footer-referencing write_direct job but before reaching the drain (e.g. a truncate or metadata-write failure after finalize_shards has queued footer jobs), it returns early without draining. Stream destroy then frees footer_buf_pool while the IO worker may still be parked ahead of the write. This is the same use-after-free / on-disk-corruption class as acquire-project#145, reachable only after a flush error.

Fix: drain unconditionally on the destroy path of both backends, before the teardown that frees the footer pool — mirroring how acquire-project#142 made gate release unconditional on destroy.

GPU: shard_sink_drain in tile_stream_gpu_destroy before engine_array_state_destroy.
CPU: shard_sink_drain in tile_stream_cpu_destroy before the shard_state_destroy loop.
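
To illustrate the ordering, here is a minimal single-threaded sketch, not the code in this PR. Every name in it (struct sink, sink_write_direct, sink_drain, stream_flush, stream_destroy) is a hypothetical stand-in, and the real sink runs its jobs on an IO worker rather than inline. It only models the invariant: a queued job borrows the footer buffer, so destroy has to drain before it frees.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names are chucky's real API. */
enum { MAX_JOBS = 8, FOOTER_BYTES = 16 };

struct write_job { const char* src; size_t nbytes; };

struct sink {
  struct write_job jobs[MAX_JOBS]; /* queued writes that still borrow src */
  int njobs;
  char disk[FOOTER_BYTES];         /* stands in for the file on disk */
};

struct stream {
  struct sink sink;
  char* footer_buf;                /* owned by the stream, borrowed by jobs */
};

static int stream_init(struct stream* st)
{
  memset(st, 0, sizeof *st);
  st->footer_buf = malloc(FOOTER_BYTES);
  if (!st->footer_buf) return -1;
  memset(st->footer_buf, 'F', FOOTER_BYTES);
  return 0;
}

/* Queue a write that references buf without copying it. */
static int sink_write_direct(struct sink* s, const char* buf, size_t n)
{
  if (s->njobs == MAX_JOBS || n > FOOTER_BYTES) return -1;
  s->jobs[s->njobs++] = (struct write_job){ buf, n };
  return 0;
}

/* Run every queued job; afterwards nothing borrows caller memory. */
static void sink_drain(struct sink* s)
{
  for (int i = 0; i < s->njobs; ++i)
    memcpy(s->disk, s->jobs[i].src, s->jobs[i].nbytes);
  s->njobs = 0;
}

/* Flush whose error path returns after queueing but before draining. */
static int stream_flush(struct stream* st, int fail_after_queue)
{
  if (sink_write_direct(&st->sink, st->footer_buf, FOOTER_BYTES)) return -1;
  if (fail_after_queue) return -1; /* early return: the job stays queued */
  sink_drain(&st->sink);
  return 0;
}

/* Destroy drains unconditionally, then frees. Returns jobs left queued. */
static int stream_destroy(struct stream* st)
{
  sink_drain(&st->sink);           /* must precede the free below */
  free(st->footer_buf);
  st->footer_buf = NULL;
  return st->sink.njobs;
}
```

If the sink_drain call in stream_destroy were removed, the job queued by a failed stream_flush would still hold a pointer into freed memory, which is the hazard described above.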

New regression test tests/test_flush_error_drain_cpu.c (CPU-only): injects a one-shot failing truncate (new shard_pool_fs_inject_failing_truncate hook) so footer jobs stay queued when destroy runs; under ASAN, no use-after-free is the pass condition.
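
A one-shot fault-injection hook of this kind could look roughly like the sketch below. This is an assumption about the shape of such a hook, not the implementation of shard_pool_fs_inject_failing_truncate; struct fs_pool, fs_pool_inject_failing_truncate, and fs_pool_truncate are hypothetical names, and the truncate itself is stubbed out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical pool; the real shard pool is not shown in this PR text. */
struct fs_pool {
  atomic_int fail_next_truncate; /* 1 while a failure is armed */
  int truncate_calls;            /* bookkeeping for the test only */
};

/* Arm the hook: the next truncate fails, later ones succeed. */
static void fs_pool_inject_failing_truncate(struct fs_pool* p)
{
  atomic_store(&p->fail_next_truncate, 1);
}

/* Stubbed truncate. Returns 0 on success, -1 on the injected failure. */
static int fs_pool_truncate(struct fs_pool* p, long size)
{
  (void)size; /* a real implementation would resize the file here */
  ++p->truncate_calls;
  /* atomic_exchange reads and clears the flag, so exactly one call fails */
  if (atomic_exchange(&p->fail_next_truncate, 0)) return -1;
  return 0;
}
```

Clearing the flag with an exchange keeps the hook one-shot even if two truncates race, so a test can rely on a single failing call followed by normal behavior.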

acquire-project#152 — test_destroy_midstream does not exercise the teardown run-out path

Both existing subtests auto-flush successfully before destroy, so the GPU worker queue is already empty when gpu_delivery_stop_join runs — the run-out line is verified by reasoning only.

New subtest forces a delivery to stay queued at destroy time by marking the pool errored up front (new synchronous shard_pool_fs_set_error hook), so every delivery short-circuits on has_error; then destroys from another thread and asserts no hang and no use-after-free under ASAN.
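
The shape of that subtest can be approximated with a toy pthread model. It is a sketch under assumptions, not the test in this PR: struct pool, struct delivery_queue, pool_set_error, stop_join, and run_out_demo are hypothetical, and here the deliveries are simply queued before the worker starts, since the PR text does not spell out how has_error keeps them queued. The worker has to run the queue out before it honors the stop request, and the stop-and-join happens on a separate thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the pool and the delivery worker queue. */
struct pool { atomic_int has_error; };

struct delivery_queue {
  pthread_mutex_t mu;
  pthread_cond_t cv;
  int pending;            /* deliveries still queued */
  int stop;               /* set by stop_join */
  int skipped, written;   /* outcome counters */
  struct pool* pool;
  pthread_t worker;
};

/* Synchronous: the error is visible as soon as this returns. */
static void pool_set_error(struct pool* p) { atomic_store(&p->has_error, 1); }

static void* worker_main(void* arg)
{
  struct delivery_queue* q = arg;
  pthread_mutex_lock(&q->mu);
  for (;;) {
    while (q->pending == 0 && !q->stop) pthread_cond_wait(&q->cv, &q->mu);
    if (q->pending == 0) break;   /* stop requested and queue run out */
    --q->pending;
    if (atomic_load(&q->pool->has_error)) ++q->skipped; /* short-circuit */
    else ++q->written;
  }
  pthread_mutex_unlock(&q->mu);
  return NULL;
}

static int queue_start(struct delivery_queue* q, struct pool* p, int pending)
{
  pthread_mutex_init(&q->mu, NULL);
  pthread_cond_init(&q->cv, NULL);
  q->pending = pending;
  q->stop = 0;
  q->skipped = q->written = 0;
  q->pool = p;
  return pthread_create(&q->worker, NULL, worker_main, q);
}

/* Request stop, wait for the worker to run out the queue, then tear down. */
static void* stop_join(void* arg)
{
  struct delivery_queue* q = arg;
  pthread_mutex_lock(&q->mu);
  q->stop = 1;
  pthread_cond_broadcast(&q->cv);
  pthread_mutex_unlock(&q->mu);
  pthread_join(q->worker, NULL);
  pthread_mutex_destroy(&q->mu);
  pthread_cond_destroy(&q->cv);
  return NULL;
}

/* Returns the number of short-circuited deliveries, or -1 on failure. */
static int run_out_demo(int pending)
{
  struct pool pool = { 0 };
  struct delivery_queue q;
  pthread_t destroyer;
  pool_set_error(&pool);                      /* errored up front */
  if (queue_start(&q, &pool, pending)) return -1;
  if (pthread_create(&destroyer, NULL, stop_join, &q)) {
    stop_join(&q);                            /* fall back to inline teardown */
    return -1;
  }
  pthread_join(destroyer, NULL);              /* a hang would block here */
  return q.written == 0 ? q.skipped : -1;
}
```

This model has no timeout, so a teardown hang shows up as a stuck test run rather than a failed assertion; memory errors would need ASAN, as in the real subtest.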

The two fault-injection helpers (inject_failing_truncate, one-shot/truncate-triggered; set_error, synchronous) are distinct and serve the two tests; both land here together.

nclack · 2026-06-18T02:25:39Z

Superseded by acquire-project#158 (PR retargeted to upstream).

Nathan Clack added 2 commits June 18, 2026 02:15

Drain sink in stream destroy (acquire-project#147)

c4117fa

test: destroy run-out path acquire-project#152

be7bb10

nclack closed this Jun 18, 2026

nclack wants to merge 2 commits into main from fix-147-152-teardown
