gpu: snapshot per-batch LOD timing across batch boundary #3

Closed
nclack wants to merge 1 commit into main from fix-154-lod-timing-snapshot

@nclack nclack commented Jun 18, 2026

Owner

Closes acquire-project#154.

record_flush_metrics reads the per-LOD timing events lod_shared->timing[fc] for the batch being drained, but the producer re-records those events for the next batch on the same fc. Drains are lazy and run on the delivery worker, so the read can race the re-record and pick up a different batch's generation. CUDA event APIs are thread-safe and accumulate_metric_cu discards bad readings, so the only effect is occasional skew in reported per-LOD timing — never wrong output bytes.

A naive copy of the event handle into the handoff (as aggregate timing does) is insufficient: timing[fc].t_end is dual-purpose — it is also the GPU_EDGE_LOD_DONE ordering edge — and is re-recorded during the next batch's fill, before any drain ordering.

Fix:

  • Give the per-LOD timing 3 generations (LOD_TIMING_SLOTS); worst case is 3 simultaneously-live batches (draining + pending + filling). Each batch owns one generation for its whole lifetime, threaded through the schedule slot and handoff so the drain reads the generation it filled. Reuse (batch N+3 reuses N) is safe because N is joined before N+3's fill begins.
  • Decouple the ordering edge from the timing event: a stable per-fc lod_done[2] event now backs GPU_EDGE_LOD_DONE, seeded exactly where t_end was, so the compress-stream wait fires at the identical pipeline position.
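
The generation ownership in the first bullet can be sketched in plain C. This is a minimal model under stated assumptions, not the code in this PR: it assumes generations rotate round-robin by batch index (consistent with "batch N+3 reuses N"), it replaces the CUDA events with a plain owner tag, and it leaves out the per-fc dimension. Apart from `LOD_TIMING_SLOTS`, every name here (`timing_gen`, `handoff`, `begin_fill`, `drain`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worst case: draining + pending + filling batches live at once. */
#define LOD_TIMING_SLOTS 3

/* Stand-in for one generation of per-LOD timing events.
 * The real code would hold CUDA event handles; here we only track
 * which batch last recorded into the generation. (Hypothetical type.) */
struct timing_gen {
    uint64_t owner_batch;
    bool live; /* true from fill start until the drain joins */
};

struct lod_timing {
    struct timing_gen gen[LOD_TIMING_SLOTS];
};

/* Hypothetical handoff record: carries the generation index so the
 * drain reads the generation its own batch filled. */
struct handoff {
    uint64_t batch;
    int timing_gen;
};

/* Assumption: round-robin assignment by batch index. */
static int generation_for_batch(uint64_t batch) {
    return (int)(batch % LOD_TIMING_SLOTS);
}

/* Producer side: claim a generation for the whole lifetime of the batch. */
static struct handoff begin_fill(struct lod_timing *t, uint64_t batch) {
    int g = generation_for_batch(batch);
    /* Reuse is only safe if the previous owner has already been joined. */
    assert(!t->gen[g].live);
    t->gen[g].owner_batch = batch;
    t->gen[g].live = true;
    struct handoff h = { .batch = batch, .timing_gen = g };
    return h;
}

/* Delivery-worker side: returns true if the generation still belongs to
 * the batch being drained, i.e. nothing re-recorded over it. */
static bool drain(struct lod_timing *t, struct handoff h) {
    struct timing_gen *g = &t->gen[h.timing_gen];
    bool ok = g->live && g->owner_batch == h.batch;
    g->live = false; /* joined: the generation may now be reused */
    return ok;
}
```

In this model, batches 0, 1, and 2 can all be live at once without sharing a generation, and batch 3 can only claim batch 0's generation after batch 0 has been drained. With a single shared slot, a lazy drain of batch 0 could instead observe batch 1's recording, which is the skew described above.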

Metrics-quality only; no data-correctness impact. No new unit test (a timing-skew race isn't reliably testable); existing multiscale/LOD ctests confirm metrics still populate.

nclack commented Jun 18, 2026

Owner Author

Superseded by acquire-project#160 (PR retargeted to upstream).

@nclack nclack closed this Jun 18, 2026

Development

Successfully merging this pull request may close these issues.

Per-LOD timing metrics can race the next batch's events (skew only, no data impact)