Skip to content

fix race condition in batch completer stats#1208

Open
bgentry wants to merge 2 commits intomasterfrom
bg/batch-error-test-race-fix
Open

fix race condition in batch completer stats#1208
bgentry wants to merge 2 commits intomasterfrom
bg/batch-error-test-race-fix

Conversation

@bgentry
Copy link
Copy Markdown
Contributor

@bgentry bgentry commented Apr 13, 2026

Batch completion could race when multiple job completions reused the
same JobStatistics pointer. Short excerpt from the race detector
failure in a recent CI run:

WARNING: DATA RACE
Write at ... BatchCompleter.handleBatch.func4()
.../internal/jobcompleter/job_completer.go:482
Previous read ... jobStatisticsFromInternal()
.../event.go:75

BatchCompleter queued the caller's stats pointer directly and later
mutated CompleteDuration before publishing updates. Batch job error
paths can reuse one stats object across multiple completions, so a later
batch could mutate stats that were already visible to subscribers.

Take a snapshot of JobStatistics when queuing the completion so each
published update owns an immutable copy. Add regression coverage that
queues two completions with a shared stats pointer and verifies that
each published update has a distinct snapshot.

bgentry added 2 commits April 12, 2026 20:45
Batch completion could race when multiple job completions reused the
same `JobStatistics` pointer. Short excerpt from the race detector
failure in a recent CI run:

    WARNING: DATA RACE
    Write at ... BatchCompleter.handleBatch.func4()
    .../internal/jobcompleter/job_completer.go:482
    Previous read ... jobStatisticsFromInternal()
    .../event.go:75

`BatchCompleter` queued the caller's stats pointer directly and later
mutated `CompleteDuration` before publishing updates. Batch job error
paths can reuse one stats object across multiple completions, so a later
batch could mutate stats that were already visible to subscribers.

Take a snapshot of `JobStatistics` when queuing the completion so each
published update owns an immutable copy. Add regression coverage that
queues two completions with a shared stats pointer and verifies that
each published update has a distinct snapshot.
@bgentry bgentry requested a review from brandur April 13, 2026 01:53
@bgentry bgentry added the bug Something isn't working label Apr 13, 2026
Copy link
Copy Markdown
Contributor

@brandur brandur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Yep, makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants