Skip to content

Fixes zombie process leak in execFile#39

Open
donatj wants to merge 10 commits into
masterfrom
fix/zombie-cleanup
Open

Fixes zombie process leak in execFile#39
donatj wants to merge 10 commits into
masterfrom
fix/zombie-cleanup

Conversation

@donatj
Copy link
Copy Markdown
Owner

@donatj donatj commented Apr 24, 2026

  • Defers cmd.Wait() immediately after cmd.Start() to ensure process reaping
  • Combines Wait error with any stdin operation errors using multierror
  • Moves timeout timer setup before stdin operations for proper cleanup

Previously early returns from Seek or Copy failures would skip cmd.Wait(), leaving zombie processes.

- Defers cmd.Wait() immediately after cmd.Start() to ensure process reaping
- Combines Wait error with any stdin operation errors using multierror
- Moves timeout timer setup before stdin operations for proper cleanup

Previously early returns from Seek or Copy failures would skip cmd.Wait(), leaving zombie processes.
@donatj donatj marked this pull request as ready for review April 24, 2026 20:55
Copilot AI review requested due to automatic review settings April 24, 2026 20:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to ensure hook subprocesses started by HookExec.execFile are always reaped (avoiding zombie processes) even when stdin preparation/writes fail, while also improving timeout cleanup and error reporting.

Changes:

  • Switches execFile to a named return so cmd.Wait() can be deferred and its error combined with earlier failures.
  • Moves the timeout timer setup earlier in execFile to cover stdin operations and ensure cleanup on all exit paths.
  • Uses multierror to combine cmd.Wait() errors with stdin operation errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread exec.go Outdated
Comment thread exec.go Outdated
Comment thread exec.go Outdated
- Uses exec.CommandContext with context.WithTimeout for process lifecycle
- Moves data.Seek before cmd.Start to avoid starting process on seek failure
- Removes deferred stdin.Close in favor of explicit cleanup on error paths
- Adds timeout exceeded detection with descriptive error message
- Maintains multierror combination of stdin and wait errors

Context-based cancellation is more idiomatic and ensures proper process cleanup on timeout.
Comment thread exec.go Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread exec.go Outdated
defer func() {
waitErr := cmd.Wait()

if ctx.Err() == context.DeadlineExceeded {
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout wrapping uses ctx.Err() == context.DeadlineExceeded unconditionally. In edge races where the deadline fires but the process exits cleanly (so cmd.Wait() returns nil), this would turn a successful run into an error (and also wraps a nil error). Consider only applying the timeout wrapper when waitErr != nil (or when the process was actually killed due to the context), e.g., gate on ctx.Err() == context.DeadlineExceeded && waitErr != nil / errors.Is(waitErr, context.DeadlineExceeded) / an ExitError signal check.

Suggested change
if ctx.Err() == context.DeadlineExceeded {
if waitErr != nil && ctx.Err() == context.DeadlineExceeded {

Copilot uses AI. Check for mistakes.
Comment thread exec.go
…/copy-error paths

Agent-Logs-Url: https://github.com/donatj/hookah/sessions/9c696600-1506-4502-9912-17bc6b4bb88d

Co-authored-by: donatj <133747+donatj@users.noreply.github.com>
donatj added 3 commits April 24, 2026 16:28
multierror.Append already handles nil cases
- Sets Setpgid to create process group for spawned scripts
- Monitors context deadline and explicitly kills process group on timeout
- Uses negative PID with SIGKILL to terminate all children

Shell scripts spawn child processes that survive parent termination. CommandContext kills only the parent shell,
leaving children like sleep running. Process groups allow killing the entire tree.
- Overrides cmd.Cancel to kill process group on context timeout
- Removes manual select/goroutine timeout monitoring
- Returns to simple cmd.Wait in defer

CommandContext automatically calls cmd.Cancel on timeout. Overriding it to kill the process group is cleaner
than manually monitoring ctx.Done.
Comment thread exec.go
}
defer cancel()

cmd := exec.CommandContext(ctx, f)
donatj added 4 commits April 24, 2026 17:04
- Guards cmd.Cancel against nil process
- Ignores stdin.Close error after successful copy

Child processes may exit early without reading stdin, causing close to fail with a broken pipe error.
- Switches from multierror.Append().ErrorOrNil() to native errors.Join
- Completes migration to stdlib error handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants