Fixes zombie process leak in execFile#39
Conversation
- Defers cmd.Wait() immediately after cmd.Start() to ensure process reaping - Combines Wait error with any stdin operation errors using multierror - Moves timeout timer setup before stdin operations for proper cleanup Previously early returns from Seek or Copy failures would skip cmd.Wait(), leaving zombie processes.
There was a problem hiding this comment.
Pull request overview
This PR aims to ensure hook subprocesses started by HookExec.execFile are always reaped (avoiding zombie processes) even when stdin preparation/writes fail, while also improving timeout cleanup and error reporting.
Changes:
- Switches
execFileto a named return socmd.Wait()can be deferred and its error combined with earlier failures. - Moves the timeout timer setup earlier in
execFileto cover stdin operations and ensure cleanup on all exit paths. - Uses
multierrorto combinecmd.Wait()errors with stdin operation errors.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Uses exec.CommandContext with context.WithTimeout for process lifecycle - Moves data.Seek before cmd.Start to avoid starting process on seek failure - Removes deferred stdin.Close in favor of explicit cleanup on error paths - Adds timeout exceeded detection with descriptive error message - Maintains multierror combination of stdin and wait errors Context-based cancellation is more idiomatic and ensures proper process cleanup on timeout.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| defer func() { | ||
| waitErr := cmd.Wait() | ||
|
|
||
| if ctx.Err() == context.DeadlineExceeded { |
There was a problem hiding this comment.
The timeout wrapping uses ctx.Err() == context.DeadlineExceeded unconditionally. In edge races where the deadline fires but the process exits cleanly (so cmd.Wait() returns nil), this would turn a successful run into an error (and also wraps a nil error). Consider only applying the timeout wrapper when waitErr != nil (or when the process was actually killed due to the context), e.g., gate on ctx.Err() == context.DeadlineExceeded && waitErr != nil / errors.Is(waitErr, context.DeadlineExceeded) / an ExitError signal check.
| if ctx.Err() == context.DeadlineExceeded { | |
| if waitErr != nil && ctx.Err() == context.DeadlineExceeded { |
…/copy-error paths Agent-Logs-Url: https://github.com/donatj/hookah/sessions/9c696600-1506-4502-9912-17bc6b4bb88d Co-authored-by: donatj <133747+donatj@users.noreply.github.com>
multierror.Append already handles nil cases
- Sets Setpgid to create process group for spawned scripts - Monitors context deadline and explicitly kills process group on timeout - Uses negative PID with SIGKILL to terminate all children Shell scripts spawn child processes that survive parent termination. CommandContext kills only the parent shell, leaving children like sleep running. Process groups allow killing the entire tree.
- Overrides cmd.Cancel to kill process group on context timeout - Removes manual select/goroutine timeout monitoring - Returns to simple cmd.Wait in defer CommandContext automatically calls cmd.Cancel on timeout. Overriding it to kill the process group is cleaner than manually monitoring ctx.Done.
| } | ||
| defer cancel() | ||
|
|
||
| cmd := exec.CommandContext(ctx, f) |
- Guards cmd.Cancel against nil process - Ignores stdin.Close error after successful copy Child processes may exit early without reading stdin, causing close to fail with a broken pipe error.
- Switches from multierror.Append().ErrorOrNil() to native errors.Join - Completes migration to stdlib error handling
Previously early returns from Seek or Copy failures would skip cmd.Wait(), leaving zombie processes.