Skip to content

fix(harness): prevent local shell output deadlock#1520

Open
luowanghaoyun wants to merge 1 commit into
agentscope-ai:mainfrom
luowanghaoyun:fix/harness-local-shell-output-deadlock
Open

fix(harness): prevent local shell output deadlock#1520
luowanghaoyun wants to merge 1 commit into
agentscope-ai:mainfrom
luowanghaoyun:fix/harness-local-shell-output-deadlock

Conversation

@luowanghaoyun
Copy link
Copy Markdown
Contributor

@luowanghaoyun luowanghaoyun commented May 28, 2026

Prevent LocalFilesystemWithShell from deadlocking when a shell command writes large stdout/stderr output.

AgentScope-Java Version

1.1.0-SNAPSHOT

Description

LocalFilesystemWithShell.execute previously waited for the child process to exit before reading stdout and stderr. Commands that write more than the OS pipe buffer can block while Java is still in waitFor, causing HarnessAgent shell execution to time out even though the command should finish quickly.

This PR starts asynchronous stdout/stderr stream readers immediately after ProcessBuilder.start(), matching the safer pattern used by ShellCommandTool. It also adds a regression test that executes a command producing large stdout and verifies the process exits successfully without truncation.

Closes #1519.

Tested with:

  • mvn -q -pl agentscope-harness spotless:apply`
  • mvn -q -pl agentscope-harness -am -Dtest=io.agentscope.harness.agent.filesystem.local.LocalFilesystemWithShellTest test`
  • mvn -q -pl agentscope-harness -am spotless:check compile -DskipTests`

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with mvn spotless:apply
  • All tests are passing (mvn test)
  • Javadoc comments are complete and follow project conventions
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>
@luowanghaoyun luowanghaoyun requested a review from a team May 28, 2026 13:12
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 51.42857% with 17 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ent/filesystem/local/LocalFilesystemWithShell.java 51.42% 16 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@AgentScopeJavaBot AgentScopeJavaBot added bug Something isn't working area/harness agentscope-harness (test/runtime support) labels May 29, 2026
Copy link
Copy Markdown
Collaborator

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

The PR correctly fixes a real pipe-buffer deadlock in LocalFilesystemWithShell.execute(): when a child process writes more than the OS pipe buffer (~64KB on Linux/macOS), the previous code blocked in waitFor and again in readAllBytes because nothing was draining stdout/stderr. The new flow mirrors ShellCommandTool: a static cached daemon thread pool reads streams asynchronously immediately after pb.start(), futures are cancelled on timeout/error, and the process is force-destroyed in the failure path. The fix is sound, the resource lifecycle is correctly handled in both happy-path and exception paths, and the regression test reliably triggers the deadlock pre-fix (200KB > pipe buffer, so the old waitFor + readAllBytes would have hung). The implementation is consistent with the existing ShellCommandTool pattern. Only minor refinements are suggested.

if (!finished) {
proc.destroyForcibly();
cancelStreamReader(stdoutFuture);
cancelStreamReader(stderrFuture);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Partial output is discarded on timeout. After proc.destroyForcibly() the OS closes the pipe ends, so the reader Futures will normally complete within milliseconds. Consider draining them with a short bounded getStreamOutput-style call (as ShellCommandTool#getOutputWithTimeout(stdoutFuture, 1, TimeUnit.SECONDS) does) before cancelling, then prepend the timeout banner. This preserves diagnostic context (the last lines printed before the process got stuck) without reintroducing the deadlock. Non-blocking nit, not required for merge.

}

private static String readStream(InputStream stream) throws IOException {
return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] readStream calls readAllBytes() without an explicit close. Process pipe streams are released when the process terminates, so this isn't a leak in practice, but ShellCommandTool.StreamReader uses try-with-resources on a BufferedReader, which is more robust against early termination/cancellation paths and consistent with the project's own pattern. Suggest: try (InputStream in = stream) { return new String(in.readAllBytes(), StandardCharsets.UTF_8); }.

private static String getStreamOutput(Future<String> future)
throws IOException, InterruptedException {
try {
return future != null ? future.get(5, TimeUnit.SECONDS) : "";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The 5-second magic number used to bound stream-reader completion deserves a named constant (e.g. STREAM_DRAIN_TIMEOUT_SECONDS) and a brief comment explaining the rationale (drain after waitFor returns true; readers should finish near-instantly because EOF has been reached).

LocalFilesystemWithShell fs =
new LocalFilesystemWithShell(tmp, false, 2, 300_000, null, false);

ExecuteResponse response = fs.execute(RuntimeContext.empty(), "yes x | head -c 200000", 2);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Two robustness concerns for this regression test: (1) yes/head -c are POSIX-only -- on Windows CI the test will fail with command-not-found rather than validating the fix. Consider @EnabledOnOs({OS.LINUX, OS.MAC}) or use a Java-implemented stdin generator. (2) The 2-second effectiveTimeout is tight on slow CI runners; flushing 200KB through sh should be fast, but a more generous 10-second timeout would reduce flake risk while still completing in well under that bound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/harness agentscope-harness (test/runtime support) bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]:LocalFilesystemWithShell can deadlock when shell command writes large stdout/stderr

2 participants