Skip to content

fix(harness): fix zero file sizes in sandbox ls and glob results#1509

Open
wangdy wants to merge 1 commit into
agentscope-ai:mainfrom
wangdy:fix/sandbox-ls-glob-file-size
Open

fix(harness): fix zero file sizes in sandbox ls and glob results#1509
wangdy wants to merge 1 commit into
agentscope-ai:mainfrom
wangdy:fix/sandbox-ls-glob-file-size

Conversation

@wangdy
Copy link
Copy Markdown

@wangdy wangdy commented May 27, 2026

Problem

Both ls() and glob() in BaseSandboxFilesystem were constructing
FileInfo entries with size 0, so sandbox-backed file listings always
reported 0 bytes regardless of the actual file size.

Fix

  • update ls() to emit tab-separated entries with real file sizes
  • update glob() to return "<size>\t<path>" records so FileInfo.size
    is populated correctly
  • use GNU file utilities available in the Debian-based sandbox:
    stat -c%s for ls() and find -printf for glob()
  • add regression tests covering file size parsing for both code paths

Testing

mvn -pl agentscope-harness -Dtest=BaseSandboxFilesystemTest test

Both ls() and glob() in BaseSandboxFilesystem were constructing
FileInfo entries with size 0, so sandbox-backed file listings always
reported 0 bytes regardless of the actual file size.

Fix ls() to emit tab-separated entries with real file sizes, and fix
glob() to return "<size>\t<path>" records so FileInfo.size is populated
correctly for matched files.

Use GNU file utilities available in the Debian-based sandbox:
stat -c%s for ls() and find -printf for glob().

Also add regression tests covering file size parsing for both paths.
@wangdy wangdy requested a review from a team May 27, 2026 12:30
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...gent/filesystem/sandbox/BaseSandboxFilesystem.java 54.54% 2 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

@AgentScopeJavaBot AgentScopeJavaBot added bug Something isn't working area/harness agentscope-harness (test/runtime support) labels May 28, 2026
Copy link
Copy Markdown
Collaborator

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

This PR fixes a real bug: ls() and glob() in BaseSandboxFilesystem were always reporting size=0 for files because the constructed shell commands did not emit size information. The fix switches to tab-separated output with stat -c%s (for ls) and find -printf '%s\t%p\n' (for glob), and adds focused unit tests that decouple parsing from sandbox execution. The fix is correct and the tests are well-structured. Two improvements are worth considering before merge: (1) the ls() implementation now forks one stat subprocess per file, which is O(N) forks per listing and noticeably slower than glob()'s single-pass find -printf; consider unifying ls() to use find -mindepth 1 -maxdepth 1 -printf '%y\t%p\t%s\n' for consistency and performance. (2) The ls() parser uses split("\t", 3), which is fragile if a filename contains a literal tab character — glob() uses split("\t", 2) and is robust to that. Otherwise the change is solid.

Comment on lines 78 to +84
String cmd =
"for f in "
+ escapedPath
+ "/*; do "
+ " if [ -d \"$f\" ]; then echo \"DIR:$f\"; "
+ " elif [ -f \"$f\" ]; then echo \"FILE:$f\"; fi; "
+ "done 2>/dev/null";
+ "/*; do if [ -d \"$f\" ]; then printf 'DIR\\t%s\\n"
+ "' \"$f\"; elif [ -f \"$f\" ]; then printf 'FILE\\t%s\\t%s\\n"
+ "' \"$f\" \"$(stat -c%s \"$f\" 2>/dev/null || echo 0)\"; fi; done"
+ " 2>/dev/null";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Performance/consistency: this for f in .../*; do ... stat -c%s "$f" ...; done shells out to stat once per file, so listing a directory with N entries spawns N subprocesses. The glob() path below already uses a single find -printf invocation which is both faster and consistent. Consider rewriting ls() to use the same one-pass approach, e.g.

find <path> -mindepth 1 -maxdepth 1 -printf '%y\t%p\t%s\n' 2>/dev/null

where %y yields d/f/l, then map d->DIR and f->FILE in Java. This also removes the awkward two-line concatenation of the printf format string and aligns with the comment in glob() about single-pass O(1)-per-file metadata reads.

entries.add(FileInfo.ofDir(line.substring(4), ""));
} else if (line.startsWith("FILE:")) {
entries.add(FileInfo.ofFile(line.substring(5), 0, ""));
String[] parts = line.split("\t", 3);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Robustness: line.split("\t", 3) is fragile if a filename contains a literal tab character. With a tab in the path, parts[1] will be the prefix of the path and parts[2] will be <rest-of-path>\t<size>, which fails Long.parseLong and silently yields size=0 (the very bug this PR is fixing). The glob() parser uses split("\t", 2) which is tab-safe because the path is the trailing field. If you adopt the find -printf suggestion above and put %s before %p ('%y\t%s\t%p\n'), you can keep split("\t", 3) safely with the path as the trailing field.

String[] parts = line.split("\t", 2);
if (parts.length == 2) {
// find -printf '%s\t%p\n': parts[0]=size, parts[1]=path
entries.add(FileInfo.ofFile(parts[1].trim(), parseFileSize(parts[0]), ""));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] parts[1].trim() strips legitimate leading/trailing whitespace from filenames (whitespace is valid in POSIX paths). Since the field is delimited by tab and find -printf '%p' does not pad, the trim is unnecessary and can corrupt paths. Also note the inconsistency: ls() does not trim its path field. Suggest dropping .trim() here.

Comment on lines +359 to +363
private static long parseFileSize(String raw) {
try {
return Long.parseLong(raw.trim());
} catch (NumberFormatException e) {
return 0L;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] parseFileSize silently swallows NumberFormatException and returns 0, which is the same value used to indicate "size unknown". If the sandbox ever returns an unexpected format, callers will see size=0 with no signal. Consider logging at WARN/DEBUG so this failure mode is diagnosable in production. Non-blocking.

Comment on lines +31 to +67
@Test
void ls_parsesFileSizesFromSandboxOutput() {
TestSandboxFilesystem filesystem = new TestSandboxFilesystem();
filesystem.nextExecuteResponse =
new ExecuteResponse(
"DIR\t/workspace/session/uploads/subdir\n"
+ "FILE\t/workspace/session/uploads/a.pdf\t1234\n",
0,
false);

var result = filesystem.ls(RuntimeContext.empty(), "/workspace/session/uploads");

assertFalse(result.entries().isEmpty());
assertEquals(2, result.entries().size());
assertEquals("/workspace/session/uploads/subdir", result.entries().get(0).path());
assertEquals("/workspace/session/uploads/a.pdf", result.entries().get(1).path());
assertEquals(1234L, result.entries().get(1).size());
}

@Test
void glob_parsesFileSizesFromSandboxOutput() {
TestSandboxFilesystem filesystem = new TestSandboxFilesystem();
filesystem.nextExecuteResponse =
new ExecuteResponse(
"128\t/workspace/session/uploads/a.pdf\n"
+ "2048\t/workspace/session/uploads/b.xlsx\n",
0,
false);

var result = filesystem.glob(RuntimeContext.empty(), "*.pdf", "/workspace/session/uploads");

assertEquals(2, result.matches().size());
assertEquals("/workspace/session/uploads/a.pdf", result.matches().get(0).path());
assertEquals(128L, result.matches().get(0).size());
assertEquals("/workspace/session/uploads/b.xlsx", result.matches().get(1).path());
assertEquals(2048L, result.matches().get(1).size());
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[praise] Nice test design: subclassing BaseSandboxFilesystem with a stub execute cleanly isolates the parsing logic from real sandbox I/O. This pattern keeps the regression coverage fast and deterministic and is worth reusing for future sandbox-shell parsing changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/harness agentscope-harness (test/runtime support) bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants