[build images] prefetch build deps into CI images#29209

Open
Jarred-Sumner wants to merge 4 commits into main from claude/prefetch-build-deps

Conversation

@Jarred-Sumner
Collaborator

Adds bun run prefetch and a $BUN_DEPS_CACHE_DIR lookup in downloadWithRetry() so CI images can bake in the ~250MB of dep tarballs / prebuilt WebKit / zig that every build currently re-downloads.

How it works

  • scripts/build/prefetch.ts enumerates every URL the build would fetch and saves each to <out>/<sha256(url)> plus a manifest.json. Run bun run prefetch --print to see the list.
  • downloadWithRetry() checks $BUN_DEPS_CACHE_DIR/<sha256(url)> before the network. Miss → normal download. So a PR that bumps a dep on a stale image still builds — that one dep just downloads as before.
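The lookup described above is a content-addressed read keyed by sha256 of the full URL, so any URL change (a version bump, a new flag) misses the cache and falls through to the network. A minimal sketch of that behavior; `offlineCacheKey()` is named in the PR, but `tryServeFromCache()` and its exact control flow are illustrative, not the actual `downloadWithRetry()` code:

```typescript
import { createHash } from "node:crypto";
import { copyFileSync, existsSync, statSync } from "node:fs";
import { join } from "node:path";

// Cache key: sha256 of the full URL, hex-encoded.
export function offlineCacheKey(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

// Returns true when the artifact was served from the cache; the caller
// falls back to its normal network download path on false.
// (Hypothetical helper; the PR inlines this in downloadWithRetry().)
export function tryServeFromCache(url: string, dest: string): boolean {
  const cacheDir = process.env.BUN_DEPS_CACHE_DIR;
  if (!cacheDir) return false;
  const cached = join(cacheDir, offlineCacheKey(url));
  // Zero-byte entries are treated as misses so a failed or interrupted
  // prefetch can never poison later builds.
  if (!existsSync(cached) || statSync(cached).size === 0) return false;
  copyFileSync(cached, dest);
  return true;
}
```

On a stale image after a dep bump, only the changed URL misses; everything else is still served locally.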

CI image wiring

  • machine.mjs ships scripts/build/ as a tarball next to bootstrap
  • bootstrap.sh / bootstrap.ps1 extract it, run prefetch into /opt/bun-deps / C:\bun-deps, and set BUN_DEPS_CACHE_DIR in the buildkite environment hook (posix) / Machine env (windows)
  • .buildkite/Dockerfile gets the equivalent layer

Bootstrap versions bumped (sh 29→30, ps1 16→17). Not covered: cargo crates for lolhtml (needs the dep's Cargo.lock which doesn't exist pre-extraction).

Add scripts/build/prefetch.ts which downloads every archive the build
fetches (dep tarballs, prebuilt WebKit, zig, node-headers) into a
content-addressed dir keyed by sha256(url). downloadWithRetry() now
checks $BUN_DEPS_CACHE_DIR before the network and falls through on a
miss, so a stale image after a dep bump still builds — that one dep
just downloads normally.

Wired into all three image-build paths:
- machine.mjs ships scripts/build/ as a tarball alongside bootstrap
- bootstrap.sh / bootstrap.ps1 run prefetch into /opt/bun-deps
  (or C:\bun-deps) and export BUN_DEPS_CACHE_DIR via the buildkite
  environment hook (linux/mac) / Machine env (windows)
- .buildkite/Dockerfile gets an equivalent layer

Bootstrap versions bumped (sh 29->30, ps1 16->17).
@robobun
Collaborator

robobun commented Apr 12, 2026

Updated 12:23 AM PT - Apr 12th, 2026

@Jarred-Sumner, your commit ce53dd9 has 1 failure in Build #45286 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29209

That installs a local version of the PR into your bun-29209 executable, so you can run:

bun-29209 --bun

autofix-ci bot and others added 2 commits April 12, 2026 06:29
- Packer (Windows Azure): upload bun-prefetch.tgz via a file
  provisioner before bootstrap so Prefetch-BuildDeps actually runs.
- Dockerfile-bootstrap.sh: extract the tarball into the fakebun build
  context so the Dockerfile COPY scripts/build layer has something to
  copy (was a hard failure).
@github-actions
Contributor

This PR may be a duplicate of:

  1. feat(ci): bake build dependencies into CI images #26186 - Also bakes build dependencies into CI images to avoid re-downloading ~250MB on every build, using a different implementation approach (shell script + daily GitHub Actions rebuild vs. content-addressed cache)

🤖 Generated with Claude Code

Not every -asan/-baseline tarball is published for every target
(Windows has no -asan). Prefetch now treats download failures as
warnings instead of aborting bootstrap, and downloadWithRetry no
longer retries 4xx responses.
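The retry-policy change in this commit can be sketched as follows. This is a simplified stand-in, not the actual `downloadWithRetry()` implementation; the function name and attempt count are illustrative:

```typescript
// 4xx means the artifact is simply not published for this target
// (e.g. no -asan tarball on Windows): fail fast, do not retry.
// Network errors and 5xx responses are treated as transient and retried.
export async function fetchArtifact(url: string, attempts = 3): Promise<ArrayBuffer | null> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok) return await res.arrayBuffer();
      if (res.status >= 400 && res.status < 500) return null; // permanent failure
      // 5xx: fall through and retry
    } catch {
      // transient network error: retry
    }
  }
  return null;
}
```

Combined with prefetch treating download failures as warnings, a missing variant no longer aborts bootstrap and never burns retry time on a guaranteed 404.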
@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

Walkthrough

Adds a build-dependency prefetch system: a new prefetch script and offline cache, changes to download logic to serve cached artifacts, bootstrap and CI hooks to install/upload a bun-prefetch.tgz, Docker/Packer/provisioner updates to include the tarball, and integration tests for the pipeline (manifest, cache keys, and downloads).
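The content-addressed layout the walkthrough describes (one file per URL named sha256(url), plus a manifest.json) can be sketched like this. The sha256 file naming is stated in the PR; the manifest schema shown here (key-to-URL map) is an assumption, and the real script downloads each body rather than just recording keys:

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Minimal sketch of the prefetch output layout: a content-addressed
// directory plus a manifest mapping cache keys back to source URLs.
// (Hypothetical helper; prefetch.ts also performs the actual downloads.)
export function writePrefetchCache(urls: string[], out: string): string {
  mkdirSync(out, { recursive: true });
  const manifest: Record<string, string> = {};
  for (const url of urls) {
    const key = createHash("sha256").update(url).digest("hex");
    manifest[key] = url;
    // Real script: download url into join(out, key) here.
  }
  const manifestPath = join(out, "manifest.json");
  writeFileSync(manifestPath, JSON.stringify(manifest, null, 2));
  return manifestPath;
}
```

Because keys derive only from URLs, `bun run prefetch --print` can enumerate the exact cache contents without downloading anything.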

Changes

  • Bootstrap & CI integration (.buildkite/Dockerfile, .buildkite/Dockerfile-bootstrap.sh, scripts/bootstrap.ps1, scripts/bootstrap.sh, scripts/machine.mjs, scripts/packer/variables.pkr.hcl, scripts/packer/windows-*.pkr.hcl): Added steps to produce/upload/extract bun-prefetch.tgz and invoke prefetch during bootstrap. Docker build gains a prefetch layer; Packer receives a prefetch_tarball variable and file provisioners; boot scripts populate BUN_DEPS_CACHE_DIR.
  • Prefetch core & download cache (scripts/build/prefetch.ts, scripts/build/download.ts): New prefetch.ts enumerates dependency URLs, computes sha256 keys, downloads concurrently into a content-addressed cache, and emits manifest.json. downloadWithRetry() gains an offline fast-path using BUN_DEPS_CACHE_DIR, adds DEPS_CACHE_ENV and offlineCacheKey(), and treats 4xx as permanent failures.
  • Build helper exports (scripts/build/deps/webkit.ts, scripts/build/zig.ts): Exported prebuiltUrl(cfg) and zigDownloadUrl(cfg, safe) so external modules (prefetch) can reuse the URL construction logic.
  • Tooling & scripts (package.json): Added a prefetch npm script that runs bun scripts/build/prefetch.ts.
  • Tests (test/integration/build-prefetch/prefetch.test.ts): Added integration tests covering offline cache hits/misses for downloadWithRetry(), prefetch.ts --print enumeration/keys, content-addressed downloads, and the generated manifest.json.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title '[build images] prefetch build deps into CI images' clearly and specifically describes the main objective: adding prefetch functionality to bake build dependencies into CI images.
  • Description check: ✅ Passed. The description covers both required sections: it explains what the PR does (prefetch mechanism, downloadWithRetry changes, CI image wiring) and how it was verified (bun run prefetch --print, integration tests, commit messages show testing adjustments).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build/download.ts`:
- Around line 79-85: The cache-copy block that calls copyFile and rename (using
tmpPath and dest) must not abort the whole download on transient FS errors; wrap
the copyFile(tmpPath) and rename(tmpPath, dest) calls in a try/catch inside the
same size !== undefined branch, and on error log a warning including the error
and the paths (tmpPath, cached, dest), attempt to clean up the tmpPath if
present, then continue execution (do not return) so the existing network
fetch/retry logic runs; keep the original early-return only on successful
copy/rename.

In `@scripts/build/prefetch.ts`:
- Around line 160-163: The Linux target ABI can end up undefined on non-Linux
hosts; update the target construction (the object assigned to args.target /
target) so that when target.os === "linux" you use detectLinuxAbi() but fall
back to the concrete string "gnu" if detectLinuxAbi() returns undefined (e.g.,
abi: host.os === "linux" ? detectLinuxAbi() ?? "gnu" : undefined); make the same
change in the other target creation site as well (the other block around lines
178-182) so cfg.abi is never left ambiguous.
- Around line 88-93: The config hard-codes zigCommit: ZIG_COMMIT which causes
macOS (darwin) non-CI hosts to cache the wrong Zig; change the assignment to
pick the host-default Zig commit (use ZIG_COMMIT_PARALLEL on non-CI darwin).
Update the object construction to set zigCommit = (process.platform === "darwin"
&& !ci) ? ZIG_COMMIT_PARALLEL : ZIG_COMMIT (or call the existing helper that
returns the default Zig commit), referencing the ZIG_COMMIT and
ZIG_COMMIT_PARALLEL symbols so local bun run prefetch will cache the correct
compiler.

In `@test/integration/build-prefetch/prefetch.test.ts`:
- Line 42: The strict assertion expect(stderr).toBe("") is brittle for spawned
Bun subprocesses; update the three occurrences that assert stderr (the
expect(stderr).toBe("") calls in prefetch.test.ts) to allow only whitespace or
empty output instead of an exact empty string — e.g., replace with a
pattern/assertion like expect(stderr).toMatch(/^\s*$/) (or an equivalent "no
non-whitespace output" check) for the stderr variable so ASAN sanitizer warnings
(whitespace/newlines) won't make the tests flaky.
- Around line 21-25: The test currently writes a zero-byte cache entry for
"http://192.0.2.1/empty" instead of the URL under test, so the "ignore empty
cache entries" behavior is never exercised; update the calls to writeFileSync so
the zero-byte file is created for the same URL variable (url) used by the test
(use writeFileSync(join(cache, sha256(url)), "") ) and move the non-empty
payload to the sibling key (e.g., writeFileSync(join(cache,
sha256("http://192.0.2.1/empty")), "PAYLOAD")) so sha256, cache, url and
writeFileSync are the correct references to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22c004d4-5c41-495b-a9a0-23c1003caedc

📥 Commits

Reviewing files that changed from the base of the PR and between f96981c and b752ec8.

📒 Files selected for processing (10)
  • .buildkite/Dockerfile
  • package.json
  • scripts/bootstrap.ps1
  • scripts/bootstrap.sh
  • scripts/build/deps/webkit.ts
  • scripts/build/download.ts
  • scripts/build/prefetch.ts
  • scripts/build/zig.ts
  • scripts/machine.mjs
  • test/integration/build-prefetch/prefetch.test.ts

Comment on lines +79 to +85
if (size !== undefined && size > 0) {
  console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
  const tmpPath = `${dest}.${process.pid}.partial`;
  await copyFile(cached, tmpPath);
  await rename(tmpPath, dest);
  return;
}

⚠️ Potential issue | 🟠 Major

Handle cache-copy failures by falling back to network fetch.

If statSync() succeeds but copyFile()/rename() fails (e.g., transient FS lock/permission race), the function throws immediately instead of using the existing network retry path. This makes cache corruption/lock issues fatal when they can be recoverable.

Suggested fix
-    if (size !== undefined && size > 0) {
-      console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
-      const tmpPath = `${dest}.${process.pid}.partial`;
-      await copyFile(cached, tmpPath);
-      await rename(tmpPath, dest);
-      return;
-    }
+    if (size !== undefined && size > 0) {
+      const tmpPath = `${dest}.${process.pid}.partial`;
+      try {
+        console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
+        await copyFile(cached, tmpPath);
+        await rename(tmpPath, dest);
+        return;
+      } catch {
+        await rm(tmpPath, { force: true }).catch(() => {});
+        // Fall through to network path.
+      }
+    }

Comment on lines +88 to +93
webkitVersion: WEBKIT_VERSION,
nodejsVersion: NODEJS_VERSION,
zigCommit: ZIG_COMMIT,
ci: true,
cacheDir: "",
} as Config;

⚠️ Potential issue | 🟠 Major

Use the host’s actual default Zig commit here.

zig.ts switches to ZIG_COMMIT_PARALLEL on non-CI darwin hosts. Hard-coding ZIG_COMMIT means bun run prefetch on macOS caches the wrong compiler, so the next build still downloads Zig.

Suggested fix
-import { ZIG_COMMIT, zigDownloadUrl } from "./zig.ts";
+import { defaultZigCommit, zigDownloadUrl } from "./zig.ts";
 function urlConfig(t: PrefetchTarget, v: { debug: boolean; lto: boolean; asan: boolean; baseline: boolean }): Config {
   const host = detectHost();
+  const ci = process.env.CI === "true";
   const windows = t.os === "windows";
   return {
     os: t.os,
     arch: t.arch,
     abi: t.abi,
@@
     webkit: "prebuilt",
     webkitVersion: WEBKIT_VERSION,
     nodejsVersion: NODEJS_VERSION,
-    zigCommit: ZIG_COMMIT,
-    ci: true,
+    zigCommit: defaultZigCommit(ci, host.os),
+    ci,
     cacheDir: "",
   } as Config;
 }

Comment on lines +160 to +163
out: process.env[DEPS_CACHE_ENV] ?? join(process.cwd(), ".cache", "bun-deps"),
print: false,
target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() : undefined },
webkitVariants: ["lto", "none"],

⚠️ Potential issue | 🟡 Minor

Default --os=linux to a concrete ABI.

From a macOS/Windows host, --os=linux leaves args.target.abi as undefined even though the CLI says the default is gnu. That makes the Linux target ambiguous for any URL builder that keys off cfg.abi.

Suggested fix
       case "--os":
         if (val !== "linux" && val !== "darwin" && val !== "windows") throw new BuildError(`--os: ${val}`);
         args.target.os = val;
-        if (val !== "linux") args.target.abi = undefined;
+        if (val === "linux") {
+          args.target.abi ??= "gnu";
+        } else {
+          args.target.abi = undefined;
+        }
         break;

Also applies to: 178-182


Comment on lines +21 to +25
// Unroutable URL — if the cache lookup is skipped, fetch hangs/errors and
// the test fails. The 0-byte sibling proves we ignore empty cache entries.
const url = "http://192.0.2.1/dep-v1.tar.gz";
writeFileSync(join(cache, sha256(url)), "PAYLOAD");
writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "");

⚠️ Potential issue | 🟡 Minor

This test does not actually verify zero-byte cache-entry fallback.

Line 25 writes a zero-byte entry for a different URL than the one downloaded on Line 23, so the “ignore empty cache entries” claim is not exercised here.


});
const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);

expect(stderr).toBe("");

⚠️ Potential issue | 🟠 Major

Avoid strict empty-stderr assertions for spawned Bun subprocesses.

expect(stderr).toBe("") is brittle on ASAN targets where Bun subprocesses can emit sanitizer warnings despite successful behavior.

Suggested fix
-    expect(stderr).toBe("");
     expect(stdout).toContain("ok");
     expect(readFileSync(join(out, "got"), "utf8")).toBe("PAYLOAD");
     expect(exitCode).toBe(0);
-    expect(stderr).toBe("");
     expect(readFileSync(dest, "utf8")).toBe("FROM-NETWORK");
     expect(exitCode).toBe(0);
-    expect(stderr).toBe("");
     const lines = stdout.trim().split("\n");
Based on learnings: ASAN CI can emit warning text on stderr for spawned Bun subprocesses, so empty-stderr assertions are flaky.

Also applies to: 74-74, 97-97



@claude claude bot left a comment



Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 scripts/machine.mjs:1415-1429 — The temp directory created by mkdtempSync for bun-prefetch.tgz (line ~1423 of scripts/machine.mjs) is never deleted after use — neither the prefetchPath file nor its parent tmpDir are removed after upload, so they accumulate in the OS temp dir across repeated invocations. Additionally, in the Docker code path (features?.includes('docker')), the tarball is created unconditionally but never uploaded to the machine, making that temp dir and tar czf entirely wasted.

    Extended reasoning...

    Bug summary: When bootstrap is true in main(), a temp directory is created via mkdtempSync(join(tmpdir(), "prefetch-")), and a tarball bun-prefetch.tgz is written into it. The variable tmpDir is scoped to a one-off block {}, so after the block exits, only prefetchPath (pointing into tmpDir) remains accessible. After the upload, neither prefetchPath nor its parent directory are ever removed. The finally block at the end of main() only calls machine.close(), with no local temp file cleanup.

    Two concrete problems introduced by this PR:

    1. Non-Docker paths (Windows + Linux non-Docker): prefetchPath is uploaded via machine.upload() and then abandoned. The local tgz (typically a few MB of scripts/build/ TypeScript) and its parent tmp/prefetch-XXXXXX/ directory remain on disk indefinitely.
    2. Docker path (features?.includes("docker")): The tgz is created unconditionally in the block before the if (ci) / Docker branching, but in the Docker else if (dockerfilePath) branch (lines ~1553–1566), only bootstrapPath, dockerfilePath, and agentPath are uploaded — prefetchPath is never uploaded. The tar czf I/O and the resulting file are entirely wasted.

    Step-by-step proof:

    1. User runs machine.mjs ssh --feature=docker --cloud=aws ...
    2. bootstrap = true, so the block executes: tmpDir = mkdtempSync(".../prefetch-abc123"), prefetchPath = ".../prefetch-abc123/bun-prefetch.tgz", tarball written.
    3. features.includes("docker") is true, so dockerfilePath is set.
    4. Execution enters the Docker else if (dockerfilePath) branch, which uploads bootstrapPath, Dockerfile, and agentPath — no prefetchPath upload.
    5. main() exits (or errors); the finally block calls machine.close(). tmpDir and prefetchPath remain in /tmp (or %TEMP%).

    Why existing code doesn't prevent it: The pre-existing agentPath temp dir (created with mkdtempSync(join(tmpdir(), "agent-"))) has the same leak pattern — this PR follows that convention. However, the Docker path waste is new: prefetchPath is unconditionally created before any feature-checking, so the Docker path silently does wasted work.

    Fix: wrap the tmpDir lifecycle in a try/finally (or use an AsyncDisposable pattern) that calls fs.rmSync(tmpDir, { recursive: true, force: true }) after the upload succeeds, and also guard the tarball creation so it is skipped entirely when the Docker path is active (since Docker handles prefetch via the COPY scripts/build layer in the Dockerfile itself).

Comment on lines 406 to 432
}
}

function Prefetch-BuildDeps {
  # machine.mjs ships scripts/build/ here. Absent → no-op (e.g. running
  # bootstrap.ps1 standalone). Builds fall through to the network on cache
  # miss, so a stale image after a dep bump still works.
  $prefetchTgz = "C:\Windows\Temp\bun-prefetch.tgz"
  if (-not (Test-Path $prefetchTgz)) {
    return
  }

  $depsCache = "C:\bun-deps"
  $prefetchDir = "C:\Windows\Temp\bun-prefetch"
  New-Item -Path $prefetchDir -ItemType Directory -Force | Out-Null
  & tar xzf $prefetchTgz -C $prefetchDir

  $bun = Which bun -Required
  & $bun "$prefetchDir\build\prefetch.ts" --out=$depsCache --webkit=lto,none,asan,baseline
  if ($LASTEXITCODE -ne 0) { throw "prefetch.ts failed" }
  Get-Content "$depsCache\manifest.json"

  Set-Env "BUN_DEPS_CACHE_DIR" $depsCache
}

function Install-Rust {
  if (Which rustc) {

🟡 In Prefetch-BuildDeps, neither $prefetchTgz (C:\Windows\Temp\bun-prefetch.tgz) nor $prefetchDir (C:\Windows\Temp\bun-prefetch) is removed after prefetch.ts finishes, so both get baked into any Windows CI snapshot that skips Sysprep (e.g. AWS stop+createImage). Add Remove-Item -Recurse -Force $prefetchDir, $prefetchTgz at the end of the function to match how the Dockerfile and bootstrap.sh handle the same cleanup.

Extended reasoning...

What the bug is and how it manifests

Prefetch-BuildDeps in scripts/bootstrap.ps1 (lines 406-432) creates two temporary paths — $prefetchTgz = C:\Windows\Temp\bun-prefetch.tgz and $prefetchDir = C:\Windows\Temp\bun-prefetch — then never removes them. After prefetch.ts completes and the dep tarballs are written to C:\bun-deps, the extracted TypeScript build scripts (scripts/build/) and the original archive remain on disk permanently.

The specific code path that triggers it

machine.mjs uploads bun-prefetch.tgz to C:\Windows\Temp\bun-prefetch.tgz before invoking bootstrap.ps1. Inside the function, tar xzf extracts the archive to $prefetchDir, then bun prefetch.ts runs and writes dep archives to C:\bun-deps. The function ends with Set-Env "BUN_DEPS_CACHE_DIR" $depsCache and returns — no cleanup.

Why existing code does not prevent it

The Dockerfile handles this correctly with rm -rf /tmp/prefetch in the same RUN layer (so it is never committed to the image layer). bootstrap.sh uses create_tmp_directory (mktemp) for the prefetch dir, and then clean_system() at the end of main() runs rm -rf /tmp/* /var/tmp/*, catching it. bootstrap.ps1 has no equivalent global temp-cleanup step, and the prefetch dir uses a fixed well-known path rather than a temp path — so nothing else removes it.

Impact

For Azure/Packer image builds, Sysprep cleans C:\Windows\Temp, so the leftover files do not survive into the published gallery image. However, AWS snapshots work by stopping the instance and calling createImage with no Sysprep, meaning the files are preserved in the snapshot. The files themselves are small (TypeScript source from scripts/build/), not the 250 MB dep tarballs, so size inflation is modest — but it is a genuine, avoidable inconsistency with all other bootstrap paths.

How to fix it

Add a single line at the end of Prefetch-BuildDeps, after Set-Env "BUN_DEPS_CACHE_DIR" $depsCache:

Remove-Item -Recurse -Force $prefetchDir, $prefetchTgz

Step-by-step proof

  1. machine.mjs calls machine.upload(prefetchPath, "C:\Windows\Temp\bun-prefetch.tgz") — the archive lands at C:\Windows\Temp\bun-prefetch.tgz.
  2. Prefetch-BuildDeps runs tar xzf C:\Windows\Temp\bun-prefetch.tgz -C C:\Windows\Temp\bun-prefetch — C:\Windows\Temp\bun-prefetch\build*.ts now exists.
  3. bun prefetch.ts runs successfully; C:\bun-deps is populated.
  4. Set-Env "BUN_DEPS_CACHE_DIR" C:\bun-deps is called; function returns.
  5. No code removes C:\Windows\Temp\bun-prefetch or C:\Windows\Temp\bun-prefetch.tgz.
  6. machine.snapshot(label) calls aws.stopInstances + aws.createImage — the image is captured with both paths still on disk.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (5)
test/integration/build-prefetch/prefetch.test.ts (2)

21-25: ⚠️ Potential issue | 🟡 Minor

This fixture doesn't cover empty-entry fallback.

The zero-byte file is created for a different URL, so this test only proves the happy-path cache hit. If you want to cover “ignore empty cache entries,” seed the empty file under sha256(url) and fetch that URL from a local server, or drop the claim from the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/build-prefetch/prefetch.test.ts` around lines 21 - 25, The
comment claims we test "ignore empty cache entries" but the zero-byte file is
written for a different key; update the fixture so the empty sibling is written
at the same cache key as the tested URL or remove the misleading claim: either
replace writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "") with
writeFileSync(join(cache, sha256(url)), "") to seed an empty-entry fallback for
the same URL (and ensure the test fetches that URL from a local server), or keep
the existing PAYLOAD seed and change the comment to only claim a happy-path
cache hit; check usages of url, sha256, writeFileSync and cache when applying
the change.

42-42: ⚠️ Potential issue | 🟠 Major

Drop the exact empty-stderr assertions here.

Spawned Bun subprocesses can emit ASAN warnings on stderr even when the behavior is correct, so these checks make the suite flaky. The exit code already gives you the failure signal.

Suggested fix
-    expect(stderr).toBe("");
     expect(stdout).toContain("ok");
     expect(readFileSync(join(out, "got"), "utf8")).toBe("PAYLOAD");
     expect(exitCode).toBe(0);
@@
-    expect(stderr).toBe("");
     expect(readFileSync(dest, "utf8")).toBe("FROM-NETWORK");
     expect(exitCode).toBe(0);
@@
-    expect(stderr).toBe("");
     const lines = stdout.trim().split("\n");

Based on learnings: ASAN CI can emit warning text on stderr for spawned Bun subprocesses, so empty-stderr assertions are flaky.

Also applies to: 74-74, 97-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/build-prefetch/prefetch.test.ts` at line 42, The test
contains brittle exact-empty stderr assertions (expect(stderr).toBe("")); remove
those assertions (the exact-empty checks at the occurrences of
expect(stderr).toBe("");) so the tests no longer fail on ASAN warnings from
spawned Bun subprocesses; rely on the subprocess exit code/assertions already
present instead of asserting stderr is exactly empty.
scripts/build/download.ts (1)

79-84: ⚠️ Potential issue | 🟠 Major

Don't let cache-copy failures bypass the network fallback.

If statSync() finds a populated cache entry but copyFile() or rename() fails, the whole download aborts instead of falling through to the existing retry/network path. That turns transient AV/lock/permission races into hard build failures.

Suggested fix
     if (size !== undefined && size > 0) {
-      console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
       const tmpPath = `${dest}.${process.pid}.partial`;
-      await copyFile(cached, tmpPath);
-      await rename(tmpPath, dest);
-      return;
+      try {
+        console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
+        await copyFile(cached, tmpPath);
+        await rename(tmpPath, dest);
+        return;
+      } catch (err) {
+        await rm(tmpPath, { force: true }).catch(() => {});
+        console.warn(`warn: ignoring broken cache copy ${cached} -> ${dest}: ${err}`);
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/build/download.ts` around lines 79 - 84, The current cache-copy block
in scripts/build/download.ts returns immediately after copyFile/rename, which
lets copy/rename failures abort the whole download; wrap the copyFile(tmpPath)
and rename(tmpPath, dest) calls in a try/catch inside the if (size !== undefined
&& size > 0) branch (referencing tmpPath, cached, copyFile and rename) so that
on error you log a warning (including the error), attempt to remove any leftover
tmpPath, and then continue (do not return) so the existing network/retry path
runs; only return after a successful rename.
scripts/build/prefetch.ts (2)

65-93: ⚠️ Potential issue | 🟠 Major

Use the host's actual default Zig commit here.

urlConfig() always bakes zigCommit: ZIG_COMMIT with ci: true, so bun scripts/build/prefetch.ts on a non-CI darwin host prefetches the wrong compiler archive. The next build still downloads Zig because zig.ts chooses a different default there.

Suggested fix
-import { ZIG_COMMIT, zigDownloadUrl } from "./zig.ts";
+import { defaultZigCommit, zigDownloadUrl } from "./zig.ts";

 function urlConfig(t: PrefetchTarget, v: { debug: boolean; lto: boolean; asan: boolean; baseline: boolean }): Config {
   const host = detectHost();
+  const ci = process.env.CI === "true";
   const windows = t.os === "windows";
   return {
@@
-    zigCommit: ZIG_COMMIT,
-    ci: true,
+    zigCommit: defaultZigCommit(ci, host.os),
+    ci,
     cacheDir: "",
   } as Config;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/build/prefetch.ts` around lines 65 - 93, urlConfig currently
hardcodes ci: true and zigCommit: ZIG_COMMIT which causes non-CI hosts (e.g.,
local darwin) to prefetch the wrong Zig archive; change urlConfig to use the
detected host's CI flag and default Zig commit instead of always assuming CI:
set ci to host.ci and set zigCommit to host.ci ? ZIG_COMMIT : host.zigCommit (or
host.defaultZigCommit if that property exists on the object returned by
detectHost()), so local runs use the host-provided zig commit while CI continues
to use ZIG_COMMIT; update the urlConfig implementation accordingly.
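A minimal sketch of the commit-selection logic this finding hinges on, assuming the defaultZigCommit semantics the review describes elsewhere (the commit strings here are placeholders, not the real values from zig.ts):

```typescript
// Placeholder commit values; the real ones live in zig.ts.
const ZIG_COMMIT = "365343af"; // CI and non-darwin hosts
const ZIG_COMMIT_PARALLEL = "445fc0cb"; // local (non-CI) darwin hosts

// Assumed shape of zig.ts's defaultZigCommit, per the review's description:
// only a non-CI darwin host gets the parallel build.
function defaultZigCommit(ci: boolean, hostOs: string): string {
  return ci || hostOs !== "darwin" ? ZIG_COMMIT : ZIG_COMMIT_PARALLEL;
}
```

Hardcoding ci: true bypasses this branch, which is exactly why a local darwin prefetch caches a URL the subsequent build never requests.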

160-163: ⚠️ Potential issue | 🟡 Minor

Default Linux targets to a concrete ABI.

From darwin/windows, --os=linux leaves args.target.abi undefined even though the help text says the default is gnu. Any URL builder that keys off cfg.abi now gets an ambiguous target.

Suggested fix
   const args: Args = {
     out: process.env[DEPS_CACHE_ENV] ?? join(process.cwd(), ".cache", "bun-deps"),
     print: false,
-    target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() : undefined },
+    target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() ?? "gnu" : undefined },
@@
       case "--os":
         if (val !== "linux" && val !== "darwin" && val !== "windows") throw new BuildError(`--os: ${val}`);
         args.target.os = val;
-        if (val !== "linux") args.target.abi = undefined;
+        if (val === "linux") {
+          args.target.abi ??= "gnu";
+        } else {
+          args.target.abi = undefined;
+        }
         break;

Also applies to: 178-181

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/build/prefetch.ts` around lines 160 - 163, The target ABI is computed
from host.os so when callers set target.os to "linux" from a different host the
abi remains undefined; update the target object creation (the "target: { os:
host.os, arch: host.arch, abi: host.os === \"linux\" ? detectLinuxAbi() :
undefined }" expression) to compute abi from the target's os value instead of
host.os — e.g., determine the assigned targetOs (the value you place in
target.os) and set abi to detectLinuxAbi() when that targetOs === "linux"
(falling back to the concrete default like "gnu" if detectLinuxAbi() cannot
determine one); apply the same change to the other occurrence around lines
178-181.
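The normalization the prompt asks for — derive the ABI from the target OS rather than the host OS, falling back to the documented default "gnu" — can be sketched in isolation (`resolveAbi` is a hypothetical name; the detected ABI is passed in rather than calling detectLinuxAbi() directly):

```typescript
type Os = "linux" | "darwin" | "windows";
type Abi = "gnu" | "musl";

// Hypothetical helper: only linux targets carry an ABI, and an
// undetected ABI falls back to the default "gnu" promised by --help.
function resolveAbi(targetOs: Os, detected: Abi | undefined): Abi | undefined {
  return targetOs === "linux" ? (detected ?? "gnu") : undefined;
}
```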
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build/download.ts`:
- Around line 104-106: The current logic marks all 4xx responses as permanent by
setting "permanent = res.status >= 400 && res.status < 500", which incorrectly
treats transient 429 and 408 as non-retriable; update the condition in the
download retry logic (the variable named "permanent" in
scripts/build/download.ts that checks res.status) so that 4xx responses are
considered permanent except for 408 and 429 (i.e., treat 408 and 429 as
transient/retriable), keeping the rest of the 5xx/network retry behavior
unchanged.

In `@scripts/machine.mjs`:
- Around line 1239-1240: The packer var "prefetch_tarball" is being added
unconditionally even when prefetchPath is undefined; update the code that builds
the args array (used by buildWindowsImageWithPacker()) to only include the
"-var", `prefetch_tarball=${prefetchPath}` pair when prefetchPath is truthy
(i.e., set inside the bootstrap block). Locate usages of prefetchPath and the
args array construction and guard the push/concatenation so Packer isn't passed
prefetch_tarball=undefined.

In `@scripts/packer/windows-arm64.pkr.hcl`:
- Around line 86-90: The file provisioner unconditionally uploads
var.prefetch_tarball which defaults to "" and causes failures when unset; modify
the provisioner block that sets source = var.prefetch_tarball (the provisioner
"file" that writes to "C:\\Windows\\Temp\\bun-prefetch.tgz") so it only runs
when var.prefetch_tarball is non-empty (e.g. guard the block with a conditional
or use a count/conditional expression to include the provisioner only if
var.prefetch_tarball != ""); keep scripts/bootstrap.ps1 behavior unchanged so
bootstrap can handle the missing tarball.

In `@scripts/packer/windows-x64.pkr.hcl`:
- Around line 87-91: The file provisioner currently always uploads
var.prefetch_tarball causing failures when it's empty; make the upload
conditional by adding a count meta-argument to the provisioner so it only runs
when var.prefetch_tarball is non-empty (e.g. add count = var.prefetch_tarball !=
"" ? 1 : 0 inside the provisioner "file" block that references
var.prefetch_tarball and destination "C:\\Windows\\Temp\\bun-prefetch.tgz"),
leaving scripts/bootstrap.ps1's existing absent-file handling unchanged.

---

Duplicate comments:
In `@scripts/build/download.ts`:
- Around line 79-84: The current cache-copy block in scripts/build/download.ts
returns immediately after copyFile/rename, which lets copy/rename failures abort
the whole download; wrap the copyFile(tmpPath) and rename(tmpPath, dest) calls
in a try/catch inside the if (size !== undefined && size > 0) branch
(referencing tmpPath, cached, copyFile and rename) so that on error you log a
warning (including the error), attempt to remove any leftover tmpPath, and then
continue (do not return) so the existing network/retry path runs; only return
after a successful rename.

In `@scripts/build/prefetch.ts`:
- Around line 65-93: urlConfig currently hardcodes ci: true and zigCommit:
ZIG_COMMIT which causes non-CI hosts (e.g., local darwin) to prefetch the wrong
Zig archive; change urlConfig to use the detected host's CI flag and default Zig
commit instead of always assuming CI: set ci to host.ci and set zigCommit to
host.ci ? ZIG_COMMIT : host.zigCommit (or host.defaultZigCommit if that property
exists on the object returned by detectHost()), so local runs use the
host-provided zig commit while CI continues to use ZIG_COMMIT; update the
urlConfig implementation accordingly.
- Around line 160-163: The target ABI is computed from host.os so when callers
set target.os to "linux" from a different host the abi remains undefined; update
the target object creation (the "target: { os: host.os, arch: host.arch, abi:
host.os === \"linux\" ? detectLinuxAbi() : undefined }" expression) to compute
abi from the target's os value instead of host.os — e.g., determine the assigned
targetOs (the value you place in target.os) and set abi to detectLinuxAbi() when
that targetOs === "linux" (falling back to the concrete default like "gnu" if
detectLinuxAbi() cannot determine one); apply the same change to the other
occurrence around lines 178-181.

In `@test/integration/build-prefetch/prefetch.test.ts`:
- Around line 21-25: The comment claims we test "ignore empty cache entries" but
the zero-byte file is written for a different key; update the fixture so the
empty sibling is written at the same cache key as the tested URL or remove the
misleading claim: either replace writeFileSync(join(cache,
sha256("http://192.0.2.1/empty")), "") with writeFileSync(join(cache,
sha256(url)), "") to seed an empty-entry fallback for the same URL (and ensure
the test fetches that URL from a local server), or keep the existing PAYLOAD
seed and change the comment to only claim a happy-path cache hit; check usages
of url, sha256, writeFileSync and cache when applying the change.
- Line 42: The test contains brittle exact-empty stderr assertions
(expect(stderr).toBe("")); remove those assertions (the exact-empty checks at
the occurrences of expect(stderr).toBe("");) so the tests no longer fail on ASAN
warnings from spawned Bun subprocesses; rely on the subprocess exit
code/assertions already present instead of asserting stderr is exactly empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b49e530f-4f04-4b4d-9076-ea1242838b5e

📥 Commits

Reviewing files that changed from the base of the PR and between b752ec8 and ce53dd9.

📒 Files selected for processing (8)
  • .buildkite/Dockerfile-bootstrap.sh
  • scripts/build/download.ts
  • scripts/build/prefetch.ts
  • scripts/machine.mjs
  • scripts/packer/variables.pkr.hcl
  • scripts/packer/windows-arm64.pkr.hcl
  • scripts/packer/windows-x64.pkr.hcl
  • test/integration/build-prefetch/prefetch.test.ts

Comment on lines +104 to +106
// 4xx is permanent (wrong URL / asset not published) — retrying just
// burns the backoff budget. Only 5xx/network errors get another shot.
permanent = res.status >= 400 && res.status < 500;

⚠️ Potential issue | 🟠 Major

Keep 429/408 on the retry path.

This now classifies every 4xx as permanent. GitHub and nodejs.org can return transient 429 Too Many Requests or 408 Request Timeout, so zig/node-header/WebKit downloads will fail on the first blip instead of backing off. Only clearly permanent misses should short-circuit.

Suggested fix
-        permanent = res.status >= 400 && res.status < 500;
+        permanent = res.status >= 400 && res.status < 500 && res.status !== 408 && res.status !== 429;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/build/download.ts` around lines 104 - 106, The current logic marks
all 4xx responses as permanent by setting "permanent = res.status >= 400 &&
res.status < 500", which incorrectly treats transient 429 and 408 as
non-retriable; update the condition in the download retry logic (the variable
named "permanent" in scripts/build/download.ts that checks res.status) so that
4xx responses are considered permanent except for 408 and 429 (i.e., treat 408
and 429 as transient/retriable), keeping the rest of the 5xx/network retry
behavior unchanged.
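The suggested condition can be factored into a small predicate — a hypothetical helper, shown only to make the 408/429 carve-out explicit:

```typescript
// 4xx is permanent (bad URL / unpublished asset), except the two
// transient cases: 408 Request Timeout and 429 Too Many Requests.
function isPermanentStatus(status: number): boolean {
  if (status === 408 || status === 429) return false;
  return status >= 400 && status < 500;
}
```

5xx responses and network errors stay on the retry path either way; only the 4xx classification changes.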

Comment on lines +1239 to +1240
"-var",
`prefetch_tarball=${prefetchPath}`,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the area around lines 1239-1240 in scripts/machine.mjs
# to understand the full context
wc -l scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 79


🏁 Script executed:

# Get the context around lines 1239-1240
sed -n '1220,1260p' scripts/machine.mjs | cat -n

Repository: oven-sh/bun

Length of output: 1316


🏁 Script executed:

# Find where prefetchPath is assigned
rg 'prefetchPath\s*=' scripts/machine.mjs -n

Repository: oven-sh/bun

Length of output: 114


🏁 Script executed:

# Look for the bootstrap block and bootstrap flag handling
rg 'bootstrap|prefetchPath' scripts/machine.mjs -n -B 2 -A 2

Repository: oven-sh/bun

Length of output: 5766


🏁 Script executed:

# Find where buildWindowsImageWithPacker is called and under what conditions
rg 'buildWindowsImageWithPacker' scripts/machine.mjs -n -B 5 -A 5

Repository: oven-sh/bun

Length of output: 1306


🏁 Script executed:

# Look at the Packer template to see how prefetch_tarball is used
find . -name "*.pkr.hcl" -o -name "*.json" | grep -i packer | head -20

Repository: oven-sh/bun

Length of output: 165


🏁 Script executed:

# Search for Packer template files
fd -e hcl -e json . -path "*packer*"

Repository: oven-sh/bun

Length of output: 223


🏁 Script executed:

# Look at the Packer directory structure
ls -la scripts/machine/packer/ 2>/dev/null || ls -la scripts/packer/ 2>/dev/null || find scripts -type d -name packer

Repository: oven-sh/bun

Length of output: 373


🏁 Script executed:

# Check the variables definition to see if prefetch_tarball is required
cat scripts/packer/variables.pkr.hcl

Repository: oven-sh/bun

Length of output: 1677


🏁 Script executed:

# Check the windows-x64 template to see how prefetch_tarball is used
cat scripts/packer/windows-x64.pkr.hcl

Repository: oven-sh/bun

Length of output: 6018


🏁 Script executed:

# Check if there's any conditional logic based on prefetch_tarball
rg 'prefetch_tarball' scripts/packer/

Repository: oven-sh/bun

Length of output: 267


🏁 Script executed:

# Let's verify what happens when we pass undefined as a string to Packer
# Check if there's any recent Packer execution logs or error handling
rg 'prefetch|undefined' scripts/machine.mjs -A 3 -B 3 | head -60

Repository: oven-sh/bun

Length of output: 1748


🏁 Script executed:

# Verify the exact flow: when bootstrap=false, is prefetchPath ever assigned?
sed -n '1380,1460p' scripts/machine.mjs | cat -n

Repository: oven-sh/bun

Length of output: 3507


Only pass prefetch_tarball when a tarball was actually generated.

prefetchPath is assigned only inside the bootstrap block, but Azure Windows image builds still call buildWindowsImageWithPacker() when --no-bootstrap is used. In that flow, passing -var prefetch_tarball=undefined causes Packer's file provisioner to fail when trying to upload a non-existent source file.

Suggested fix
     "-var",
     `agent_script=${agentPath}`,
-    "-var",
-    `prefetch_tarball=${prefetchPath}`,
+    ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []),
     templateDir,
   ];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    "-var",
-    `prefetch_tarball=${prefetchPath}`,
+    "-var",
+    `agent_script=${agentPath}`,
+    ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []),
+    templateDir,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/machine.mjs` around lines 1239 - 1240, The packer var
"prefetch_tarball" is being added unconditionally even when prefetchPath is
undefined; update the code that builds the args array (used by
buildWindowsImageWithPacker()) to only include the "-var",
`prefetch_tarball=${prefetchPath}` pair when prefetchPath is truthy (i.e., set
inside the bootstrap block). Locate usages of prefetchPath and the args array
construction and guard the push/concatenation so Packer isn't passed
prefetch_tarball=undefined.
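The spread-guard the prompt describes can be sketched standalone (`packerArgs` is a hypothetical wrapper; the real machine.mjs builds this array inline):

```typescript
// Only emit the -var pair when a prefetch tarball was actually produced;
// an undefined prefetchPath contributes nothing to the argv.
function packerArgs(templateDir: string, prefetchPath?: string): string[] {
  return [
    "build",
    ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []),
    templateDir,
  ];
}
```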

Comment on lines +86 to +90
// Step 0: Upload scripts/build/ so bootstrap's Prefetch-BuildDeps can run.
provisioner "file" {
source = var.prefetch_tarball
destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "variables.pkr.hcl" -type f

Repository: oven-sh/bun

Length of output: 89


🏁 Script executed:

find . -name "bootstrap.ps1" -type f

Repository: oven-sh/bun

Length of output: 78


🏁 Script executed:

find . -name "windows-*.pkr.hcl" -type f

Repository: oven-sh/bun

Length of output: 130


🏁 Script executed:

cat ./scripts/packer/variables.pkr.hcl

Repository: oven-sh/bun

Length of output: 1677


🏁 Script executed:

cat ./scripts/bootstrap.ps1

Repository: oven-sh/bun

Length of output: 22951


🏁 Script executed:

sed -n '80,95p' ./scripts/packer/windows-x64.pkr.hcl

Repository: oven-sh/bun

Length of output: 479


Guard this upload when no prefetch tarball was provided.

prefetch_tarball defaults to "" in scripts/packer/variables.pkr.hcl, and scripts/bootstrap.ps1 treats the tarball as optional. This unconditional file provisioner fails before bootstrap can take that fallback path when the variable is not explicitly set, breaking default or standalone Packer invocations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/packer/windows-arm64.pkr.hcl` around lines 86 - 90, The file
provisioner unconditionally uploads var.prefetch_tarball which defaults to ""
and causes failures when unset; modify the provisioner block that sets source =
var.prefetch_tarball (the provisioner "file" that writes to
"C:\\Windows\\Temp\\bun-prefetch.tgz") so it only runs when var.prefetch_tarball
is non-empty (e.g. guard the block with a conditional or use a count/conditional
expression to include the provisioner only if var.prefetch_tarball != ""); keep
scripts/bootstrap.ps1 behavior unchanged so bootstrap can handle the missing
tarball.

Comment on lines +87 to +91
// Step 0: Upload scripts/build/ so bootstrap's Prefetch-BuildDeps can run.
provisioner "file" {
source = var.prefetch_tarball
destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "variables.pkr.hcl" | head -20

Repository: oven-sh/bun

Length of output: 87


🏁 Script executed:

fd -t f "bootstrap.ps1" | head -20

Repository: oven-sh/bun

Length of output: 76


🏁 Script executed:

fd -t f "windows-x64.pkr.hcl" | head -20

Repository: oven-sh/bun

Length of output: 89


🏁 Script executed:

cat -n scripts/packer/variables.pkr.hcl | grep -A 5 -B 5 "prefetch_tarball"

Repository: oven-sh/bun

Length of output: 479


🏁 Script executed:

cat -n scripts/bootstrap.ps1 | grep -A 10 -B 5 "prefetch" | head -40

Repository: oven-sh/bun

Length of output: 1105


🏁 Script executed:

sed -n '80,100p' scripts/packer/windows-x64.pkr.hcl

Repository: oven-sh/bun

Length of output: 582


Guard this upload when no prefetch tarball was provided.

prefetch_tarball defaults to "" in scripts/packer/variables.pkr.hcl, and scripts/bootstrap.ps1 explicitly handles its absence (lines 413–416 return early if the file doesn't exist, with comment "Absent → no-op"). The unconditional file provisioner makes direct or legacy Packer invocations fail before bootstrap can take that fallback path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/packer/windows-x64.pkr.hcl` around lines 87 - 91, The file
provisioner currently always uploads var.prefetch_tarball causing failures when
it's empty; make the upload conditional by adding a count meta-argument to the
provisioner so it only runs when var.prefetch_tarball is non-empty (e.g. add
count = var.prefetch_tarball != "" ? 1 : 0 inside the provisioner "file" block
that references var.prefetch_tarball and destination
"C:\\Windows\\Temp\\bun-prefetch.tgz"), leaving scripts/bootstrap.ps1's existing
absent-file handling unchanged.

Comment on lines +79 to +85
if (size !== undefined && size > 0) {
console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
const tmpPath = `${dest}.${process.pid}.partial`;
await copyFile(cached, tmpPath);
await rename(tmpPath, dest);
return;
}

🔴 In the cache-hit path of downloadWithRetry() (lines 79–85 of scripts/build/download.ts), copyFile(cached, tmpPath) and rename(tmpPath, dest) are called with no error handling, so any transient FS error (disk full, AV/indexer lock on Windows, NFS stale handle) propagates as an unhandled exception that aborts the build entirely. The network retry path immediately below has a proper try/catch with partial-file cleanup for exactly this reason; the same protection is missing from the cache-hit path. Fix: wrap copyFile+rename in a try/catch, clean up the partial file on failure, and fall through to the network path.

Extended reasoning...

What the bug is and how it manifests

In scripts/build/download.ts lines 79-85, when BUN_DEPS_CACHE_DIR is set and a matching cache entry is found (size > 0), the function runs copyFile(cached, tmpPath) and rename(tmpPath, dest) with no try/catch. If either throws (disk full ENOSPC, transient permission race EACCES, AV/indexer file lock on Windows, NFS stale handle, or another process deleting the cached file between statSync and copyFile), the exception propagates out of downloadWithRetry as an unhandled rejection, aborting the build step entirely.

The specific code path that triggers it

statSync is already guarded by a try/catch (lines 73-74). But the guard stops there. If statSync succeeds (returns a non-zero size), execution enters the if (size !== undefined && size > 0) block where copyFile and rename run unprotected. A successful stat is no guarantee the copy will succeed: file locks, cross-device renames to a full temp partition, or a concurrent deletion can all cause these operations to fail after stat passes.

Why existing code does not prevent it

The network retry loop (lines 96-120) has an explicit try/catch that catches all errors from fetch, pipeline, and rename, swallows cleanup errors with .catch(() => {}), and loops to the next attempt. There is even a comment acknowledging the Windows AV/indexer problem: 'on Windows, AV/indexer can briefly lock the partial'. The cache-hit path above shares the same failure modes (the cached file can also be locked) but has no equivalent protection.

Impact

This is a regression introduced by this PR. Before this PR, all downloads went through the network retry path, which tolerates transient FS errors. Now, when BUN_DEPS_CACHE_DIR is set (the primary use case for this PR - it is baked into CI images), a transient FS error during the copy turns a recoverable condition into a hard build failure. On Windows, AV/indexer locking temp files is common enough that the existing code already has a comment about it. This will cause sporadic CI failures on Windows and on any platform with transient NFS or disk-space issues.

How to fix it

Wrap the copyFile+rename block in a try/catch, clean up the partial file on failure, and fall through to the network path:

if (size !== undefined && size > 0) {
  const tmpPath = dest + '.' + process.pid + '.partial';
  try {
    console.log('from BUN_DEPS_CACHE_DIR ...');
    await copyFile(cached, tmpPath);
    await rename(tmpPath, dest);
    return;
  } catch {
    await rm(tmpPath, { force: true }).catch(() => {});
    // Fall through to network path.
  }
}

Step-by-step proof

  1. CI image is built with BUN_DEPS_CACHE_DIR=/opt/bun-deps set and 50 cached archives.
  2. A build starts; downloadWithRetry is called for boringssl.tar.gz.
  3. statSync(cached) succeeds, returns size = 52428800 (> 0).
  4. copyFile(cached, tmpPath) is called. An AV scanner has a brief lock on the cached file and throws EACCES.
  5. The EACCES exception propagates up through downloadWithRetry and crashes the build with an unhandled error.
  6. The network retry loop at line 96 - which would have downloaded the file in seconds - is never reached.

🔬 also observed by coderabbitai[bot]

Comment on lines +88 to +93
webkitVersion: WEBKIT_VERSION,
nodejsVersion: NODEJS_VERSION,
zigCommit: ZIG_COMMIT,
ci: true,
cacheDir: "",
} as Config;

🔴 In urlConfig(), zigCommit is hardcoded to ZIG_COMMIT and ci is hardcoded to true, but zig.ts's defaultZigCommit(false, 'darwin') returns ZIG_COMMIT_PARALLEL for local macOS builds. Since the zig download URL embeds the commit hash and the cache key is sha256(url), the URL cached by bun run prefetch on macOS differs from the URL the actual build requests — resulting in a guaranteed cache miss for the Zig compiler on every local macOS run. Fix by setting ci = process.env.CI === "true" and using defaultZigCommit(ci, host.os) instead of the hardcoded constant.

Extended reasoning...

What the bug is and how it manifests

urlConfig() in prefetch.ts (lines 88–93) hardcodes two values: zigCommit: ZIG_COMMIT and ci: true. These are then passed to zigDownloadUrl(cfg, safe) in zig.ts, which builds the URL as https://github.com/oven-sh/zig/releases/download/autobuild-${cfg.zigCommit}/bootstrap-<arch>-<os>-<abi>.zip. The offline cache key written by prefetch is sha256(url) (via offlineCacheKey()). At build time, downloadWithRetry() recomputes sha256(url) from the URL the build system resolves, and looks it up in BUN_DEPS_CACHE_DIR. If those two URLs differ, the cache entry is never found.

The specific code path that triggers it

zig.ts exports ZIG_COMMIT = '365343af...' and ZIG_COMMIT_PARALLEL = '445fc0cb...'. The function defaultZigCommit(ci, hostOs) returns ZIG_COMMIT when ci=true (or when hostOs !== 'darwin'), and ZIG_COMMIT_PARALLEL when ci=false && hostOs === 'darwin'. When a macOS developer runs bun run prefetch locally (no CI env var), urlConfig() still forces ci: true, producing a URL with ZIG_COMMIT. But the actual build calls defaultZigCommit(false, 'darwin') (because CI is not set) and gets ZIG_COMMIT_PARALLEL, producing a URL with a different commit hash.

Why existing code doesn't prevent it

There is no guard that ensures urlConfig() uses the same commit selection logic as the build system. The hardcoding was presumably intended for CI image baking (where ci=true is always correct), but the function is also reached by bun run prefetch invoked from macOS developer machines, which the USAGE string explicitly documents as a supported workflow.

Proof via concrete example

  1. Developer on macOS (no CI env) runs bun run prefetch.
  2. urlConfig() returns zigCommit: '365343af' (ZIG_COMMIT) because ci: true is hardcoded.
  3. zigDownloadUrl builds: .../autobuild-365343af.../bootstrap-x86_64-macos-none.zip
  4. Cache key written: sha256('.../autobuild-365343af.../...') → file stored.
  5. Developer runs the build. The build calls defaultZigCommit(false, 'darwin') → returns '445fc0cb' (ZIG_COMMIT_PARALLEL).
  6. Build calls zigDownloadUrl with zigCommit: '445fc0cb' → URL is .../autobuild-445fc0cb.../bootstrap-x86_64-macos-none.zip.
  7. Cache lookup: sha256('.../autobuild-445fc0cb.../...') → no file → cache miss → Zig downloads from network.

Impact

For CI (the primary use case for image baking), there is no bug: both prefetch and the build use ci=true, so both select ZIG_COMMIT. For local macOS developers — the explicit secondary audience documented in the USAGE string — prefetch for the Zig compiler is entirely useless. The ~200MB compiler is re-downloaded on every build regardless of whether prefetch ran.

Fix

Replace the two hardcoded lines with:

const ci = process.env.CI === 'true';
// ...
zigCommit: defaultZigCommit(ci, host.os),
ci,

And update the import from zig.ts to include defaultZigCommit.

🔬 also observed by coderabbitai

Comment on lines +88 to +91
provisioner "file" {
source = var.prefetch_tarball
destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}

🔴 The Packer file provisioner with source = var.prefetch_tarball will fail immediately when Packer is invoked without -var prefetch_tarball=..., because Packer statically validates source paths and has no HCL-level conditional to skip a provisioner when a variable is empty. The empty default creates a false impression of optionality: bootstrap.ps1 guards against a missing file, but Packer fails before any provisioner runs. The same issue affects both windows-x64.pkr.hcl and windows-arm64.pkr.hcl (both line 88). In practice machine.mjs always supplies the value, but any direct packer build invocation (debugging, manual re-runs, future automation) fails immediately.

Extended reasoning...

What the bug is

Both windows-x64.pkr.hcl and windows-arm64.pkr.hcl add a provisioner "file" with source = var.prefetch_tarball, where var.prefetch_tarball has default = "". Packer validates the source path before any provisioner executes: stat("") fails, aborting the build before WinRM even connects to the VM.

The specific code path

variables.pkr.hcl defines variable "prefetch_tarball" { default = "" }. In windows-x64.pkr.hcl lines 88-91 (mirrored in windows-arm64.pkr.hcl):

provisioner "file" {
  source      = var.prefetch_tarball   // resolves to "" by default
  destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}

Packer evaluates all source paths at plan time. An empty string produces a fatal stat error and the build aborts.

Why existing code does not prevent it

bootstrap.ps1 contains if (-not (Test-Path prefetchTgz)) { return }, which gracefully handles the file being absent at runtime. However, Packer fails before any provisioner runs — the PowerShell guard is never reached. There is no HCL-level mechanism to skip a file provisioner based on an empty variable.

Addressing the refutation

The refutation correctly observes that this mirrors the pre-existing agent_script antipattern (same empty default, same file provisioner, no skip logic), and that machine.mjs always supplies the value for the normal CI path. Both points are accurate. However: (1) the presence of a prior bug does not justify a new instance; (2) the variable description says the tarball is optional and bootstrap handles its absence — that contract is honoured at the PowerShell layer but broken at the Packer layer, misleading anyone invoking Packer directly; and (3) prefetchPath is only set inside the if (bootstrap) block in machine.mjs, so a --no-bootstrap Azure invocation would pass undefined to buildWindowsImageWithPacker, causing Packer to receive the string "undefined" or an absent -var, either of which makes the problem worse.

Impact

Any packer build invocation without -var prefetch_tarball=... — direct debugging, manual re-runs, future automation not going through machine.mjs — fails immediately at source-path validation. The developer cannot reproduce or debug the image without also constructing and passing a local tarball path.

Step-by-step proof

  1. Developer runs: packer build -only azure-arm.windows-x64 scripts/packer
  2. No -var prefetch_tarball=... is supplied; var.prefetch_tarball resolves to "".
  3. Packer validates all provisioner source paths before connecting to the VM. stat("") fails.
  4. Build aborts. WinRM never connects. Bootstrap never runs.
  5. Developer cannot test the image locally without knowing they must also supply a path to a hand-crafted tarball.

Fix

Use a dynamic provisioner block conditioned on var.prefetch_tarball != "", or make the variable required by removing its default so the optionality claim is honest. The same fix is needed for the pre-existing agent_script provisioner.

Comment on lines 102 to +110
if (!res.ok || res.body === null) {
lastError = new BuildError(`HTTP ${res.status} ${res.statusText} for ${url}`);
continue;
// 4xx is permanent (wrong URL / asset not published) — retrying just
// burns the backoff budget. Only 5xx/network errors get another shot.
permanent = res.status >= 400 && res.status < 500;
} else {
// Cast: DOM ReadableStream vs node:stream/web ReadableStream — same
// shape at runtime, different TS lib declarations.
await pipeline(Readable.fromWeb(res.body as unknown as NodeWebReadable), createWriteStream(tmpPath));

🔴 The new permanent shortcut at line 106 marks any 4xx response as non-retryable (res.status >= 400 && res.status < 500), which includes HTTP 429 Too Many Requests — a transient rate-limiting signal that should be retried after backoff. A single 429 from GitHub CDN now immediately breaks the retry loop and fails the build, where before this PR it would have consumed all five retry attempts as intended. Fix: permanent = res.status >= 400 && res.status < 500 && res.status !== 429.

Extended reasoning...

What the bug is and how it manifests

downloadWithRetry() now sets permanent = res.status >= 400 && res.status < 500 (lines 106–107) immediately after receiving any 4xx response, then breaks out of the retry loop at line 121. HTTP 429 Too Many Requests sits squarely in that range. A 429 response from GitHub CDN — which occurs during traffic bursts when downloading large artifacts like the ~200 MB WebKit prebuilt or Zig compiler — now terminates the download immediately with zero retries, producing a hard build failure.

The specific code path that triggers it

  1. downloadWithRetry(url, dest, logPrefix) enters the for (let attempt = 1; attempt <= 5; attempt++) loop.
  2. fetch() returns a Response with status === 429 and ok === false.
  3. The if (!res.ok) branch fires: lastError is set and permanent is assigned res.status >= 400 && res.status < 500, which evaluates to true.
  4. The catch block is skipped (no exception was thrown — fetch itself succeeded).
  5. Immediately after the try/catch, if (permanent) break; fires — the loop exits after the very first attempt.
  6. throw new BuildError("Failed to download after 5 attempts…") is reached after only one attempt.

Why existing code does not prevent it

Before this PR, the non-OK branch used continue unconditionally — always retrying. The PR intent was correct: genuine 4xx errors like 404 (asset not published) or 410 (gone) are permanent and burning four extra backoff slots is wasteful. The code comment even says "wrong URL / asset not published" — that characterization does not apply to 429, which is entirely about server-side rate limiting and is explicitly designed to be retried.

What the impact would be

This PR downloads large blobs (WebKit ~200 MiB, Zig ~80 MiB) concurrently across multiple CI workers all pointed at the same GitHub release endpoint. GitHub CDN returns 429 during traffic bursts. With this regression, any single 429 on any file aborts that dep download permanently in that build run, turning a previously-recoverable transient condition into a mandatory build failure. CI has no manual retry for individual dep fetches; the whole job must be re-queued.

How to fix it

Exclude 429 from the permanent range:

permanent = res.status >= 400 && res.status < 500 && res.status !== 429;

Optionally, also honour the Retry-After response header for the backoff delay, but the minimal fix is just the exclusion.

Step-by-step proof

  1. downloadWithRetry("https://github.com/oven-sh/WebKit/releases/download/.../bun-webkit-linux-amd64-lto.tar.gz", dest, "WebKit") is called.
  2. On attempt 1, GitHub CDN responds HTTP 429 Too Many Requests (ok === false).
  3. lastError = new BuildError("HTTP 429 Too Many Requests for …").
  4. permanent = 429 >= 400 && 429 < 500 evaluates to true.
  5. if (permanent) break; — loop exits immediately. Attempts 2–5 never run.
  6. Function throws BuildError: Failed to download after 5 attempts after a single network call and zero backoff delay.
  7. Build fails. Before this PR, step 4 would have left permanent = false (or the unconditional continue path ran), attempt 2 would fire after 2 s backoff, and the transient rate limit would have cleared.

Comment on lines +23 to +25
const url = "http://192.0.2.1/dep-v1.tar.gz";
writeFileSync(join(cache, sha256(url)), "PAYLOAD");
writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "");

🟡 The 'downloadWithRetry copies from cache on hit' test writes an empty cache entry for sha256("http://192.0.2.1/empty"), which is a completely different URL than the one actually downloaded (url = "http://192.0.2.1/dep-v1.tar.gz"), so the zero-byte-skip behavior is never exercised. The comment 'The 0-byte sibling proves we ignore empty cache entries' is incorrect; removing the size > 0 check from the production code would not cause this test to fail.

Extended reasoning...

What the bug is and how it manifests

In the test 'downloadWithRetry copies from cache on hit' (lines 21–44 of prefetch.test.ts), line 24 writes "PAYLOAD" for sha256(url) where url = "http://192.0.2.1/dep-v1.tar.gz", and line 25 writes an empty file for sha256("http://192.0.2.1/empty") — a completely different URL. The test then only calls downloadWithRetry(url, ...), which always finds the non-empty PAYLOAD entry. The zero-byte entry for the other URL is never looked up.

The specific code path that triggers it

downloadWithRetry computes cached = join(cacheDir, offlineCacheKey(url)) and calls statSync(cached).size. Since url maps to the PAYLOAD entry (size > 0), the cache-hit path succeeds immediately. The zero-byte entry for "http://192.0.2.1/empty" is never consulted during this test. The comment on line 25 — "The 0-byte sibling proves we ignore empty cache entries" — is factually wrong.

Why existing code doesn't prevent it

The test was presumably written with the intent of having a zero-byte entry for url itself, then relying on the size > 0 guard in download.ts (lines 79–85) to fall through to the network. But the two writeFileSync calls are swapped: the non-empty PAYLOAD is written for sha256(url) and the empty one is written for a different URL. Nothing in the test framework detects that the zero-byte fallthrough path is never reached.

What the impact would be

The production size > 0 check (which correctly refuses to feed a 0-byte file to tar) is not verified by this test suite. If someone removed or regressed that guard, all four test-suite passes (three verifiers + synthesis) would still succeed because the URL being tested always resolves to the non-empty PAYLOAD entry.

How to fix it

To actually test the stated behavior, swap the two entries: write the empty file for sha256(url) (so downloadWithRetry sees size === 0 and falls through to the network), and write the PAYLOAD for sha256("http://192.0.2.1/empty") or simply remove it. With size === 0 for url, the function should attempt a network fetch to the unroutable 192.0.2.1 address and fail — proving the fallthrough occurred. Alternatively, use a locally-controlled test server so the network fallthrough produces a verifiable response rather than a timeout.

Step-by-step proof

  1. url = "http://192.0.2.1/dep-v1.tar.gz"
  2. Line 24: writeFileSync(join(cache, sha256(url)), "PAYLOAD") → cache entry for url has content.
  3. Line 25: writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "") → cache entry for a URL that is never requested has zero bytes.
  4. downloadWithRetry(url, ...) is called.
  5. cached = join(cacheDir, sha256(url)) → resolves to the PAYLOAD file.
  6. statSync(cached).size === 7 ("PAYLOAD" is 7 bytes) → size > 0 is true → file is copied, function returns.
  7. The zero-byte entry is never touched. Removing the size > 0 check does not change the outcome for this test.
