[build images] prefetch build deps into CI images #29209
Jarred-Sumner wants to merge 4 commits into main from
Conversation
Add scripts/build/prefetch.ts, which downloads every archive the build fetches (dep tarballs, prebuilt WebKit, Zig, node-headers) into a content-addressed dir keyed by sha256(url). downloadWithRetry() now checks $BUN_DEPS_CACHE_DIR before the network and falls through on a miss, so a stale image after a dep bump still builds — that one dep just downloads normally.

Wired into all three image-build paths:
- machine.mjs ships scripts/build/ as a tarball alongside bootstrap
- bootstrap.sh / bootstrap.ps1 run prefetch into /opt/bun-deps (or C:\bun-deps) and export BUN_DEPS_CACHE_DIR via the buildkite environment hook (linux/mac) / Machine env (windows)
- .buildkite/Dockerfile gets an equivalent layer

Bootstrap versions bumped (sh 29 -> 30, ps1 16 -> 17).
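The lookup scheme described above can be sketched as follows. This is a minimal illustration, not the actual helpers in scripts/build/download.ts; `cacheKey` and `lookupCache` are hypothetical names.

```typescript
// Hypothetical sketch of a content-addressed cache keyed by sha256(url).
import { createHash } from "node:crypto";
import { statSync } from "node:fs";
import { join } from "node:path";

// Entries are keyed by the sha256 hex digest of the full download URL.
export function cacheKey(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

// Returns the cached file path on a hit, or undefined on a miss so the
// caller falls through to a normal network download.
export function lookupCache(cacheDir: string | undefined, url: string): string | undefined {
  if (!cacheDir) return undefined; // cache env var unset: always use network
  const path = join(cacheDir, cacheKey(url));
  try {
    return statSync(path).size > 0 ? path : undefined; // skip empty entries
  } catch {
    return undefined; // miss, e.g. a stale image after a dep bump
  }
}
```

On a miss the caller simply downloads as before, which is what makes a stale image still usable after a single dependency bump.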
Updated 12:23 AM PT - Apr 12th, 2026
❌ @Jarred-Sumner, your commit ce53dd9 has 1 failure in
🧪 To try this PR locally: run bunx bun-pr 29209. That installs a local version of the PR as bun-29209 --bun
- Packer (Windows Azure): upload bun-prefetch.tgz via a file provisioner before bootstrap so Prefetch-BuildDeps actually runs.
- Dockerfile-bootstrap.sh: extract the tarball into the fakebun build context so the Dockerfile COPY scripts/build layer has something to copy (was a hard failure).
🤖 Generated with Claude Code
Not every -asan/-baseline tarball is published for every target (Windows has no -asan). Prefetch now treats download failures as warnings instead of aborting bootstrap, and downloadWithRetry no longer retries 4xx responses.
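That retry policy can be sketched as a small pure function. This is illustrative only (the real logic lives in downloadWithRetry); it also folds in the refinement from a later review pass that 408 and 429, while 4xx, are still worth retrying.

```typescript
// Illustrative classification of HTTP failures for a download retry loop.
// Assumption: 408 (Request Timeout) and 429 (Too Many Requests) are
// transient despite being 4xx; other 4xx (403, 404) are permanent.
function shouldRetry(status: number): boolean {
  if (status === 408 || status === 429) return true; // transient 4xx
  if (status >= 400 && status < 500) return false;   // permanent client error
  return status >= 500;                              // retry server errors
}
```

With this shape, a missing -asan tarball (a 404) fails fast with a warning instead of burning retries.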
Walkthrough
Adds a build-dependency prefetch system: a new prefetch script and offline cache, changes to download logic to serve cached artifacts, bootstrap and CI hooks to install/upload a

Changes
🚥 Pre-merge checks
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/build/download.ts`:
- Around line 79-85: The cache-copy block that calls copyFile and rename (using
tmpPath and dest) must not abort the whole download on transient FS errors; wrap
the copyFile(tmpPath) and rename(tmpPath, dest) calls in a try/catch inside the
same size !== undefined branch, and on error log a warning including the error
and the paths (tmpPath, cached, dest), attempt to clean up the tmpPath if
present, then continue execution (do not return) so the existing network
fetch/retry logic runs; keep the original early-return only on successful
copy/rename.
In `@scripts/build/prefetch.ts`:
- Around line 160-163: The Linux target ABI can end up undefined on non-Linux
hosts; update the target construction (the object assigned to args.target /
target) so that when target.os === "linux" you use detectLinuxAbi() but fall
back to the concrete string "gnu" if detectLinuxAbi() returns undefined (e.g.,
abi: host.os === "linux" ? detectLinuxAbi() ?? "gnu" : undefined); make the same
change in the other target creation site as well (the other block around lines
178-182) so cfg.abi is never left ambiguous.
- Around line 88-93: The config hard-codes zigCommit: ZIG_COMMIT which causes
macOS (darwin) non-CI hosts to cache the wrong Zig; change the assignment to
pick the host-default Zig commit (use ZIG_COMMIT_PARALLEL on non-CI darwin).
Update the object construction to set zigCommit = (process.platform === "darwin"
&& !ci) ? ZIG_COMMIT_PARALLEL : ZIG_COMMIT (or call the existing helper that
returns the default Zig commit), referencing the ZIG_COMMIT and
ZIG_COMMIT_PARALLEL symbols so local bun run prefetch will cache the correct
compiler.
In `@test/integration/build-prefetch/prefetch.test.ts`:
- Line 42: The strict assertion expect(stderr).toBe("") is brittle for spawned
Bun subprocesses; update the three occurrences that assert stderr (the
expect(stderr).toBe("") calls in prefetch.test.ts) to allow only whitespace or
empty output instead of an exact empty string — e.g., replace with a
pattern/assertion like expect(stderr).toMatch(/^\s*$/) (or an equivalent "no
non-whitespace output" check) for the stderr variable so ASAN sanitizer warnings
(whitespace/newlines) won't make the tests flaky.
- Around line 21-25: The test currently writes a zero-byte cache entry for
"http://192.0.2.1/empty" instead of the URL under test, so the "ignore empty
cache entries" behavior is never exercised; update the calls to writeFileSync so
the zero-byte file is created for the same URL variable (url) used by the test
(use writeFileSync(join(cache, sha256(url)), "") ) and move the non-empty
payload to the sibling key (e.g., writeFileSync(join(cache,
sha256("http://192.0.2.1/empty")), "PAYLOAD")) so sha256, cache, url and
writeFileSync are the correct references to locate the change.
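The whitespace-tolerant stderr check suggested above behaves like this (a small self-contained illustration of the `/^\s*$/` pattern; `isEffectivelyEmpty` is a name invented for the example):

```typescript
// /^\s*$/ matches the empty string and whitespace-only output, but fails
// on any real diagnostic text — the behavior the review suggests for stderr.
const isEffectivelyEmpty = (s: string): boolean => /^\s*$/.test(s);
```

So stray newlines from a sanitizer pass, while actual error text still fails the assertion.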
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 22c004d4-5c41-495b-a9a0-23c1003caedc
📒 Files selected for processing (10)
- .buildkite/Dockerfile
- package.json
- scripts/bootstrap.ps1
- scripts/bootstrap.sh
- scripts/build/deps/webkit.ts
- scripts/build/download.ts
- scripts/build/prefetch.ts
- scripts/build/zig.ts
- scripts/machine.mjs
- test/integration/build-prefetch/prefetch.test.ts
    if (size !== undefined && size > 0) {
      console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
      const tmpPath = `${dest}.${process.pid}.partial`;
      await copyFile(cached, tmpPath);
      await rename(tmpPath, dest);
      return;
    }
Handle cache-copy failures by falling back to network fetch.
If statSync() succeeds but copyFile()/rename() fails (e.g., transient FS lock/permission race), the function throws immediately instead of using the existing network retry path. This makes cache corruption/lock issues fatal when they can be recoverable.
Suggested fix
- if (size !== undefined && size > 0) {
- console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
- const tmpPath = `${dest}.${process.pid}.partial`;
- await copyFile(cached, tmpPath);
- await rename(tmpPath, dest);
- return;
- }
+ if (size !== undefined && size > 0) {
+ const tmpPath = `${dest}.${process.pid}.partial`;
+ try {
+ console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
+ await copyFile(cached, tmpPath);
+ await rename(tmpPath, dest);
+ return;
+ } catch {
+ await rm(tmpPath, { force: true }).catch(() => {});
+ // Fall through to network path.
+ }
+  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/build/download.ts` around lines 79 - 85, The cache-copy block that
calls copyFile and rename (using tmpPath and dest) must not abort the whole
download on transient FS errors; wrap the copyFile(tmpPath) and rename(tmpPath,
dest) calls in a try/catch inside the same size !== undefined branch, and on
error log a warning including the error and the paths (tmpPath, cached, dest),
attempt to clean up the tmpPath if present, then continue execution (do not
return) so the existing network fetch/retry logic runs; keep the original
early-return only on successful copy/rename.
    webkitVersion: WEBKIT_VERSION,
    nodejsVersion: NODEJS_VERSION,
    zigCommit: ZIG_COMMIT,
    ci: true,
    cacheDir: "",
  } as Config;
Use the host’s actual default Zig commit here.
zig.ts switches to ZIG_COMMIT_PARALLEL on non-CI darwin hosts. Hard-coding ZIG_COMMIT means bun run prefetch on macOS caches the wrong compiler, so the next build still downloads Zig.
Suggested fix
-import { ZIG_COMMIT, zigDownloadUrl } from "./zig.ts";
+import { defaultZigCommit, zigDownloadUrl } from "./zig.ts";

 function urlConfig(t: PrefetchTarget, v: { debug: boolean; lto: boolean; asan: boolean; baseline: boolean }): Config {
const host = detectHost();
+ const ci = process.env.CI === "true";
const windows = t.os === "windows";
return {
os: t.os,
arch: t.arch,
abi: t.abi,
@@
webkit: "prebuilt",
webkitVersion: WEBKIT_VERSION,
nodejsVersion: NODEJS_VERSION,
- zigCommit: ZIG_COMMIT,
- ci: true,
+ zigCommit: defaultZigCommit(ci, host.os),
+ ci,
cacheDir: "",
} as Config;
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/build/prefetch.ts` around lines 88 - 93, The config hard-codes
zigCommit: ZIG_COMMIT which causes macOS (darwin) non-CI hosts to cache the
wrong Zig; change the assignment to pick the host-default Zig commit (use
ZIG_COMMIT_PARALLEL on non-CI darwin). Update the object construction to set
zigCommit = (process.platform === "darwin" && !ci) ? ZIG_COMMIT_PARALLEL :
ZIG_COMMIT (or call the existing helper that returns the default Zig commit),
referencing the ZIG_COMMIT and ZIG_COMMIT_PARALLEL symbols so local bun run
prefetch will cache the correct compiler.
    out: process.env[DEPS_CACHE_ENV] ?? join(process.cwd(), ".cache", "bun-deps"),
    print: false,
    target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() : undefined },
    webkitVariants: ["lto", "none"],
Default --os=linux to a concrete ABI.
From a macOS/Windows host, --os=linux leaves args.target.abi as undefined even though the CLI says the default is gnu. That makes the Linux target ambiguous for any URL builder that keys off cfg.abi.
Suggested fix
case "--os":
if (val !== "linux" && val !== "darwin" && val !== "windows") throw new BuildError(`--os: ${val}`);
args.target.os = val;
- if (val !== "linux") args.target.abi = undefined;
+ if (val === "linux") {
+ args.target.abi ??= "gnu";
+ } else {
+ args.target.abi = undefined;
+ }
    break;

Also applies to: 178-182
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/build/prefetch.ts` around lines 160 - 163, The Linux target ABI can
end up undefined on non-Linux hosts; update the target construction (the object
assigned to args.target / target) so that when target.os === "linux" you use
detectLinuxAbi() but fall back to the concrete string "gnu" if detectLinuxAbi()
returns undefined (e.g., abi: host.os === "linux" ? detectLinuxAbi() ?? "gnu" :
undefined); make the same change in the other target creation site as well (the
other block around lines 178-182) so cfg.abi is never left ambiguous.
    // Unroutable URL — if the cache lookup is skipped, fetch hangs/errors and
    // the test fails. The 0-byte sibling proves we ignore empty cache entries.
    const url = "http://192.0.2.1/dep-v1.tar.gz";
    writeFileSync(join(cache, sha256(url)), "PAYLOAD");
    writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "");
This test does not actually verify zero-byte cache-entry fallback.
Line 25 writes a zero-byte entry for a different URL than the one downloaded on Line 23, so the “ignore empty cache entries” claim is not exercised here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integration/build-prefetch/prefetch.test.ts` around lines 21 - 25, The
test currently writes a zero-byte cache entry for "http://192.0.2.1/empty"
instead of the URL under test, so the "ignore empty cache entries" behavior is
never exercised; update the calls to writeFileSync so the zero-byte file is
created for the same URL variable (url) used by the test (use
writeFileSync(join(cache, sha256(url)), "") ) and move the non-empty payload to
the sibling key (e.g., writeFileSync(join(cache,
sha256("http://192.0.2.1/empty")), "PAYLOAD")) so sha256, cache, url and
writeFileSync are the correct references to locate the change.
    });
    const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);

    expect(stderr).toBe("");
Avoid strict empty-stderr assertions for spawned Bun subprocesses.
expect(stderr).toBe("") is brittle on ASAN targets where Bun subprocesses can emit sanitizer warnings despite successful behavior.
Suggested fix
-  expect(stderr).toBe("");
   expect(stdout).toContain("ok");
   expect(readFileSync(join(out, "got"), "utf8")).toBe("PAYLOAD");
   expect(exitCode).toBe(0);

-  expect(stderr).toBe("");
   expect(readFileSync(dest, "utf8")).toBe("FROM-NETWORK");
   expect(exitCode).toBe(0);

-  expect(stderr).toBe("");
   const lines = stdout.trim().split("\n");

Also applies to: 74-74, 97-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integration/build-prefetch/prefetch.test.ts` at line 42, The strict
assertion expect(stderr).toBe("") is brittle for spawned Bun subprocesses;
update the three occurrences that assert stderr (the expect(stderr).toBe("")
calls in prefetch.test.ts) to allow only whitespace or empty output instead of
an exact empty string — e.g., replace with a pattern/assertion like
expect(stderr).toMatch(/^\s*$/) (or an equivalent "no non-whitespace output"
check) for the stderr variable so ASAN sanitizer warnings (whitespace/newlines)
won't make the tests flaky.
Additional findings (outside current diff — PR may have been updated during review):
-
🟡
scripts/machine.mjs:1415-1429 — The temp directory created by mkdtempSync for bun-prefetch.tgz (line ~1423 of scripts/machine.mjs) is never deleted after use — neither the prefetchPath file nor its parent tmpDir are removed after upload, so they accumulate in the OS temp dir across repeated invocations. Additionally, in the Docker code path (features?.includes('docker')), the tarball is created unconditionally but never uploaded to the machine, making that temp dir and tar czf entirely wasted.
Extended reasoning...
Bug summary: When bootstrap is true in main(), a temp directory is created via mkdtempSync(join(tmpdir(), "prefetch-")), and a tarball bun-prefetch.tgz is written into it. The variable tmpDir is scoped to a one-off block {}, so after the block exits, only prefetchPath (pointing into tmpDir) remains accessible. After the upload, neither prefetchPath nor its parent directory are ever removed. The finally block at the end of main() only calls machine.close(), with no local temp file cleanup. Two concrete problems introduced by this PR:
- Non-Docker paths (Windows + Linux non-Docker): prefetchPath is uploaded via machine.upload() and then abandoned. The local tgz (typically a few MB of scripts/build/ TypeScript) and its parent tmp/prefetch-XXXXXX/ directory remain on disk indefinitely.
- Docker path (features?.includes("docker")): The tgz is created unconditionally in the block before the if (ci) / Docker branching, but in the Docker else if (dockerfilePath) branch (lines ~1553-1566), only bootstrapPath, dockerfilePath, and agentPath are uploaded — prefetchPath is never uploaded. The tar czf I/O and the resulting file are entirely wasted.
Step-by-step proof:
- User runs machine.mjs ssh --feature=docker --cloud=aws ...
- bootstrap = true, so the block executes: tmpDir = mkdtempSync(".../prefetch-abc123"), prefetchPath = ".../prefetch-abc123/bun-prefetch.tgz", tarball written.
- features.includes("docker") is true, so dockerfilePath is set.
- Execution enters the Docker else if (dockerfilePath) branch, which uploads bootstrapPath, Dockerfile, and agentPath — no prefetchPath upload.
- main() exits (or errors); the finally block calls machine.close().
- tmpDir and prefetchPath remain in /tmp (or %TEMP%).
Why existing code doesn't prevent it: The pre-existing agentPath temp dir (created with mkdtempSync(join(tmpdir(), "agent-"))) has the same leak pattern — this PR follows that convention. However, the Docker path waste is new: prefetchPath is unconditionally created before any feature-checking, so the Docker path silently does wasted work.
Fix: wrap the tmpDir lifecycle in a try/finally (or use an AsyncDisposable pattern) that calls fs.rmSync(tmpDir, { recursive: true, force: true }) after the upload succeeds, and also guard the tarball creation so it is skipped entirely when the Docker path is active (since Docker handles prefetch via the COPY scripts/build layer in the Dockerfile itself).
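The lifecycle fix above can be sketched as a scoped helper. This is a sketch under the assumption that the tarball creation and upload happen inside the callback; `withPrefetchTarball` is a hypothetical name, and a placeholder file stands in for the real `tar czf`.

```typescript
// Scope the temp dir with try/finally so it is removed whether the upload
// succeeds or throws. In machine.mjs the callback body would run `tar czf`
// and machine.upload(); here we only simulate that work.
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function withPrefetchTarball<T>(use: (tarballPath: string) => T): T {
  const tmpDir = mkdtempSync(join(tmpdir(), "prefetch-"));
  try {
    const prefetchPath = join(tmpDir, "bun-prefetch.tgz");
    writeFileSync(prefetchPath, "placeholder"); // stand-in for `tar czf`
    return use(prefetchPath);
  } finally {
    // Runs on success and on error: no leak in /tmp across invocations.
    rmSync(tmpDir, { recursive: true, force: true });
  }
}
```

Guarding the call site so the helper is only invoked on non-Docker paths would also address the wasted work in the Docker branch.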
    }
  }

  function Prefetch-BuildDeps {
    # machine.mjs ships scripts/build/ here. Absent → no-op (e.g. running
    # bootstrap.ps1 standalone). Builds fall through to the network on cache
    # miss, so a stale image after a dep bump still works.
    $prefetchTgz = "C:\Windows\Temp\bun-prefetch.tgz"
    if (-not (Test-Path $prefetchTgz)) {
      return
    }

    $depsCache = "C:\bun-deps"
    $prefetchDir = "C:\Windows\Temp\bun-prefetch"
    New-Item -Path $prefetchDir -ItemType Directory -Force | Out-Null
    & tar xzf $prefetchTgz -C $prefetchDir

    $bun = Which bun -Required
    & $bun "$prefetchDir\build\prefetch.ts" --out=$depsCache --webkit=lto,none,asan,baseline
    if ($LASTEXITCODE -ne 0) { throw "prefetch.ts failed" }
    Get-Content "$depsCache\manifest.json"

    Set-Env "BUN_DEPS_CACHE_DIR" $depsCache
  }

  function Install-Rust {
    if (Which rustc) {
🟡 In Prefetch-BuildDeps, neither $prefetchTgz (C:\Windows\Temp\bun-prefetch.tgz) nor $prefetchDir (C:\Windows\Temp\bun-prefetch) is removed after prefetch.ts finishes, so both get baked into any Windows CI snapshot that skips Sysprep (e.g. AWS stop+createImage). Add Remove-Item -Recurse -Force $prefetchDir, $prefetchTgz at the end of the function to match how the Dockerfile and bootstrap.sh handle the same cleanup.
Extended reasoning...
What the bug is and how it manifests
Prefetch-BuildDeps in scripts/bootstrap.ps1 (lines 406-432) creates two temporary paths — $prefetchTgz = C:\Windows\Temp\bun-prefetch.tgz and $prefetchDir = C:\Windows\Temp\bun-prefetch — then never removes them. After prefetch.ts completes and the dep tarballs are written to C:\bun-deps, the extracted TypeScript build scripts (scripts/build/) and the original archive remain on disk permanently.
The specific code path that triggers it
machine.mjs uploads bun-prefetch.tgz to C:\Windows\Temp\bun-prefetch.tgz before invoking bootstrap.ps1. Inside the function, tar xzf extracts the archive to $prefetchDir, then bun prefetch.ts runs and writes dep archives to C:\bun-deps. The function ends with Set-Env "BUN_DEPS_CACHE_DIR" $depsCache and returns — no cleanup.
Why existing code does not prevent it
The Dockerfile handles this correctly with rm -rf /tmp/prefetch in the same RUN layer (so it is never committed to the image layer). bootstrap.sh uses create_tmp_directory (mktemp) for the prefetch dir, and then clean_system() at the end of main() runs rm -rf /tmp/* /var/tmp/*, catching it. bootstrap.ps1 has no equivalent global temp-cleanup step, and the prefetch dir uses a fixed well-known path rather than a temp path — so nothing else removes it.
Impact
For Azure/Packer image builds, Sysprep cleans C:\Windows\Temp, so the leftover files do not survive into the published gallery image. However, AWS snapshots work by stopping the instance and calling createImage with no Sysprep, meaning the files are preserved in the snapshot. The files themselves are small (TypeScript source from scripts/build/), not the 250 MB dep tarballs, so size inflation is modest — but it is a genuine, avoidable inconsistency with all other bootstrap paths.
How to fix it
Add a single line at the end of Prefetch-BuildDeps, after Set-Env "BUN_DEPS_CACHE_DIR" $depsCache:
Remove-Item -Recurse -Force $prefetchDir, $prefetchTgz
Step-by-step proof
- machine.mjs calls machine.upload(prefetchPath, "C:\Windows\Temp\bun-prefetch.tgz") — the archive lands at C:\Windows\Temp\bun-prefetch.tgz.
- Prefetch-BuildDeps runs tar xzf C:\Windows\Temp\bun-prefetch.tgz -C C:\Windows\Temp\bun-prefetch — C:\Windows\Temp\bun-prefetch\build*.ts now exists.
- bun prefetch.ts runs successfully; C:\bun-deps is populated.
- Set-Env "BUN_DEPS_CACHE_DIR" C:\bun-deps is called; function returns.
- No code removes C:\Windows\Temp\bun-prefetch or C:\Windows\Temp\bun-prefetch.tgz.
- machine.snapshot(label) calls aws.stopInstances + aws.createImage — the image is captured with both paths still on disk.
Actionable comments posted: 4
♻️ Duplicate comments (5)
test/integration/build-prefetch/prefetch.test.ts (2)
21-25: ⚠️ Potential issue | 🟡 Minor
This fixture doesn't cover empty-entry fallback.
The zero-byte file is created for a different URL, so this test only proves the happy-path cache hit. If you want to cover "ignore empty cache entries," seed the empty file under sha256(url) and fetch that URL from a local server, or drop the claim from the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integration/build-prefetch/prefetch.test.ts` around lines 21 - 25, The comment claims we test "ignore empty cache entries" but the zero-byte file is written for a different key; update the fixture so the empty sibling is written at the same cache key as the tested URL or remove the misleading claim: either replace writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "") with writeFileSync(join(cache, sha256(url)), "") to seed an empty-entry fallback for the same URL (and ensure the test fetches that URL from a local server), or keep the existing PAYLOAD seed and change the comment to only claim a happy-path cache hit; check usages of url, sha256, writeFileSync and cache when applying the change.
42-42: ⚠️ Potential issue | 🟠 Major
Drop the exact empty-stderr assertions here.
Spawned Bun subprocesses can emit ASAN warnings on stderr even when the behavior is correct, so these checks make the suite flaky. The exit code already gives you the failure signal.
Suggested fix
-  expect(stderr).toBe("");
   expect(stdout).toContain("ok");
   expect(readFileSync(join(out, "got"), "utf8")).toBe("PAYLOAD");
   expect(exitCode).toBe(0);
@@
-  expect(stderr).toBe("");
   expect(readFileSync(dest, "utf8")).toBe("FROM-NETWORK");
   expect(exitCode).toBe(0);
@@
-  expect(stderr).toBe("");
   const lines = stdout.trim().split("\n");

Based on learnings: ASAN CI can emit warning text on stderr for spawned Bun subprocesses, so empty-stderr assertions are flaky.
Also applies to: 74-74, 97-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integration/build-prefetch/prefetch.test.ts` at line 42, The test contains brittle exact-empty stderr assertions (expect(stderr).toBe("")); remove those assertions (the exact-empty checks at the occurrences of expect(stderr).toBe("");) so the tests no longer fail on ASAN warnings from spawned Bun subprocesses; rely on the subprocess exit code/assertions already present instead of asserting stderr is exactly empty.

scripts/build/download.ts (1)
79-84: ⚠️ Potential issue | 🟠 Major
Don't let cache-copy failures bypass the network fallback.
If statSync() finds a populated cache entry but copyFile() or rename() fails, the whole download aborts instead of falling through to the existing retry/network path. That turns transient AV/lock/permission races into hard build failures.
Suggested fix
 if (size !== undefined && size > 0) {
-  console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
   const tmpPath = `${dest}.${process.pid}.partial`;
-  await copyFile(cached, tmpPath);
-  await rename(tmpPath, dest);
-  return;
+  try {
+    console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
+    await copyFile(cached, tmpPath);
+    await rename(tmpPath, dest);
+    return;
+  } catch (err) {
+    await rm(tmpPath, { force: true }).catch(() => {});
+    console.warn(`warn: ignoring broken cache copy ${cached} -> ${dest}: ${err}`);
+  }
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/build/download.ts` around lines 79 - 84, The current cache-copy block in scripts/build/download.ts returns immediately after copyFile/rename, which lets copy/rename failures abort the whole download; wrap the copyFile(tmpPath) and rename(tmpPath, dest) calls in a try/catch inside the if (size !== undefined && size > 0) branch (referencing tmpPath, cached, copyFile and rename) so that on error you log a warning (including the error), attempt to remove any leftover tmpPath, and then continue (do not return) so the existing network/retry path runs; only return after a successful rename.

scripts/build/prefetch.ts (2)
65-93: ⚠️ Potential issue | 🟠 Major
Use the host's actual default Zig commit here.
urlConfig() always bakes zigCommit: ZIG_COMMIT with ci: true, so bun scripts/build/prefetch.ts on a non-CI darwin host prefetches the wrong compiler archive. The next build still downloads Zig because zig.ts chooses a different default there.
Suggested fix
-import { ZIG_COMMIT, zigDownloadUrl } from "./zig.ts";
+import { defaultZigCommit, zigDownloadUrl } from "./zig.ts";

 function urlConfig(t: PrefetchTarget, v: { debug: boolean; lto: boolean; asan: boolean; baseline: boolean }): Config {
   const host = detectHost();
+  const ci = process.env.CI === "true";
   const windows = t.os === "windows";
   return {
@@
-    zigCommit: ZIG_COMMIT,
-    ci: true,
+    zigCommit: defaultZigCommit(ci, host.os),
+    ci,
     cacheDir: "",
   } as Config;
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/build/prefetch.ts` around lines 65 - 93, urlConfig currently hardcodes ci: true and zigCommit: ZIG_COMMIT which causes non-CI hosts (e.g., local darwin) to prefetch the wrong Zig archive; change urlConfig to use the detected host's CI flag and default Zig commit instead of always assuming CI: set ci to host.ci and set zigCommit to host.ci ? ZIG_COMMIT : host.zigCommit (or host.defaultZigCommit if that property exists on the object returned by detectHost()), so local runs use the host-provided zig commit while CI continues to use ZIG_COMMIT; update the urlConfig implementation accordingly.
160-163: ⚠️ Potential issue | 🟡 Minor
Default Linux targets to a concrete ABI.
From darwin/windows, --os=linux leaves args.target.abi undefined even though the help text says the default is gnu. Any URL builder that keys off cfg.abi now gets an ambiguous target.
Suggested fix
 const args: Args = {
   out: process.env[DEPS_CACHE_ENV] ?? join(process.cwd(), ".cache", "bun-deps"),
   print: false,
-  target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() : undefined },
+  target: { os: host.os, arch: host.arch, abi: host.os === "linux" ? detectLinuxAbi() ?? "gnu" : undefined },
@@
   case "--os":
     if (val !== "linux" && val !== "darwin" && val !== "windows") throw new BuildError(`--os: ${val}`);
     args.target.os = val;
-    if (val !== "linux") args.target.abi = undefined;
+    if (val === "linux") {
+      args.target.abi ??= "gnu";
+    } else {
+      args.target.abi = undefined;
+    }
     break;

Also applies to: 178-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/build/prefetch.ts` around lines 160 - 163, The target ABI is computed from host.os so when callers set target.os to "linux" from a different host the abi remains undefined; update the target object creation (the "target: { os: host.os, arch: host.arch, abi: host.os === \"linux\" ? detectLinuxAbi() : undefined }" expression) to compute abi from the target's os value instead of host.os — e.g., determine the assigned targetOs (the value you place in target.os) and set abi to detectLinuxAbi() when that targetOs === "linux" (falling back to the concrete default like "gnu" if detectLinuxAbi() cannot determine one); apply the same change to the other occurrence around lines 178-181.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/build/download.ts`:
- Around line 104-106: The current logic marks all 4xx responses as permanent by
setting "permanent = res.status >= 400 && res.status < 500", which incorrectly
treats transient 429 and 408 as non-retriable; update the condition in the
download retry logic (the variable named "permanent" in
scripts/build/download.ts that checks res.status) so that 4xx responses are
considered permanent except for 408 and 429 (i.e., treat 408 and 429 as
transient/retriable), keeping the rest of the 5xx/network retry behavior
unchanged.
In `@scripts/machine.mjs`:
- Around line 1239-1240: The packer var "prefetch_tarball" is being added
unconditionally even when prefetchPath is undefined; update the code that builds
the args array (used by buildWindowsImageWithPacker()) to only include the
"-var", `prefetch_tarball=${prefetchPath}` pair when prefetchPath is truthy
(i.e., set inside the bootstrap block). Locate usages of prefetchPath and the
args array construction and guard the push/concatenation so Packer isn't passed
prefetch_tarball=undefined.
In `@scripts/packer/windows-arm64.pkr.hcl`:
- Around line 86-90: The file provisioner unconditionally uploads
var.prefetch_tarball which defaults to "" and causes failures when unset; modify
the provisioner block that sets source = var.prefetch_tarball (the provisioner
"file" that writes to "C:\\Windows\\Temp\\bun-prefetch.tgz") so it only runs
when var.prefetch_tarball is non-empty (e.g. guard the block with a conditional
or use a count/conditional expression to include the provisioner only if
var.prefetch_tarball != ""); keep scripts/bootstrap.ps1 behavior unchanged so
bootstrap can handle the missing tarball.
In `@scripts/packer/windows-x64.pkr.hcl`:
- Around line 87-91: The file provisioner currently always uploads
var.prefetch_tarball causing failures when it's empty; make the upload
conditional by adding a count meta-argument to the provisioner so it only runs
when var.prefetch_tarball is non-empty (e.g. add count = var.prefetch_tarball !=
"" ? 1 : 0 inside the provisioner "file" block that references
var.prefetch_tarball and destination "C:\\Windows\\Temp\\bun-prefetch.tgz"),
leaving scripts/bootstrap.ps1's existing absent-file handling unchanged.
---
Duplicate comments:
In `@scripts/build/download.ts`:
- Around line 79-84: The current cache-copy block in scripts/build/download.ts
returns immediately after copyFile/rename, which lets copy/rename failures abort
the whole download; wrap the copyFile(tmpPath) and rename(tmpPath, dest) calls
in a try/catch inside the if (size !== undefined && size > 0) branch
(referencing tmpPath, cached, copyFile and rename) so that on error you log a
warning (including the error), attempt to remove any leftover tmpPath, and then
continue (do not return) so the existing network/retry path runs; only return
after a successful rename.
In `@scripts/build/prefetch.ts`:
- Around line 65-93: urlConfig currently hardcodes ci: true and zigCommit:
ZIG_COMMIT which causes non-CI hosts (e.g., local darwin) to prefetch the wrong
Zig archive; change urlConfig to use the detected host's CI flag and default Zig
commit instead of always assuming CI: set ci to host.ci and set zigCommit to
host.ci ? ZIG_COMMIT : host.zigCommit (or host.defaultZigCommit if that property
exists on the object returned by detectHost()), so local runs use the
host-provided zig commit while CI continues to use ZIG_COMMIT; update the
urlConfig implementation accordingly.
- Around line 160-163: The target ABI is computed from host.os so when callers
set target.os to "linux" from a different host the abi remains undefined; update
the target object creation (the "target: { os: host.os, arch: host.arch, abi:
host.os === \"linux\" ? detectLinuxAbi() : undefined }" expression) to compute
abi from the target's os value instead of host.os — e.g., determine the assigned
targetOs (the value you place in target.os) and set abi to detectLinuxAbi() when
that targetOs === "linux" (falling back to the concrete default like "gnu" if
detectLinuxAbi() cannot determine one); apply the same change to the other
occurrence around lines 178-181.
In `@test/integration/build-prefetch/prefetch.test.ts`:
- Around line 21-25: The comment claims we test "ignore empty cache entries" but
the zero-byte file is written for a different key; update the fixture so the
empty sibling is written at the same cache key as the tested URL or remove the
misleading claim: either replace writeFileSync(join(cache,
sha256("http://192.0.2.1/empty")), "") with writeFileSync(join(cache,
sha256(url)), "") to seed an empty-entry fallback for the same URL (and ensure
the test fetches that URL from a local server), or keep the existing PAYLOAD
seed and change the comment to only claim a happy-path cache hit; check usages
of url, sha256, writeFileSync and cache when applying the change.
- Line 42: The test contains brittle exact-empty stderr assertions
(expect(stderr).toBe("")); remove those assertions (the exact-empty checks at
the occurrences of expect(stderr).toBe("");) so the tests no longer fail on ASAN
warnings from spawned Bun subprocesses; rely on the subprocess exit
code/assertions already present instead of asserting stderr is exactly empty.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b49e530f-4f04-4b4d-9076-ea1242838b5e
📒 Files selected for processing (8)
- .buildkite/Dockerfile-bootstrap.sh
- scripts/build/download.ts
- scripts/build/prefetch.ts
- scripts/machine.mjs
- scripts/packer/variables.pkr.hcl
- scripts/packer/windows-arm64.pkr.hcl
- scripts/packer/windows-x64.pkr.hcl
- test/integration/build-prefetch/prefetch.test.ts
```typescript
// 4xx is permanent (wrong URL / asset not published) — retrying just
// burns the backoff budget. Only 5xx/network errors get another shot.
permanent = res.status >= 400 && res.status < 500;
```
Keep 429/408 on the retry path.
This now classifies every 4xx as permanent. GitHub and nodejs.org can return transient 429 Too Many Requests or 408 Request Timeout, so zig/node-header/WebKit downloads will fail on the first blip instead of backing off. Only clearly permanent misses should short-circuit.
Suggested fix

```diff
- permanent = res.status >= 400 && res.status < 500;
+ permanent = res.status >= 400 && res.status < 500 && res.status !== 408 && res.status !== 429;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/build/download.ts` around lines 104 - 106, The current logic marks
all 4xx responses as permanent by setting "permanent = res.status >= 400 &&
res.status < 500", which incorrectly treats transient 429 and 408 as
non-retriable; update the condition in the download retry logic (the variable
named "permanent" in scripts/build/download.ts that checks res.status) so that
4xx responses are considered permanent except for 408 and 429 (i.e., treat 408
and 429 as transient/retriable), keeping the rest of the 5xx/network retry
behavior unchanged.
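The intended split can be stated as a small predicate — a sketch only; `isPermanentFailure` is a hypothetical name, not a function in the PR:

```typescript
// Hypothetical helper (not in the PR): decide whether an HTTP status
// should short-circuit the retry loop.
function isPermanentFailure(status: number): boolean {
  // 408 Request Timeout and 429 Too Many Requests are transient signals
  // from the server; the rest of 4xx is a wrong URL or an asset that was
  // never published, so retrying only burns the backoff budget.
  if (status === 408 || status === 429) return false;
  return status >= 400 && status < 500;
}
```

Centralizing the classification in one predicate also makes the 408/429 carve-out testable independently of the retry loop.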
| "-var", | ||
| `prefetch_tarball=${prefetchPath}`, |
Only pass prefetch_tarball when a tarball was actually generated.
prefetchPath is assigned only inside the bootstrap block, but Azure Windows image builds still call buildWindowsImageWithPacker() when --no-bootstrap is used. In that flow, passing -var prefetch_tarball=undefined causes Packer's file provisioner to fail when trying to upload a non-existent source file.
Suggested fix

```diff
   "-var",
   `agent_script=${agentPath}`,
-  "-var",
-  `prefetch_tarball=${prefetchPath}`,
+  ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []),
   templateDir,
 ];
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "-var", | |
| `prefetch_tarball=${prefetchPath}`, | |
| "-var", | |
| `agent_script=${agentPath}`, | |
| ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []), | |
| templateDir, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/machine.mjs` around lines 1239 - 1240, The packer var
"prefetch_tarball" is being added unconditionally even when prefetchPath is
undefined; update the code that builds the args array (used by
buildWindowsImageWithPacker()) to only include the "-var",
`prefetch_tarball=${prefetchPath}` pair when prefetchPath is truthy (i.e., set
inside the bootstrap block). Locate usages of prefetchPath and the args array
construction and guard the push/concatenation so Packer isn't passed
prefetch_tarball=undefined.
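The guarded argv construction, sketched as a standalone function — `packerArgs` is illustrative, not code from scripts/machine.mjs, and assumes `prefetchPath` is `undefined` whenever the bootstrap block was skipped:

```typescript
// Sketch: only emit the -var pair when a tarball actually exists.
// Names mirror scripts/machine.mjs, but this wrapper is hypothetical.
function packerArgs(agentPath: string, templateDir: string, prefetchPath?: string): string[] {
  return [
    "-var",
    `agent_script=${agentPath}`,
    // With --no-bootstrap, prefetchPath is never assigned; spreading an
    // empty array keeps "prefetch_tarball=undefined" out of Packer's argv.
    ...(prefetchPath ? ["-var", `prefetch_tarball=${prefetchPath}`] : []),
    templateDir,
  ];
}
```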
```hcl
// Step 0: Upload scripts/build/ so bootstrap's Prefetch-BuildDeps can run.
provisioner "file" {
  source      = var.prefetch_tarball
  destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}
```
Guard this upload when no prefetch tarball was provided.
prefetch_tarball defaults to "" in scripts/packer/variables.pkr.hcl, and scripts/bootstrap.ps1 treats the tarball as optional. This unconditional file provisioner fails before bootstrap can take that fallback path when the variable is not explicitly set, breaking default or standalone Packer invocations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/packer/windows-arm64.pkr.hcl` around lines 86 - 90, The file
provisioner unconditionally uploads var.prefetch_tarball which defaults to ""
and causes failures when unset; modify the provisioner block that sets source =
var.prefetch_tarball (the provisioner "file" that writes to
"C:\\Windows\\Temp\\bun-prefetch.tgz") so it only runs when var.prefetch_tarball
is non-empty (e.g. guard the block with a conditional or use a count/conditional
expression to include the provisioner only if var.prefetch_tarball != ""); keep
scripts/bootstrap.ps1 behavior unchanged so bootstrap can handle the missing
tarball.
```hcl
// Step 0: Upload scripts/build/ so bootstrap's Prefetch-BuildDeps can run.
provisioner "file" {
  source      = var.prefetch_tarball
  destination = "C:\\Windows\\Temp\\bun-prefetch.tgz"
}
```
Guard this upload when no prefetch tarball was provided.
prefetch_tarball defaults to "" in scripts/packer/variables.pkr.hcl, and scripts/bootstrap.ps1 explicitly handles its absence (lines 413–416 return early if the file doesn't exist, with comment "Absent → no-op"). The unconditional file provisioner makes direct or legacy Packer invocations fail before bootstrap can take that fallback path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/packer/windows-x64.pkr.hcl` around lines 87 - 91, The file
provisioner currently always uploads var.prefetch_tarball causing failures when
it's empty; make the upload conditional by adding a count meta-argument to the
provisioner so it only runs when var.prefetch_tarball is non-empty (e.g. add
count = var.prefetch_tarball != "" ? 1 : 0 inside the provisioner "file" block
that references var.prefetch_tarball and destination
"C:\\Windows\\Temp\\bun-prefetch.tgz"), leaving scripts/bootstrap.ps1's existing
absent-file handling unchanged.
```typescript
if (size !== undefined && size > 0) {
  console.log(`from ${DEPS_CACHE_ENV} (${(size / 1048576).toFixed(1)} MiB)`);
  const tmpPath = `${dest}.${process.pid}.partial`;
  await copyFile(cached, tmpPath);
  await rename(tmpPath, dest);
  return;
}
```
🔴 In the cache-hit path of downloadWithRetry() (lines 79–85 of scripts/build/download.ts), copyFile(cached, tmpPath) and rename(tmpPath, dest) are called with no error handling, so any transient FS error (disk full, AV/indexer lock on Windows, NFS stale handle) propagates as an unhandled exception that aborts the build entirely. The network retry path immediately below has a proper try/catch with partial-file cleanup for exactly this reason; the same protection is missing from the cache-hit path. Fix: wrap copyFile+rename in a try/catch, clean up the partial file on failure, and fall through to the network path.
Extended reasoning...
What the bug is and how it manifests
In scripts/build/download.ts lines 79-85, when BUN_DEPS_CACHE_DIR is set and a matching cache entry is found (size > 0), the function runs copyFile(cached, tmpPath) and rename(tmpPath, dest) with no try/catch. If either throws (disk full ENOSPC, transient permission race EACCES, AV/indexer file lock on Windows, NFS stale handle, or another process deleting the cached file between statSync and copyFile), the exception propagates out of downloadWithRetry as an unhandled rejection, aborting the build step entirely.
The specific code path that triggers it
statSync is already guarded by a try/catch (lines 73-74). But the guard stops there. If statSync succeeds (returns a non-zero size), execution enters the if (size !== undefined && size > 0) block where copyFile and rename run unprotected. A successful stat is no guarantee the copy will succeed: file locks, cross-device renames to a full temp partition, or a concurrent deletion can all cause these operations to fail after stat passes.
Why existing code does not prevent it
The network retry loop (lines 96-120) has an explicit try/catch that catches all errors from fetch, pipeline, and rename, swallows cleanup errors with .catch(() => {}), and loops to the next attempt. There is even a comment acknowledging the Windows AV/indexer problem: 'on Windows, AV/indexer can briefly lock the partial'. The cache-hit path above shares the same failure modes (the cached file can also be locked) but has no equivalent protection.
Impact
This is a regression introduced by this PR. Before this PR, all downloads went through the network retry path, which tolerates transient FS errors. Now, when BUN_DEPS_CACHE_DIR is set (the primary use case for this PR - it is baked into CI images), a transient FS error during the copy turns a recoverable condition into a hard build failure. On Windows, AV/indexer locking temp files is common enough that the existing code already has a comment about it. This will cause sporadic CI failures on Windows and on any platform with transient NFS or disk-space issues.
How to fix it
Wrap the copyFile+rename block in a try/catch, clean up the partial file on failure, and fall through to the network path:
```typescript
if (size !== undefined && size > 0) {
  const tmpPath = dest + '.' + process.pid + '.partial';
  try {
    console.log('from BUN_DEPS_CACHE_DIR ...');
    await copyFile(cached, tmpPath);
    await rename(tmpPath, dest);
    return;
  } catch {
    await rm(tmpPath, { force: true }).catch(() => {});
    // Fall through to network path.
  }
}
```
}
Step-by-step proof
- CI image is built with BUN_DEPS_CACHE_DIR=/opt/bun-deps set and 50 cached archives.
- A build starts; downloadWithRetry is called for boringssl.tar.gz.
- statSync(cached) succeeds, returns size = 52428800 (> 0).
- copyFile(cached, tmpPath) is called. An AV scanner has a brief lock on the cached file and throws EACCES.
- The EACCES exception propagates up through downloadWithRetry and crashes the build with an unhandled error.
- The network retry loop at line 96 - which would have downloaded the file in seconds - is never reached.
🔬 also observed by coderabbitai[bot]
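The fix described above can be factored into a helper that reports hit/miss instead of throwing — a sketch under the assumption that the caller falls through to the network loop on `false`; `tryCopyFromCache` is not a function in the PR:

```typescript
import { copyFile, rename, rm } from "node:fs/promises";

// Hypothetical helper: serve `dest` from a cached file, tolerating the
// same transient FS errors the network path already survives.
async function tryCopyFromCache(cached: string, dest: string): Promise<boolean> {
  const tmpPath = `${dest}.${process.pid}.partial`;
  try {
    await copyFile(cached, tmpPath);
    await rename(tmpPath, dest);
    return true; // cache hit served
  } catch {
    // AV lock / ENOSPC / stale NFS handle: remove the partial (best
    // effort) and report a miss so the caller downloads normally.
    await rm(tmpPath, { force: true }).catch(() => {});
    return false;
  }
}
```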
```typescript
webkitVersion: WEBKIT_VERSION,
nodejsVersion: NODEJS_VERSION,
zigCommit: ZIG_COMMIT,
ci: true,
cacheDir: "",
} as Config;
```
🔴 In urlConfig(), zigCommit is hardcoded to ZIG_COMMIT and ci is hardcoded to true, but zig.ts's defaultZigCommit(false, 'darwin') returns ZIG_COMMIT_PARALLEL for local macOS builds. Since the zig download URL embeds the commit hash and the cache key is sha256(url), the URL cached by bun run prefetch on macOS differs from the URL the actual build requests — resulting in a guaranteed cache miss for the Zig compiler on every local macOS run. Fix by setting ci = process.env.CI === "true" and using defaultZigCommit(ci, host.os) instead of the hardcoded constant.
Extended reasoning...
What the bug is and how it manifests
urlConfig() in prefetch.ts (lines 88–93) hardcodes two values: zigCommit: ZIG_COMMIT and ci: true. These are then passed to zigDownloadUrl(cfg, safe) in zig.ts, which builds the URL as https://github.com/oven-sh/zig/releases/download/autobuild-${cfg.zigCommit}/bootstrap-<arch>-<os>-<abi>.zip. The offline cache key written by prefetch is sha256(url) (via offlineCacheKey()). At build time, downloadWithRetry() recomputes sha256(url) from the URL the build system resolves, and looks it up in BUN_DEPS_CACHE_DIR. If those two URLs differ, the cache entry is never found.
The specific code path that triggers it
zig.ts exports ZIG_COMMIT = '365343af...' and ZIG_COMMIT_PARALLEL = '445fc0cb...'. The function defaultZigCommit(ci, hostOs) returns ZIG_COMMIT when ci=true (or when hostOs !== 'darwin'), and ZIG_COMMIT_PARALLEL when ci=false && hostOs === 'darwin'. When a macOS developer runs bun run prefetch locally (no CI env var), urlConfig() still forces ci: true, producing a URL with ZIG_COMMIT. But the actual build calls defaultZigCommit(false, 'darwin') (because CI is not set) and gets ZIG_COMMIT_PARALLEL, producing a URL with a different commit hash.
Why existing code doesn't prevent it
There is no guard that ensures urlConfig() uses the same commit selection logic as the build system. The hardcoding was presumably intended for CI image baking (where ci=true is always correct), but the function is also reached by bun run prefetch invoked from macOS developer machines, which the USAGE string explicitly documents as a supported workflow.
Proof via concrete example
- Developer on macOS (no CI env) runs `bun run prefetch`. `urlConfig()` returns `zigCommit: '365343af'` (ZIG_COMMIT) because `ci: true` is hardcoded. `zigDownloadUrl` builds: `.../autobuild-365343af.../bootstrap-x86_64-macos-none.zip`
- Cache key written: `sha256('.../autobuild-365343af.../...')` → file stored.
- Developer runs the build. The build calls `defaultZigCommit(false, 'darwin')` → returns `'445fc0cb'` (ZIG_COMMIT_PARALLEL).
- Build calls `zigDownloadUrl` with `zigCommit: '445fc0cb'` → URL is `.../autobuild-445fc0cb.../bootstrap-x86_64-macos-none.zip`.
- Cache lookup: `sha256('.../autobuild-445fc0cb.../...')` → no file → cache miss → Zig downloads from network.
Impact
For CI (the primary use case for image baking), there is no bug: both prefetch and the build use ci=true, so both select ZIG_COMMIT. For local macOS developers — the explicit secondary audience documented in the USAGE string — prefetch for the Zig compiler is entirely useless. The ~200MB compiler is re-downloaded on every build regardless of whether prefetch ran.
Fix
Replace the two hardcoded lines with:

```typescript
const ci = process.env.CI === 'true';
// ...
zigCommit: defaultZigCommit(ci, host.os),
ci,
```

And update the import from zig.ts to include `defaultZigCommit`.
🔬 also observed by coderabbitai
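A simplified model of the selection rule described above; the commit hashes here are placeholders, and this local `defaultZigCommit` re-states the logic attributed to scripts/build/zig.ts rather than importing it:

```typescript
// Placeholder hashes; the real values live in scripts/build/zig.ts.
const ZIG_COMMIT = "365343af";
const ZIG_COMMIT_PARALLEL = "445fc0cb";

// Re-statement of the described rule: local macOS builds pick the
// parallel-codegen compiler; everything else (and all CI) picks ZIG_COMMIT.
function defaultZigCommit(ci: boolean, hostOs: string): string {
  return !ci && hostOs === "darwin" ? ZIG_COMMIT_PARALLEL : ZIG_COMMIT;
}

// The sha256(url) cache key only matches when prefetch and the build
// agree on this commit, so prefetch must make the same call.
const ci = process.env.CI === "true";
const zigCommit = defaultZigCommit(ci, process.platform);
```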
| provisioner "file" { | ||
| source = var.prefetch_tarball | ||
| destination = "C:\\Windows\\Temp\\bun-prefetch.tgz" | ||
| } |
🔴 The Packer file provisioner with source = var.prefetch_tarball will fail immediately when Packer is invoked without -var prefetch_tarball=..., because Packer statically validates source paths and has no HCL-level conditional to skip a provisioner when a variable is empty. The empty default creates a false impression of optionality: bootstrap.ps1 guards against a missing file, but Packer fails before any provisioner runs. The same issue affects both windows-x64.pkr.hcl and windows-arm64.pkr.hcl (both line 88). In practice machine.mjs always supplies the value, but any direct packer build invocation (debugging, manual re-runs, future automation) fails immediately.
Extended reasoning...
What the bug is
Both windows-x64.pkr.hcl and windows-arm64.pkr.hcl add a provisioner "file" with source = var.prefetch_tarball, where var.prefetch_tarball has default = "". Packer validates the source path before any provisioner executes: stat("") fails, aborting the build before WinRM even connects to the VM.
The specific code path
variables.pkr.hcl defines variable "prefetch_tarball" { default = "" }. In windows-x64.pkr.hcl lines 88-91 (mirrored in windows-arm64.pkr.hcl):
provisioner "file" {
source = var.prefetch_tarball // resolves to "" by default
destination = "C:\Windows\Temp\bun-prefetch.tgz"
}
Packer evaluates all source paths at plan time. An empty string produces a fatal stat error and the build aborts.
Why existing code does not prevent it
bootstrap.ps1 contains if (-not (Test-Path prefetchTgz)) { return }, which gracefully handles the file being absent at runtime. However, Packer fails before any provisioner runs — the PowerShell guard is never reached. There is no HCL-level mechanism to skip a file provisioner based on an empty variable.
Addressing the refutation
The refutation correctly observes that this mirrors the pre-existing agent_script antipattern (same empty default, same file provisioner, no skip logic), and that machine.mjs always supplies the value for the normal CI path. Both points are accurate. However: (1) the presence of a prior bug does not justify a new instance; (2) the variable description says the tarball is optional and bootstrap handles its absence — that contract is honoured at the PowerShell layer but broken at the Packer layer, misleading anyone invoking Packer directly; and (3) prefetchPath is only set inside the if (bootstrap) block in machine.mjs, so a --no-bootstrap Azure invocation would pass undefined to buildWindowsImageWithPacker, causing Packer to receive the string "undefined" or an absent -var, either of which makes the problem worse.
Impact
Any packer build invocation without -var prefetch_tarball=... — direct debugging, manual re-runs, future automation not going through machine.mjs — fails immediately at source-path validation. The developer cannot reproduce or debug the image without also constructing and passing a local tarball path.
Step-by-step proof
- Developer runs: packer build -only azure-arm.windows-x64 scripts/packer
- No -var prefetch_tarball=... is supplied; var.prefetch_tarball resolves to "".
- Packer validates all provisioner source paths before connecting to the VM. stat("") fails.
- Build aborts. WinRM never connects. Bootstrap never runs.
- Developer cannot test the image locally without knowing they must also supply a path to a hand-crafted tarball.
Fix
Use a dynamic provisioner block conditioned on var.prefetch_tarball != "", or make the variable required by removing its default so the optionality claim is honest. The same fix is needed for the pre-existing agent_script provisioner.
```diff
 if (!res.ok || res.body === null) {
   lastError = new BuildError(`HTTP ${res.status} ${res.statusText} for ${url}`);
-  continue;
+  // 4xx is permanent (wrong URL / asset not published) — retrying just
+  // burns the backoff budget. Only 5xx/network errors get another shot.
+  permanent = res.status >= 400 && res.status < 500;
 } else {
   // Cast: DOM ReadableStream vs node:stream/web ReadableStream — same
   // shape at runtime, different TS lib declarations.
   await pipeline(Readable.fromWeb(res.body as unknown as NodeWebReadable), createWriteStream(tmpPath));
```
🔴 The new permanent shortcut at line 106 marks any 4xx response as non-retryable (res.status >= 400 && res.status < 500), which includes HTTP 429 Too Many Requests — a transient rate-limiting signal that should be retried after backoff. A single 429 from GitHub CDN now immediately breaks the retry loop and fails the build, where before this PR it would have consumed all five retry attempts as intended. Fix: permanent = res.status >= 400 && res.status < 500 && res.status !== 429.
Extended reasoning...
What the bug is and how it manifests
downloadWithRetry() now sets permanent = res.status >= 400 && res.status < 500 (lines 106–107) immediately after receiving any 4xx response, then breaks out of the retry loop at line 121. HTTP 429 Too Many Requests sits squarely in that range. A 429 response from GitHub CDN — which occurs during traffic bursts when downloading large artifacts like the ~200 MB WebKit prebuilt or Zig compiler — now terminates the download immediately with zero retries, producing a hard build failure.
The specific code path that triggers it
- `downloadWithRetry(url, dest, logPrefix)` enters the `for (let attempt = 1; attempt <= 5; attempt++)` loop.
- `fetch()` returns a Response with `status === 429` and `ok === false`.
- The `if (!res.ok)` branch fires: `lastError` is set and `permanent` is assigned `res.status >= 400 && res.status < 500` → `true`.
- The catch block is skipped (no exception was thrown — fetch itself succeeded).
- Immediately after the try/catch, `if (permanent) break;` fires — the loop exits after the very first attempt.
- `throw new BuildError("Failed to download after 5 attempts…")` is reached after only one attempt.
Why existing code does not prevent it
Before this PR, the non-OK branch used continue unconditionally — always retrying. The PR intent was correct: genuine 4xx errors like 404 (asset not published) or 410 (gone) are permanent and burning four extra backoff slots is wasteful. The code comment even says "wrong URL / asset not published" — that characterization does not apply to 429, which is entirely about server-side rate limiting and is explicitly designed to be retried.
What the impact would be
This PR downloads large blobs (WebKit ~200 MiB, Zig ~80 MiB) concurrently across multiple CI workers all pointed at the same GitHub release endpoint. GitHub CDN returns 429 during traffic bursts. With this regression, any single 429 on any file aborts that dep download permanently in that build run, turning a previously-recoverable transient condition into a mandatory build failure. CI has no manual retry for individual dep fetches; the whole job must be re-queued.
How to fix it
Exclude 429 from the permanent range:
```typescript
permanent = res.status >= 400 && res.status < 500 && res.status !== 429;
```
Optionally, also honour the Retry-After response header for the backoff delay, but the minimal fix is just the exclusion.
Step-by-step proof
downloadWithRetry("https://github.com/oven-sh/WebKit/releases/download/.../bun-webkit-linux-amd64-lto.tar.gz", dest, "WebKit")is called.- On attempt 1, GitHub CDN responds HTTP 429 Too Many Requests (ok === false).
lastError = new BuildError("HTTP 429 Too Many Requests for …").permanent = 429 >= 400 && 429 < 500→true.if (permanent) break;— loop exits immediately. Attempts 2–5 never run.- Function throws
BuildError: Failed to download after 5 attemptsafter a single network call and zero backoff delay. - Build fails. Before this PR, step 4 would have left
permanent = false(or the unconditionalcontinuepath ran), attempt 2 would fire after 2 s backoff, and the transient rate limit would have cleared.
| const url = "http://192.0.2.1/dep-v1.tar.gz"; | ||
| writeFileSync(join(cache, sha256(url)), "PAYLOAD"); | ||
| writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), ""); |
🟡 The 'downloadWithRetry copies from cache on hit' test writes an empty cache entry for sha256("http://192.0.2.1/empty"), which is a completely different URL than the one actually downloaded (url = "http://192.0.2.1/dep-v1.tar.gz"), so the zero-byte-skip behavior is never exercised. The comment 'The 0-byte sibling proves we ignore empty cache entries' is incorrect; removing the size > 0 check from the production code would not cause this test to fail.
Extended reasoning...
What the bug is and how it manifests
In the test 'downloadWithRetry copies from cache on hit' (lines 21–44 of prefetch.test.ts), line 24 writes "PAYLOAD" for sha256(url) where url = "http://192.0.2.1/dep-v1.tar.gz", and line 25 writes an empty file for sha256("http://192.0.2.1/empty") — a completely different URL. The test then only calls downloadWithRetry(url, ...), which always finds the non-empty PAYLOAD entry. The zero-byte entry for the other URL is never looked up.
The specific code path that triggers it
downloadWithRetry computes cached = join(cacheDir, offlineCacheKey(url)) and calls statSync(cached).size. Since url maps to the PAYLOAD entry (size > 0), the cache-hit path succeeds immediately. The zero-byte entry for "http://192.0.2.1/empty" is never consulted during this test. The comment on line 25 — "The 0-byte sibling proves we ignore empty cache entries" — is factually wrong.
Why existing code doesn't prevent it
The test was presumably written with the intent of having a zero-byte entry for url itself, then relying on the size > 0 guard in download.ts (lines 79–85) to fall through to the network. But the two writeFileSync calls are swapped: the non-empty PAYLOAD is written for sha256(url) and the empty one is written for a different URL. Nothing in the test framework detects that the zero-byte fallthrough path is never reached.
What the impact would be
The production size > 0 check (which correctly refuses to feed a 0-byte file to tar) is not verified by this test suite. If someone removed or regressed that guard, all four test-suite passes (three verifiers + synthesis) would still succeed because the URL being tested always resolves to the non-empty PAYLOAD entry.
How to fix it
To actually test the stated behavior, swap the two entries: write the empty file for sha256(url) (so downloadWithRetry sees size === 0 and falls through to the network), and write the PAYLOAD for sha256("http://192.0.2.1/empty") or simply remove it. With size === 0 for url, the function should attempt a network fetch to the unroutable 192.0.2.1 address and fail — proving the fallthrough occurred. Alternatively, use a locally-controlled test server so the network fallthrough produces a verifiable response rather than a timeout.
Step-by-step proof
url = "http://192.0.2.1/dep-v1.tar.gz"- Line 24:
writeFileSync(join(cache, sha256(url)), "PAYLOAD")→ cache entry forurlhas content. - Line 25:
writeFileSync(join(cache, sha256("http://192.0.2.1/empty")), "")→ cache entry for a URL that is never requested has zero bytes. downloadWithRetry(url, ...)is called.cached = join(cacheDir, sha256(url))→ resolves to the PAYLOAD file.statSync(cached).size === 7("PAYLOAD" is 7 bytes) →size > 0is true → file is copied, function returns.- The zero-byte entry is never touched. Removing the
size > 0check does not change the outcome for this test.
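The mismatch comes down to the content-addressed key; a minimal sketch of the `sha256(url)` scheme from the PR description (`offlineCacheKey` here may differ from the real implementation in scripts/build/download.ts):

```typescript
import { createHash } from "node:crypto";

// Cache entries live at <cacheDir>/<sha256(url)>, so an empty file only
// exercises the size > 0 guard when written under the tested URL's key.
function offlineCacheKey(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

const tested = offlineCacheKey("http://192.0.2.1/dep-v1.tar.gz");
const sibling = offlineCacheKey("http://192.0.2.1/empty");
// tested !== sibling: seeding "" under `sibling` never intersects the
// lookup for dep-v1.tar.gz, so the zero-byte path stays untested.
```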
Adds `bun run prefetch` and a `$BUN_DEPS_CACHE_DIR` lookup in `downloadWithRetry()` so CI images can bake in the ~250MB of dep tarballs / prebuilt WebKit / zig that every build currently re-downloads.

How it works

- `scripts/build/prefetch.ts` enumerates every URL the build would fetch and saves each to `<out>/<sha256(url)>` plus a `manifest.json`. Run `bun run prefetch --print` to see the list.
- `downloadWithRetry()` checks `$BUN_DEPS_CACHE_DIR/<sha256(url)>` before the network. Miss → normal download. So a PR that bumps a dep on a stale image still builds — that one dep just downloads as before.

CI image wiring

- `machine.mjs` ships `scripts/build/` as a tarball next to bootstrap
- `bootstrap.sh` / `bootstrap.ps1` extract it, run prefetch into `/opt/bun-deps` / `C:\bun-deps`, and set `BUN_DEPS_CACHE_DIR` in the buildkite environment hook (posix) / Machine env (windows)
- `.buildkite/Dockerfile` gets the equivalent layer

Bootstrap versions bumped (sh 29→30, ps1 16→17). Not covered: cargo crates for lolhtml (needs the dep's Cargo.lock which doesn't exist pre-extraction).