dns.lookup: default to libc getaddrinfo on Linux#29231
Conversation
dns.lookup is specified to behave like getaddrinfo(3). Linux was defaulting to c-ares, which returned an extra ::1 entry for names that only had an A record in /etc/hosts (e.g. an entry like `127.0.0.1 mongo-1` would surface as `::1, 127.0.0.1` via Bun and as `127.0.0.1` via Node). With the default result order being `verbatim`, the callback form would then yield `::1` as the single address. Switch the Linux default to the libc (getaddrinfo) backend so that dns.lookup matches Node. dns.resolve* is unaffected and continues to use c-ares. Fixes #29227
|
Updated 5:12 PM PT - Apr 12th, 2026
❌ @autofix-ci[bot], your commit 352d212 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 29231That installs a local version of the PR into your bun-29231 --bun |
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
WalkthroughChanged the DNS getaddrinfo backend selection: on non-macOS/Windows platforms the default backend is now Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/29227.test.ts`:
- Around line 67-74: Remove the brittle assertion that stderr is exactly empty
and instead make exitCode the primary failure signal; delete or stop asserting
expect(realStderr).toBe("") (the variables involved are stderr and realStderr)
and ensure any stdout/behavior assertions occur before the final
expect(exitCode).toBe(0) assertion so exitCode is checked last; apply the same
change to the similar block around the other occurrence (lines referencing
realStderr/stderr/exitCode).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5f04d071-38b1-46b5-92e8-a5c68f359b2e
📒 Files selected for processing (2)
src/dns.zigtest/regression/issue/29227.test.ts
| // Filter out the ASAN warning that debug builds print to stderr. | ||
| const realStderr = stderr | ||
| .split("\n") | ||
| .filter(l => l && !l.includes("ASAN")) | ||
| .join("\n"); | ||
| expect(realStderr).toBe(""); | ||
| expect(exitCode).toBe(0); | ||
|
|
There was a problem hiding this comment.
Avoid empty-stderr assertion here; assert behavior first, then exitCode last.
expect(realStderr).toBe("") is brittle for regression subprocess tests, and exitCode is currently asserted before stdout-behavior assertions.
Suggested patch
- // Filter out the ASAN warning that debug builds print to stderr.
- const realStderr = stderr
- .split("\n")
- .filter(l => l && !l.includes("ASAN"))
- .join("\n");
- expect(realStderr).toBe("");
- expect(exitCode).toBe(0);
+ // Keep stderr available for debugging but avoid empty-stderr assertions.
+ const realStderr = stderr
+ .split("\n")
+ .filter(l => l && !l.includes("ASAN"))
+ .join("\n");
// Find the `all` array and the single-form line in the output.
const lines = stdout.trim().split("\n");
const allLine = lines.find(l => l.startsWith("["))!;
const singleLine = lines.find(l => l.startsWith("single:"))!;
expect(JSON.parse(allLine)).toEqual([{ address: "127.0.0.1", family: 4 }]);
expect(singleLine).toBe("single:127.0.0.1:4");
+ expect(exitCode).toBe(0);Based on learnings: in test/regression/issue/*.test.ts, avoid asserting spawned Bun subprocess stderr is empty; use exitCode as the primary failure signal.
As per coding guidelines: “Expect stdout assertions before exit code assertions … BEFORE expect(exitCode).toBe(0).”
Also applies to: 76-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/regression/issue/29227.test.ts` around lines 67 - 74, Remove the brittle
assertion that stderr is exactly empty and instead make exitCode the primary
failure signal; delete or stop asserting expect(realStderr).toBe("") (the
variables involved are stderr and realStderr) and ensure any stdout/behavior
assertions occur before the final expect(exitCode).toBe(0) assertion so exitCode
is checked last; apply the same change to the similar block around the other
occurrence (lines referencing realStderr/stderr/exitCode).
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/regression/issue/29227.test.ts (1)
63-77:⚠️ Potential issue | 🟠 MajorRemove brittle empty-
stderrassertion and assertexitCodelast.Line 68’s
expect(realStderr).toBe("")is flaky for regression subprocess tests, and Line 69 checksexitCodebefore validating stdout behavior (Lines 76-77).Suggested patch
- const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); - - // Filter out the ASAN warning that debug builds print to stderr. - const realStderr = stderr - .split("\n") - .filter(l => l && !l.includes("ASAN")) - .join("\n"); - expect(realStderr).toBe(""); - expect(exitCode).toBe(0); + const [stdout, _stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); @@ expect(JSON.parse(allLine)).toEqual([{ address: "127.0.0.1", family: 4 }]); expect(singleLine).toBe("single:127.0.0.1:4"); + expect(exitCode).toBe(0);Based on learnings: in
test/regression/issue/*.test.ts, avoid asserting spawned Bun subprocessstderris empty and useexitCodeas the primary failure signal; as per coding guidelines, assert stdout behavior beforeexpect(exitCode).toBe(0).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/regression/issue/29227.test.ts` around lines 63 - 77, The test currently asserts realStderr is empty and checks exitCode before validating stdout, which is flaky; remove the brittle assertion expect(realStderr).toBe("") entirely, then reorder the assertions so that stdout parsing and checks (the JSON parse of allLine and the singleLine equality checks using stdout/lines/allLine/singleLine) run before asserting expect(exitCode).toBe(0), keeping the ASAN filtering logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/regression/issue/29227.test.ts`:
- Around line 63-77: The test currently asserts realStderr is empty and checks
exitCode before validating stdout, which is flaky; remove the brittle assertion
expect(realStderr).toBe("") entirely, then reorder the assertions so that stdout
parsing and checks (the JSON parse of allLine and the singleLine equality checks
using stdout/lines/allLine/singleLine) run before asserting
expect(exitCode).toBe(0), keeping the ASAN filtering logic intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa97425b-729d-4827-8e50-beeb18eb787f
📒 Files selected for processing (1)
test/regression/issue/29227.test.ts
There was a problem hiding this comment.
LGTM — straightforward one-line default switch to fix Node.js spec compliance on Linux.
Extended reasoning...
Overview
The PR changes a single enum value in GetAddrInfo.Backend.default (src/dns.zig): on Linux the default backend for dns.lookup switches from .c_ares to .libc. A detailed comment block explains the rationale. A new regression test (test/regression/issue/29227.test.ts) verifies the fix with a random /etc/hosts entry, with proper cleanup in a finally block.
Security Risks
None. This is a DNS resolver backend selection change with no auth, crypto, or privilege implications. The libc path was already exercised when users passed { backend: 'libc' } explicitly.
Level of Scrutiny
Low. The diff is effectively one logic line changed plus a comment block. The .libc code path (LibC.lookup → std.c.getaddrinfo) is pre-existing and was already the explicit opt-in path; it is simply promoted to default on Linux. The dns.resolve* family and all record-type queries are unaffected (they go directly to c-ares through separate entry points, as confirmed in the PR description).
Other Factors
Two bugs were reported by the automated scanner. The first (addrInfoCount off-by-one) is pre-existing and explicitly confirmed NOT to expand in blast radius on the Linux path introduced here — the Linux path uses Result.toList → Any{.list} which derives array length from list.items.len, not from addrInfoCount. The second is a minor test assertion-order nit (exitCode checked before stdout content) that has no impact on correctness. Neither warrants blocking approval. The fix matches documented Node.js behavior and the test is well-structured.
| const lines = stdout.trim().split("\n"); | ||
| const allLine = lines.find(l => l.startsWith("["))!; | ||
| const singleLine = lines.find(l => l.startsWith("single:"))!; |
There was a problem hiding this comment.
🟡 The new regression test asserts exitCode at line 73 before the stdout content assertions at lines 80–81, violating the CLAUDE.md convention that stdout assertions must come first. Move the expect(exitCode).toBe(0) call to after the expect(JSON.parse(allLine)) and expect(singleLine) assertions so that a subprocess failure surfaces the actual output rather than an opaque exit-code mismatch.
Extended reasoning...
What the bug is
CLAUDE.md (line 128) states: "When spawning processes, tests should expect(stdout).toBe(...) BEFORE expect(exitCode).toBe(0). This gives you a more useful error message on test failure." The new test in test/regression/issue/29227.test.ts violates this convention.
The specific code path
After the subprocess is awaited, the test makes the following assertions in this order:
- Line 72:
expect(realStderr).toBe('') - Line 73:
expect(exitCode).toBe(0)← exit code checked here - Lines 77–79: stdout is parsed into
allLine/singleLine - Line 80:
expect(JSON.parse(allLine)).toEqual([...]) - Line 81:
expect(singleLine).toBe('single:127.0.0.1:4')
The stdout content assertions (lines 80–81) come after the exitCode assertion (line 73).
Why existing code doesn't prevent it
There is no lint rule or type check enforcing assertion order; it is a documentation-only convention. The code compiles and runs fine, the wrong order is just a quality issue for diagnostics.
Impact
If the child process exits with code 1 (e.g., because libc getaddrinfo is unavailable, returns an error, or produces unexpected results), the test fails at line 73 with:
Expected: 0
Received: 1
This tells the developer nothing about why it failed. If the stdout assertions were first, the failure would show the actual JSON output printed by the subprocess, immediately revealing the root cause (wrong address, extra ::1 entry, DNS error code, etc.).
Concrete proof
Suppose the subprocess prints ERR:ENOTFOUND to stderr and exits with code 1:
- Current order → test fails at
expect(exitCode).toBe(0)→ error: "Expected: 0, Received: 1" — no useful info. - Correct order → test fails at
expect(JSON.parse(allLine)).toEqual([...])→ error shows the actual stdout, making the bug obvious.
How to fix
Move expect(exitCode).toBe(0) to after the stdout assertions:
expect(realStderr).toBe('');
// stdout assertions first (CLAUDE.md §128)
expect(JSON.parse(allLine)).toEqual([{ address: '127.0.0.1', family: 4 }]);
expect(singleLine).toBe('single:127.0.0.1:4');
expect(exitCode).toBe(0);| .{ "getaddrinfo", .libc }, | ||
| }); | ||
|
|
||
| // `dns.lookup` is specified to behave like getaddrinfo(3), which | ||
| // consults nsswitch.conf / /etc/hosts. On Linux the default | ||
| // backend used to be `c_ares`, which produced results that did | ||
| // not match Node for names defined only in /etc/hosts — see | ||
| // https://github.com/oven-sh/bun/issues/29227. | ||
| // | ||
| // Only `dns.lookup` routes through this default. `dns.resolve*` | ||
| // (and all record-type queries) use c-ares directly, matching | ||
| // Node's behavior. | ||
| pub const default: GetAddrInfo.Backend = switch (bun.Environment.os) { | ||
| .mac, .windows => .system, | ||
| else => .c_ares, | ||
| else => .libc, | ||
| }; | ||
|
|
||
| pub const FromJSError = JSError || error{ |
There was a problem hiding this comment.
🟣 Pre-existing inconsistency between addrInfoCount and addrInfoToJSArray in src/dns.zig: addrInfoCount always starts count=1 for the first node regardless of whether its addr is null, but addrInfoToJSArray uses fromAddrInfo(this_node) orelse continue (skipping j++) for null-addr nodes. If getaddrinfo returns a first node with addr==null, the JS array is over-allocated by one, leaving a trailing undefined slot. This is a pre-existing bug; notably, the second refutation is factually correct that this PR's Linux path uses Result.toList → Any{.list} rather than addrInfoToJSArray, so the blast radius is NOT increased on Linux — the bug remains limited to the macOS LibInfo path.
Extended reasoning...
The bug: addrInfoCount (dns.zig ~L286) initialises count = 1 unconditionally for the head node, then only increments for subsequent nodes when addr \!= null. In contrast, addrInfoToJSArray (dns.zig ~L303) iterates over all nodes starting from the head and does fromAddrInfo(this_node) orelse continue, skipping j++ whenever addr == null. These two policies disagree on the head node.
Concrete proof: Suppose getaddrinfo returns a linked list where node[0].addr==null and node[1].addr!=null. addrInfoCount returns 2 (count starts at 1, then +1 for node[1].addr!=null). addrInfoToJSArray creates a 2-element array but only writes to index 0 (for node[1], since node[0] is skipped). Index 1 is never written, leaving an uninitialised/undefined JS array slot. In JS terms: the returned array has length 2 with array[1] === undefined as a hole.
Does this PR increase the blast radius on Linux? The second refutation is correct on this point. The Linux LibC path introduced by this PR routes through: LibC.lookup → background thread → std.c.getaddrinfo → GetAddrInfo.Result.toList(allocator, addrinfo) → stored as GetAddrInfo.Result.Any{ .list = result }. In then() at dns.zig:815, const any = GetAddrInfo.Result.Any{ .list = result }. When Any.toJS handles the .list variant it creates the array with list.items.len (the actual filled count from appendAssumeCapacity), NOT from addrInfoCount. So the Linux path is safe and the blast radius claim in the synthesis description is incorrect.
Where the bug actually lives: The .addrinfo variant of Any.toJS calls addrInfoToJSArray, which is only triggered on the macOS LibInfo path (onCompleteNative(this, .{ .addrinfo = result }) at dns.zig:1078). That path predates this PR. The inconsistency in addrInfoCount for toList is also harmless since initCapacity over-allocates by at most 1, but list.items.len is always correct.
Impact: In practice, POSIX getaddrinfo conformant implementations (glibc, musl, macOS libc) do not return nodes with ai_addr==null, so this bug is purely theoretical and unlikely to manifest. The trailing undefined slot could cause subtle JS-level issues (sparse array, length mismatch) on macOS if a non-conformant getaddrinfo were used.
Fix: In addrInfoCount, add a null check for the head node: var count: u32 = @intFromBool(addrinfo.addr \!= null); to match the behaviour of addrInfoToJSArray.
| pub const default: GetAddrInfo.Backend = switch (bun.Environment.os) { | ||
| .mac, .windows => .system, | ||
| else => .c_ares, | ||
| else => .libc, | ||
| }; |
There was a problem hiding this comment.
🔴 When getaddrinfo() returns EAI_AGAIN (temporary DNS failure — DNS server unreachable, empty /etc/resolv.conf, network partition), the new Linux .libc default backend maps it through initEAI() which has no AGAIN case, silently returning Error.ENOTIMP instead of Error.ETIMEOUT. Users in Docker/Kubernetes environments will see err.code='ENOTIMP' / 'DNS resolver does not implement requested operation' for what should be a transient, retryable timeout error. Fix: add .AGAIN => Error.ETIMEOUT to the Linux-specific switch in initEAI in src/deps/c_ares.zig (matching the Windows path at c_ares.zig:1773: UV_EAI_AGAIN => Error.ETIMEOUT).
Extended reasoning...
What the bug is
When getaddrinfo() returns EAI_AGAIN (-3, "Temporary failure in name resolution"), the new Linux default .libc backend maps it to the wrong error code. Users receive err.code = 'ENOTIMP' ("DNS resolver does not implement requested operation") instead of the semantically correct ETIMEOUT. This is a meaningful mismatch: ENOTIMP signals an unsupported operation and is non-retryable in most application code, whereas ETIMEOUT correctly signals a transient, retryable failure.
The specific code path that triggers it
The call chain for the new .libc backend on Linux when getaddrinfo() fails:
LibC.run()(dns.zig ~L773): stores.err = @intFromEnum(err)whereerr = EAI_AGAIN = -3then()(dns.zig ~L831):.errbranch callsgetAddrInfoAsyncCallback(-3, null, this)getAddrInfoAsyncCallback(dns.zig:727): callshead.processGetAddrInfoNative(-3, null)processGetAddrInfoNative(dns.zig:1073): callsc_ares.Error.initEAI(-3)initEAI(c_ares.zig ~L1763):-3maps to.AGAINvia@enumFromInt; no case matches; falls toelse => bun.todo(@src(), Error.ENOTIMP)
In release builds, bun.todo() logs nothing and silently returns the value — no panic, no assert, no visible warning.
Why existing code doesn't prevent it
The initEAI function has three layers of case handling:
- NODATA/NONAME early check:
.AGAIN \!= .NODATAand\!= .NONAME— miss - Linux-specific switch:
.SOCKTYPE,.IDN_ENCODE,.ALLDONE,.INPROGRESS,.CANCELED,.NOTCANCELED— miss (else => {} fallthrough) - General switch:
0,.ADDRFAMILY,.BADFLAGS,.FAIL,.FAMILY,.MEMORY,.SERVICE,.SYSTEM— miss (else => bun.todo return ENOTIMP)
The Windows path correctly handles this via libuv.UV_EAI_AGAIN => Error.ETIMEOUT but that branch is not taken on Linux.
The blast-radius increase from this PR
Before this PR: Linux defaulted to .c_ares, which routes DNS errors through processGetAddrInfo() — a completely different code path that never calls initEAI. Only users who explicitly passed { backend: 'libc' } or { backend: 'system' } hit this bug; they were a small minority.
After this PR: .libc is the new Linux default for all dns.lookup() calls. Every Linux user who hits a temporary DNS failure (empty /etc/resolv.conf, DNS server temporarily down, Docker/Kubernetes service not yet ready, network partition) now gets ENOTIMP instead of ETIMEOUT. Ironically, one of the issues this PR claims to fix (#19086 — ioredis EAI_AGAIN for Docker service hostnames) would now fail with a different wrong error code.
Step-by-step proof
Suppose a Docker container runs dns.lookup('my-service', cb) while the Docker DNS resolver is temporarily unavailable:
std.c.getaddrinfo('my-service', ...)returnsEAI_AGAIN(-3)LibC.run()storesthis.* = .{ .err = -3 }then()callsgetAddrInfoAsyncCallback(-3, null, this)- →
processGetAddrInfoNative(-3, null)→c_ares.Error.initEAI(-3) @enumFromInt(-3)yields.AGAIN(glibcEAI_AGAIN=-3per/usr/include/netdb.h)- No switch case matches →
bun.todo(@src(), Error.ENOTIMP)→ returnsError.ENOTIMP - Callback receives:
err.code = 'DNS_ENOTIMP', message ='DNS resolver does not implement requested operation'
Application code checking err.code === 'ENOTFOUND' or err.code === 'ETIMEOUT' for retry logic will not retry, causing hard failures for what should be a transient condition.
How to fix
Add .AGAIN => Error.ETIMEOUT to the Linux-specific switch in initEAI in src/deps/c_ares.zig, matching what Windows does with UV_EAI_AGAIN => Error.ETIMEOUT.
Fixes #29227
Repro
[{ address: '127.0.0.1', family: 4 }][{ address: '::1', family: 6 }, { address: '127.0.0.1', family: 4 }][{ address: '127.0.0.1', family: 4 }]Because the default result order is
verbatim, the callback formdns.lookup(name, cb)would surface::1as the single address in Bun while Node returned127.0.0.1.Cause
dns.lookupis specified to behave likegetaddrinfo(3), which consultsnsswitch.conf//etc/hosts. On Linux Bun'sdns.lookupwas defaulting to the c-ares backend, which produces results that don't match libc for names defined only in/etc/hosts. macOS (LibInfo) and Windows (libuv) already matched Node.Fix
Switch the Linux default in
GetAddrInfo.Backend.defaultfrom.c_aresto.libc, sodns.lookuproutes throughLibC.lookup→std.c.getaddrinfo. Onlydns.lookuptakes this default;dns.resolve*(and all record-type queries) go through separate entry points and continue to use c-ares, matching Node.Users who want the previous behavior can still opt in with
dns.lookup(name, { backend: 'c_ares' })on the native API.Test
test/regression/issue/29227.test.tssnapshots/etc/hosts, appends a random127.0.0.1 <tag>entry, runsdns.lookup(tag, { all: true })anddns.lookup(tag, cb), and asserts both return only IPv4. The original/etc/hostsis always restored. Skipped on non-Linux.