fix: mitigate WSL2 fetch ETIMEDOUTs to Anthropic API (#19) #23
nujovich wants to merge 20 commits into
WSL2's NAT frequently resolves api.anthropic.com to an IPv6 address that isn't routable from the guest, causing fetch() to hang until ETIMEDOUT on the larger payloads of the resolve and export steps (issue #19). Calling dns.setDefaultResultOrder('ipv4first') at module load applies the fix to both the CLI and the Next.js API routes without any user-facing configuration. The override is skipped if the user already set --dns-result-order via NODE_OPTIONS. Refs #19
Even with ipv4first DNS, the first fetch under WSL2 occasionally fails with ETIMEDOUT before subsequent attempts succeed (observed on resolve). Retry up to twice with 500ms / 1500ms backoff, but only for network-level errors (ETIMEDOUT, ECONNRESET, undici socket/connect errors, etc.). HTTP responses are still surfaced as-is to avoid masking real API errors. Refs #19
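The retry policy described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `withRetry`, `sleep`, and the `wait` option are hypothetical names, and the retryable-code set is shortened to two entries.

```javascript
// Illustrative sketch only. `withRetry`, `sleep`, and `wait` are hypothetical
// names; the PR inlines an equivalent loop inside callAnthropic.
const RETRYABLE_NET_CODES = new Set(['ETIMEDOUT', 'ECONNRESET'])

function isRetryableNetworkError(err) {
  // Node's fetch() wraps the socket error, so the code may sit on err.cause.
  return [err?.code, err?.cause?.code].some((c) => RETRYABLE_NET_CODES.has(c))
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function withRetry(fn, { backoffsMs = [500, 1500], wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt)
    } catch (err) {
      // Two backoff entries mean at most two retries (three attempts total).
      const canRetry = attempt < backoffsMs.length && isRetryableNetworkError(err)
      if (!canRetry) throw err // HTTP/API errors and exhausted retries surface as-is
      await wait(backoffsMs[attempt])
    }
  }
}
```

Passing `wait` as an option is a design choice made here so the delays can be stubbed without waiting two real seconds; with the defaults, a call such as `withRetry(() => fetch(url, options))` would sleep 500ms and then 1500ms between attempts.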
To narrow down the residual ETIMEDOUTs on WSL2 (issue #19), opt-in debug logging via MINT_DEBUG=1 prints:
- DNS lookup result for api.anthropic.com (addresses + family)
- Payload size and maxTokens per call
- Per-attempt POST start, elapsed ms on failure, and decoded error codes (including undici cause chains: ETIMEDOUT, address, port, etc.)
- Retry scheduling
No behavior change when MINT_DEBUG is unset. Refs #19
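As a rough idea of what such opt-in diagnostics could look like, here is a minimal sketch. The names `DEBUG`, `debugLog`, and `describeError` appear in the diff, but their bodies are not shown on this page, so everything below (including writing to stderr and the five-level depth cap) is an assumption.

```javascript
// Sketch only: these bodies are assumptions, not the PR's implementation.
const DEBUG = process.env.MINT_DEBUG === '1'

function debugLog(...args) {
  // Does nothing unless MINT_DEBUG=1, so default behavior is unchanged.
  if (DEBUG) console.error('[mint:debug]', ...args)
}

// Flattens an error and its `cause` chain into one line, so a generic
// "fetch failed" also shows the underlying code, address, and port.
function describeError(err) {
  const parts = []
  let current = err
  for (let depth = 0; current && depth < 5; depth++) {
    const fields = [current.name, current.code, current.message]
    if (current.address) fields.push(`address=${current.address}`)
    if (current.port) fields.push(`port=${current.port}`)
    parts.push(fields.filter(Boolean).join(' '))
    current = current.cause
  }
  return parts.join(' <- ')
}
```

A call site might then log `debugLog(describeError(err))` inside the `catch` block of each attempt.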
PR remains in draft until we find more stable sessions.
Pull request overview
Mitigates WSL2-specific fetch() timeouts when calling the Anthropic API by adjusting DNS resolution defaults, adding targeted retries for transient network failures, and introducing optional debug diagnostics for network troubleshooting.
Changes:
- Force IPv4-first DNS result ordering at module load (unless NODE_OPTIONS already includes a --dns-result-order setting).
- Add retry/backoff behavior to callAnthropic for selected transient network error codes.
- Add opt-in MINT_DEBUG logging for DNS resolution, payload size, and per-attempt timing/error details.
import { setDefaultResultOrder, lookup as dnsLookup } from 'node:dns'

// WSL2's NAT often resolves api.anthropic.com to an IPv6 address that isn't
// routable, causing fetch() to hang until ETIMEDOUT (issue #19). Forcing IPv4
// first avoids the hang on WSL2 and is a no-op on hosts where IPv6 works.
// Honors NODE_OPTIONS=--dns-result-order=... if the user already set it.
if (!process.env.NODE_OPTIONS?.includes('--dns-result-order')) {
  const backoffsMs = [500, 1500]
  let lastErr
  if (DEBUG) {
    const dns = await dnsLookupAll(ANTHROPIC_HOST)
    debugLog(`dns ${ANTHROPIC_HOST} ->`, JSON.stringify(dns))
    debugLog(`payload bytes=${Buffer.byteLength(body)} maxTokens=${maxTokens}`)
  }
  for (let attempt = 0; attempt <= backoffsMs.length; attempt++) {
    const startedAt = Date.now()
    try {
      debugLog(`attempt ${attempt + 1}/${backoffsMs.length + 1} -> POST ${ANTHROPIC_URL}`)
      const res = await fetch(ANTHROPIC_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
        },
        body,
      })
      const data = await res.json()
      if (!res.ok) {
        const msg = data?.error?.message || `Anthropic API error (${res.status})`
        throw new Error(msg)
      }
      const block = (data.content || []).find((b) => b && b.type === 'text')
      return block ? block.text : ''
    } catch (err) {
      lastErr = err
      const elapsed = Date.now() - startedAt
      debugLog(`attempt ${attempt + 1} failed in ${elapsed}ms: ${describeError(err)}`)
      if (attempt < backoffsMs.length && isRetryableNetworkError(err)) {
        debugLog(`retrying in ${backoffsMs[attempt]}ms`)
        await new Promise((r) => setTimeout(r, backoffsMs[attempt]))
        continue
      }
      throw err
    }
  }
  throw lastErr
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
lib/prompts.mjs:466
- The PR description mentions retrying callAnthropic with backoff on transient network errors, but callAnthropic still performs a single fetch() and the new isRetryableNetworkError/RETRYABLE_NET_CODES helpers are unused. Either implement the retry loop as described (ensuring only network failures are retried) or remove the unused retry scaffolding.
// Network errors that benefit from a retry under WSL2's flaky NAT (issue #19).
// HTTP errors (4xx/5xx) are NOT retried; they're surfaced as-is.
const RETRYABLE_NET_CODES = new Set([
  'ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED', 'EAI_AGAIN',
  'UND_ERR_SOCKET', 'UND_ERR_CONNECT_TIMEOUT',
])
function isRetryableNetworkError(err) {
  const codes = []
  if (err?.code) codes.push(err.code)
  if (err?.cause?.code) codes.push(err.cause.code)
  for (const inner of err?.cause?.errors || []) {
    if (inner?.code) codes.push(inner.code)
  }
  return codes.some((c) => RETRYABLE_NET_CODES.has(c))
}
export async function callAnthropic({ apiKey, prompt, maxTokens = 3000 }) {
  if (!apiKey) {
    throw new Error('ANTHROPIC_API_KEY is required')
  }
  const res = await fetch(ANTHROPIC_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: MODEL,
      max_tokens: maxTokens,
      messages: [{ role: 'user', content: prompt }],
    }),
  })
  const data = await res.json()
  if (!res.ok) {
    const msg = data?.error?.message || `Anthropic API error (${res.status})`
    throw new Error(msg)
  }
export async function callAnthropic({ apiKey, prompt, maxTokens = 3000 }) {
  if (!apiKey) {
    throw new Error('ANTHROPIC_API_KEY is required')
@copilot restore the "/* v8 ignore start */" marker
Already restored in b3af586 — both /* v8 ignore start */ (line 467) and /* v8 ignore end */ (line 504) are now in place and balanced around the network I/O block.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
// WSL2's NAT often resolves api.anthropic.com to an IPv6 address that isn't
// routable, causing fetch() to hang until ETIMEDOUT (issue #19). Forcing IPv4
// first avoids the hang on WSL2 and is a no-op on hosts where IPv6 works.
// Honors user-provided --dns-result-order from NODE_OPTIONS or CLI args.
if (!hasDnsResultOrderOverride()) {
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Updated vitest import to include vi and afterEach for test setup/teardown
- Added callAnthropic to the prompts.mjs imports
- All 35 existing tests pass
This prepares for Task 3: Adding callAnthropic tests
export function hasDnsResultOrderOverride() {
  if (process.env.NODE_OPTIONS?.includes('--dns-result-order')) return true

  for (let i = 0; i < process.execArgv.length; i++) {
    const arg = process.execArgv[i]
    if (arg === '--dns-result-order') return true
    if (arg?.startsWith('--dns-result-order=')) return true
  }

  return false
}

// Returns true when running inside WSL2 (Windows Subsystem for Linux v2).
// WSL2 always sets WSL_DISTRO_NAME; WSL_INTEROP is a fallback for edge cases.
// /proc/version contains "microsoft" on WSL2 kernels as a belt-and-suspenders
// check if neither env var is present.
export function isWSL2() {
  if (process.env.WSL_DISTRO_NAME || process.env.WSL_INTEROP) return true
  try {
    return /microsoft/i.test(readFileSync('/proc/version', 'utf8'))
  } catch {
    return false
  }
}
@copilot move hasDnsResultOrderOverride and isWSL2 into a separate internal module that tests can import
Done in c49aa24. Both helpers are now in lib/net-utils.mjs (not in files / public API). lib/prompts.mjs imports them without re-exporting. Tests moved to lib/__tests__/net-utils.test.mjs with cases updated to match the current kernel-file-based isWSL2 implementation — all 47 tests pass with 100% function coverage.
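To illustrate why isolating these helpers makes them easy to test, here is a sketch of a table-driven check for the override detection. It uses a variant of hasDnsResultOrderOverride that takes `env` and `execArgv` as parameters; that signature is an assumption made for this example (the version in the diff reads process.env and process.execArgv directly), and the cases are not the ones in lib/__tests__/net-utils.test.mjs.

```javascript
// Hypothetical variant with injectable inputs, for illustration only.
function hasDnsResultOrderOverride(env = process.env, execArgv = process.execArgv) {
  if (env.NODE_OPTIONS?.includes('--dns-result-order')) return true
  return execArgv.some(
    (arg) => arg === '--dns-result-order' || arg.startsWith('--dns-result-order='),
  )
}

// Cases a test file might iterate over.
const cases = [
  { env: {}, execArgv: [], expected: false },
  { env: { NODE_OPTIONS: '--dns-result-order=verbatim' }, execArgv: [], expected: true },
  { env: {}, execArgv: ['--dns-result-order=ipv4first'], expected: true },
  { env: {}, execArgv: ['--dns-result-order', 'ipv4first'], expected: true },
  { env: { NODE_OPTIONS: '--max-old-space-size=4096' }, execArgv: ['--inspect'], expected: false },
]

const failures = cases.filter(
  (c) => hasDnsResultOrderOverride(c.env, c.execArgv) !== c.expected,
)
console.log(failures.length === 0 ? 'all cases pass' : failures)
```

Injecting the inputs avoids mutating process.env between cases; a test that keeps the original zero-argument signature would instead need to save and restore NODE_OPTIONS around each case.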
  /* v8 ignore start */
  if (DEBUG) {
    const dns = await dnsLookupAll(ANTHROPIC_HOST)
    debugLog(`dns ${ANTHROPIC_HOST} ->`, JSON.stringify(dns))
    debugLog(`payload bytes=${Buffer.byteLength(body)} maxTokens=${maxTokens}`)
  }
  for (let attempt = 0; attempt <= backoffsMs.length; attempt++) {
    const startedAt = Date.now()
    try {
      debugLog(`attempt ${attempt + 1}/${backoffsMs.length + 1} -> POST ${ANTHROPIC_URL}`)
      const res = await fetch(ANTHROPIC_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
        },
        body,
      })
      const data = await res.json()
      if (!res.ok) {
        const msg = data?.error?.message || `Anthropic API error (${res.status})`
        throw new Error(msg)
      }
      const block = (data.content || []).find((b) => b && b.type === 'text')
      return block ? block.text : ''
    } catch (err) {
      const elapsed = Date.now() - startedAt
      if (DEBUG) {
        debugLog(`attempt ${attempt + 1} failed in ${elapsed}ms: ${describeError(err)}`)
      }
      if (attempt < backoffsMs.length && isRetryableNetworkError(err)) {
        debugLog(`retrying in ${backoffsMs[attempt]}ms`)
        await new Promise((r) => setTimeout(r, backoffsMs[attempt]))
        continue
      }
      throw err
    }
  }
  /* v8 ignore end */
Done in 5b3fe10. The broad /* v8 ignore start/end */ blocks around both isRetryableNetworkError and the retry loop body are removed. Now only the two truly untestable if (DEBUG) branches are marked with targeted /* v8 ignore next 5 */ and /* v8 ignore next 3 */. The retry loop, error checking, isRetryableNetworkError, and setTimeout delay are all now covered by the new tests.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Summary
- Retry callAnthropic up to twice (500ms / 1500ms backoff) on transient network errors only (ETIMEDOUT, ECONNRESET, undici socket/connect errors); HTTP responses are still surfaced as-is.
- MINT_DEBUG=1 diagnostics: DNS lookup result, payload size, per-attempt timing, and decoded undici error chains, to narrow down any residual failures.
Refs #19
Type of change
How to test
- Set ANTHROPIC_API_KEY, run npm run dev, and exercise audit → resolve → export end-to-end. The resolve step (larger payload) should no longer hang with ETIMEDOUT.
- Run MINT_DEBUG=1 npm run dev and confirm [mint:debug] lines show the DNS lookup returning family:4 first and a successful first attempt in normal latency (~10–20s for resolve/export).
Checklist
- npx tsc --noEmit passes with no errors
Generated by Claude Code