feat(cli): pin explicit model+thinking, add live API smoke for harness drift#71

Open
rawwerks wants to merge 5 commits into main from feat/cli-robust-explicit-defaults

@rawwerks rawwerks commented May 5, 2026

Layer 0 + Layer 1 of the CLI harness robustness plan.

  • Stop inheriting SDK defaults: claude-sdk and codex-sdk pass explicit model + (Claude only) thinking adaptive from tools/cli/src/harnesses/defaults.ts. Override via ANTHROPIC_MODEL / OPENAI_MODEL / CODEX_MODEL.
  • Pin @anthropic-ai/claude-agent-sdk to exact 0.2.107 (last release with bundled cli.js; later versions ship native binaries via 8 platform optional-deps where linux-x64 and linux-x64-musl share identical {os, cpu} constraints).
  • New tools/cli/tests/live/smoke.live.test.ts + npm run test:live (separate vitest.live.config.ts). Auto-skips per harness when keys absent.
  • New .github/workflows/cli-live-smoke.yml runs the live smoke on every PR touching tools/cli/**. Uses org ANTHROPIC_API_KEY + OPENAI_API_KEY.
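
As a rough illustration of the first bullet, a defaults module with env overrides might look like the sketch below. This is not the contents of tools/cli/src/harnesses/defaults.ts: the function names, the fallback model IDs, the precedence of CODEX_MODEL over OPENAI_MODEL, and the `{ type: "adaptive" }` option shape are all assumptions made for the example.

```typescript
// Hypothetical sketch of a single-audit-point defaults module.
// All names and fallback values here are illustrative assumptions.
type Env = Record<string, string | undefined>;

interface ClaudeDefaults {
  model: string;
  thinking: { type: "adaptive" };
}

interface CodexDefaults {
  model: string;
}

// Placeholder fallbacks, not necessarily the values pinned in this PR.
const FALLBACK_CLAUDE_MODEL = "claude-sonnet-4-6";
const FALLBACK_CODEX_MODEL = "example-codex-model";

// Return the first non-empty value among the given env var names.
function firstSet(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Claude harness: explicit model plus explicit adaptive thinking.
function claudeDefaults(env: Env): ClaudeDefaults {
  return {
    model: firstSet(env, ["ANTHROPIC_MODEL"]) ?? FALLBACK_CLAUDE_MODEL,
    thinking: { type: "adaptive" },
  };
}

// Codex harness: explicit model only. The lookup order is an assumption.
function codexDefaults(env: Env): CodexDefaults {
  return {
    model: firstSet(env, ["CODEX_MODEL", "OPENAI_MODEL"]) ?? FALLBACK_CODEX_MODEL,
  };
}
```

Taking the environment as a parameter, rather than reading `process.env` inside the functions, keeps the sketch easy to unit test; a real harness would presumably pass `process.env` at the call site.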

Why: prose run --harness claude-sdk was returning HTTP 400 (thinking.type.enabled not supported, use adaptive). Mocked tests passed because they never touched the wire format.

CI proof: run 25388993268 — both harnesses green (claude-sdk 4.05s, codex-sdk 6.71s).

Test plan: typecheck/build/test all pass (159/159); npm run test:live local green; CI live smoke green.
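
For readers wondering how the per-harness auto-skip behind `npm run test:live` could work, here is a minimal sketch. It is not the code in smoke.live.test.ts: the `shouldSkip` helper and the harness-to-key mapping (claude-sdk needing ANTHROPIC_API_KEY, codex-sdk needing OPENAI_API_KEY) are assumptions inferred from the description above.

```typescript
// Hypothetical helper: decide whether a harness's live smoke should be skipped.
type Harness = "claude-sdk" | "codex-sdk";

// Assumed mapping from harness to the API key it needs.
const REQUIRED_KEY: Record<Harness, string> = {
  "claude-sdk": "ANTHROPIC_API_KEY",
  "codex-sdk": "OPENAI_API_KEY",
};

// Skip when the key is missing or blank, so contributors without keys are not blocked.
function shouldSkip(
  harness: Harness,
  env: Record<string, string | undefined>,
): boolean {
  const key = env[REQUIRED_KEY[harness]];
  return key === undefined || key.trim() === "";
}

// In a vitest file this could gate a whole suite, for example:
//   describe.skipIf(shouldSkip("claude-sdk", process.env))("claude-sdk live smoke", () => { /* ... */ });
```

Skipping per harness, rather than skipping the whole file, means a contributor with only one of the two keys still gets live coverage for that harness.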

Follow-ups: tighten cli-real-harness-smoke.yml's prompt (too trivial to engage thinking); Layer 2 no-op detection; Layer 5 Renovate.

Generated with Claude Code.

rawwerks and others added 5 commits May 5, 2026 11:42
Layer 0 + Layer 1 of the harness robustness plan: stop inheriting SDK
defaults, and catch upstream drift before users hit it.

* claude-sdk + codex-sdk now pass explicit model and (Claude only)
  thinking via tools/cli/src/harnesses/defaults.ts, the single audit
  point. Override per run via ANTHROPIC_MODEL / OPENAI_MODEL /
  CODEX_MODEL env vars.
* Pin @anthropic-ai/claude-agent-sdk to exact 0.2.126. The previous
  ^0.2.90 resolved to 0.2.90, which hardcoded thinking.type: "enabled"
  — Opus 4.6+ / Sonnet 4.6+ now reject that with HTTP 400.
* Add tools/cli/tests/live/smoke.live.test.ts and a `test:live` script
  using a dedicated vitest.live.config.ts. Tests auto-skip per harness
  when API keys are absent, so contributors without keys aren't blocked.
* Add .github/workflows/cli-live-smoke.yml — runs the live smoke on
  every PR touching tools/cli/** plus workflow_dispatch. Uses org
  ANTHROPIC_API_KEY and OPENAI_API_KEY secrets. Complements the
  existing cli-real-harness-smoke.yml (binary-spawn end-to-end) with
  a faster harness-unit-level check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

workflow_dispatch is blocked for new workflow files until they exist on
the default branch. Adding the feature branch to the push trigger so the
live smoke runs without opening a PR. To be reverted before merge.

0.2.126 introduced platform-specific native binary packaging via 8
optional-deps, but the linux-x64 and linux-x64-musl variants share
identical {os, cpu} constraints with no libc discriminator. npm
install on glibc Linux runners is non-deterministic, and the SDK's
runtime binary lookup can fail (CI run 25388337686 confirmed this).
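
The collision can be modelled in a few lines. The sketch below is a simplified stand-in for how an installer filters platform packages, not npm's actual resolution code; the package names and the `candidates` helper are hypothetical.

```typescript
// Simplified model of platform-package selection (not npm's real algorithm).
interface PlatformPackage {
  name: string;
  os: string[];
  cpu: string[];
  libc?: string[]; // optional discriminator; absent in the problematic case
}

interface Host {
  os: string;
  cpu: string;
  libc: string;
}

// Keep every package whose declared constraints are compatible with the host.
function candidates(pkgs: PlatformPackage[], host: Host): string[] {
  return pkgs
    .filter((p) => p.os.some((o) => o === host.os))
    .filter((p) => p.cpu.some((c) => c === host.cpu))
    .filter((p) => p.libc === undefined || p.libc.some((l) => l === host.libc))
    .map((p) => p.name);
}

const glibcRunner: Host = { os: "linux", cpu: "x64", libc: "glibc" };

// Hypothetical names: identical {os, cpu}, no libc field, so both match.
const ambiguous: PlatformPackage[] = [
  { name: "example-sdk-linux-x64", os: ["linux"], cpu: ["x64"] },
  { name: "example-sdk-linux-x64-musl", os: ["linux"], cpu: ["x64"] },
];

// The same pair with a libc discriminator resolves to exactly one package.
const discriminated: PlatformPackage[] = [
  { name: "example-sdk-linux-x64", os: ["linux"], cpu: ["x64"], libc: ["glibc"] },
  { name: "example-sdk-linux-x64-musl", os: ["linux"], cpu: ["x64"], libc: ["musl"] },
];

const ambiguousMatches = candidates(ambiguous, glibcRunner);
const discriminatedMatches = candidates(discriminated, glibcRunner);
```

With two candidates left for a glibc host, which binary ends up being used depends on details outside the package metadata, which is the non-determinism described above.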

0.2.107 is the last 0.2.x release with a bundled cli.js, so there is no
native binary lookup and no platform-package mess. It already supports
adaptive thinking and modern model names like claude-sonnet-4-6, so it
satisfies the original Layer 0 requirement. Giving up ~3 weeks of SDK
improvements is worth it to keep CI deterministic.

Layer 1 live smoke caught this drift exactly as designed.

The workflow has now landed on this branch and was proven (run
25388993268 green for both harnesses). Revert the feature-branch entry
so the trigger surface goes back to its intended scope: PRs touching
tools/cli/**, push to main, and workflow_dispatch.

claude-agent-sdk@0.2.107 declares ^0.81.0 on @anthropic-ai/sdk, which
resolves to versions in the GHSA-p7fg-763f-g4gf range (insecure default
file permissions in Local Filesystem Memory Tool, moderate severity).

Adding an npm override to bump the transitive dep to ^0.92.0 (the
patched line). The override takes precedence over claude-agent-sdk's
own caret constraint, so the lockfile resolves to a non-vulnerable
version. 0.92.0 was published 2026-04-30, so it passes the 72h cooldown.
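
Why an override is needed rather than a plain reinstall: for 0.x versions a caret range only admits patch releases of the same minor, so ^0.81.0 can never resolve to 0.92.x on its own. The sketch below is a simplified caret check written for illustration (`satisfiesCaret` is a hypothetical helper, pre-release tags are ignored); it is not the semver package's implementation.

```typescript
// Minimal caret-range check for plain X.Y.Z versions (no pre-release handling).
type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => Math.floor(n) !== n || n < 0)) {
    throw new Error(`not a plain X.Y.Z version: ${v}`);
  }
  return parts as Version;
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  const [a1, a2, a3] = a;
  const [b1, b2, b3] = b;
  return a1 - b1 || a2 - b2 || a3 - b3;
}

// ^X.Y.Z allows changes that do not modify the left-most non-zero component.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parse(version);
  const lower = parse(base);
  const [major, minor, patch] = lower;
  const upper: Version =
    major > 0 ? [major + 1, 0, 0]
    : minor > 0 ? [0, minor + 1, 0]
    : [0, 0, patch + 1];
  return compare(v, lower) >= 0 && compare(v, upper) < 0;
}
```

Under this model `satisfiesCaret("0.92.0", "0.81.0")` is false while `satisfiesCaret("0.92.1", "0.92.0")` is true, which matches the situation described: the declared range stays inside 0.81.x, and only the override moves resolution to the 0.92 line.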

Verified: npm audit --omit=dev clean, 159/159 unit tests pass, live
smoke against Anthropic still green.

Layer 1's CI sibling (cli-release-check.yml's audit job) caught this.