feat(cli): pin explicit model+thinking, add live API smoke for harness drift by rawwerks · Pull Request #71 · openprose/prose

rawwerks · 2026-05-05T17:18:45Z

Layer 0 + Layer 1 of CLI harness robustness plan.

Stop inheriting SDK defaults: claude-sdk and codex-sdk pass explicit model + (Claude only) thinking adaptive from tools/cli/src/harnesses/defaults.ts. Override via ANTHROPIC_MODEL / OPENAI_MODEL / CODEX_MODEL.
Pin @anthropic-ai/claude-agent-sdk to exact 0.2.107 (last release with bundled cli.js; later versions ship native binaries via 8 platform optional-deps where linux-x64 and linux-x64-musl share identical {os, cpu} constraints).
New tools/cli/tests/live/smoke.live.test.ts + npm run test:live (separate vitest.live.config.ts). Auto-skips per harness when keys absent.
New .github/workflows/cli-live-smoke.yml runs the live smoke on every PR touching tools/cli/**. Uses org ANTHROPIC_API_KEY + OPENAI_API_KEY.
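
A minimal sketch of what a single audit point for harness defaults could look like. This is not the contents of tools/cli/src/harnesses/defaults.ts. The function names, the fallback model strings, the shape of the thinking option, and the precedence of CODEX_MODEL over OPENAI_MODEL are all assumptions made for illustration. Only the environment variable names and the idea of adaptive thinking for Claude come from this PR.

```typescript
// Hypothetical sketch: explicit harness defaults with env overrides.
type Env = Record<string, string | undefined>;

interface ClaudeDefaults {
  model: string;
  // Assumed option shape; the PR only says thinking must be "adaptive".
  thinking: { type: "adaptive" };
}

interface CodexDefaults {
  model: string;
}

// Assumed fallback. The PR names claude-sonnet-4-6 as a supported model,
// not necessarily as the pinned default.
const CLAUDE_FALLBACK_MODEL = "claude-sonnet-4-6";
// Placeholder only; the real Codex default is not stated in the PR.
const CODEX_FALLBACK_MODEL = "example-codex-model";

// Return the first value that is defined and not blank.
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  for (const value of values) {
    if (value !== undefined && value.trim() !== "") {
      return value.trim();
    }
  }
  return undefined;
}

function resolveClaudeDefaults(env: Env): ClaudeDefaults {
  return {
    model: firstNonEmpty(env.ANTHROPIC_MODEL) ?? CLAUDE_FALLBACK_MODEL,
    thinking: { type: "adaptive" },
  };
}

function resolveCodexDefaults(env: Env): CodexDefaults {
  // Assumed precedence: the harness-specific CODEX_MODEL wins over OPENAI_MODEL.
  return {
    model: firstNonEmpty(env.CODEX_MODEL, env.OPENAI_MODEL) ?? CODEX_FALLBACK_MODEL,
  };
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the resolution logic testable without touching real configuration.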

Why: prose run --harness claude-sdk was returning HTTP 400 (thinking.type.enabled not supported, use adaptive). Mocked tests passed because they never touched the wire format.

CI proof: run 25388993268 — both harnesses green (claude-sdk 4.05s, codex-sdk 6.71s).

Test plan: typecheck/build/test all pass (159/159); npm run test:live local green; CI live smoke green.

Follow-ups: tighten cli-real-harness-smoke.yml's prompt (too trivial to engage thinking); Layer 2 no-op detection; Layer 5 Renovate.

Generated with Claude Code.

Layer 0 + Layer 1 of the harness robustness plan: stop inheriting SDK defaults, and catch upstream drift before users hit it.

* claude-sdk + codex-sdk now pass explicit model and (Claude only) thinking via tools/cli/src/harnesses/defaults.ts, the single audit point. Override per run via ANTHROPIC_MODEL / OPENAI_MODEL / CODEX_MODEL env vars.
* Pin @anthropic-ai/claude-agent-sdk to exact 0.2.126. The previous ^0.2.90 resolved to 0.2.90, which hardcoded thinking.type: "enabled"; Opus 4.6+ / Sonnet 4.6+ now reject that with HTTP 400.
* Add tools/cli/tests/live/smoke.live.test.ts and a `test:live` script using a dedicated vitest.live.config.ts. Tests auto-skip per harness when API keys are absent, so contributors without keys aren't blocked.
* Add .github/workflows/cli-live-smoke.yml, which runs the live smoke on every PR touching tools/cli/** plus workflow_dispatch. Uses org ANTHROPIC_API_KEY and OPENAI_API_KEY secrets. Complements the existing cli-real-harness-smoke.yml (binary-spawn end-to-end) with a faster harness-unit-level check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
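
The per-harness auto-skip could be expressed roughly as follows. This is a sketch under assumptions, not the code in smoke.live.test.ts: the helper names `shouldSkip` and `runnableHarnesses` are hypothetical, and the mapping of each harness to a single API key is inferred from the secrets this PR mentions.

```typescript
// Hypothetical sketch: decide per harness whether the live smoke should be skipped.
type Env = Record<string, string | undefined>;
type Harness = "claude-sdk" | "codex-sdk";

// Key names are the secrets named in the PR; the one-key-per-harness
// mapping is an assumption.
const REQUIRED_KEY: Record<Harness, string> = {
  "claude-sdk": "ANTHROPIC_API_KEY",
  "codex-sdk": "OPENAI_API_KEY",
};

// A harness is skipped when its key is missing or blank.
function shouldSkip(harness: Harness, env: Env): boolean {
  const key = env[REQUIRED_KEY[harness]];
  return key === undefined || key.trim() === "";
}

// List the harnesses that can actually run in this environment.
function runnableHarnesses(env: Env): Harness[] {
  return (Object.keys(REQUIRED_KEY) as Harness[]).filter(
    (harness) => !shouldSkip(harness, env),
  );
}
```

In a Vitest suite, a predicate like this would typically feed `describe.skipIf(shouldSkip("claude-sdk", process.env))`, so a contributor with only one key still exercises the harness they can reach.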

Workflow_dispatch is blocked for new workflow files until they exist on the default branch. Adding the feature branch to the push trigger so the live smoke runs without opening a PR. To revert before merge.

0.2.126 introduced platform-specific native binary packaging via 8 optional-deps, but the linux-x64 and linux-x64-musl variants share identical {os, cpu} constraints with no libc discriminator. As a result, npm install on glibc Linux runners is non-deterministic, and the SDK's runtime binary lookup can fail (CI run 25388337686 confirmed this). 0.2.107 is the last 0.2.x release with a bundled cli.js: no native binary lookup, no platform-package mess. It already supports adaptive thinking and modern model names like claude-sonnet-4-6, so it satisfies the original Layer 0 requirement. Giving up ~3 weeks of SDK improvements is worth it to keep CI deterministic. Layer 1 live smoke caught this drift exactly as designed.
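
The ambiguity can be illustrated with a toy matcher. This is a simplified model, not npm's actual resolution algorithm: the package names are hypothetical, and the handling of a `libc` field is an assumption about how an installer that honors such a field would narrow the match.

```typescript
// Toy model of optional-dependency platform matching (not npm's real algorithm).
interface PlatformPackage {
  name: string;
  os: string[];
  cpu: string[];
  libc?: string[]; // optional discriminator; absent in the problematic case
}

interface Host {
  os: string;
  cpu: string;
  libc: string;
}

function isCompatible(pkg: PlatformPackage, host: Host): boolean {
  if (!pkg.os.some((os) => os === host.os)) return false;
  if (!pkg.cpu.some((cpu) => cpu === host.cpu)) return false;
  // Without a libc constraint, the package cannot be ruled out on libc grounds.
  if (pkg.libc !== undefined && !pkg.libc.some((libc) => libc === host.libc)) {
    return false;
  }
  return true;
}

function compatibleNames(pkgs: PlatformPackage[], host: Host): string[] {
  return pkgs.filter((pkg) => isCompatible(pkg, host)).map((pkg) => pkg.name);
}

const glibcHost: Host = { os: "linux", cpu: "x64", libc: "glibc" };

// Hypothetical package names; both declare the same {os, cpu} and no libc.
const ambiguous: PlatformPackage[] = [
  { name: "example-sdk-linux-x64", os: ["linux"], cpu: ["x64"] },
  { name: "example-sdk-linux-x64-musl", os: ["linux"], cpu: ["x64"] },
];

// The same packages with a libc discriminator added.
const discriminated: PlatformPackage[] = [
  { name: "example-sdk-linux-x64", os: ["linux"], cpu: ["x64"], libc: ["glibc"] },
  { name: "example-sdk-linux-x64-musl", os: ["linux"], cpu: ["x64"], libc: ["musl"] },
];
```

With `ambiguous`, both variants match a glibc host, so which binary ends up installed depends on factors outside the constraints. With `discriminated`, only the glibc variant matches.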

Workflow has now landed on this branch and was proven (run 25388993268 green for both harnesses). Revert the feature-branch entry so the trigger surface goes back to its intended scope: PRs touching tools/cli/**, push to main, and workflow_dispatch.

claude-agent-sdk@0.2.107 declares ^0.81.0 on @anthropic-ai/sdk, which resolves to versions in the GHSA-p7fg-763f-g4gf range (insecure default file permissions in Local Filesystem Memory Tool, moderate severity). Adding an npm override to bump the transitive dep to ^0.92.0 (the patched line). The override is invisible to claude-agent-sdk's own caret constraint but ensures the lockfile resolves to a non-vulnerable version. 0.92.0 was published 2026-04-30, passes the 72h cooldown. Verified: npm audit --omit=dev clean, 159/159 unit tests pass, live smoke against Anthropic still green. Layer 1's CI sibling (cli-release-check.yml's audit job) caught this.
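
The cooldown arithmetic can be checked with a small helper. This is only a sketch: the function name `passesCooldown` is hypothetical, and because the commit message gives a publish date without a time of day, the usage below assumes the worst case (end of day UTC) and uses this PR's opening timestamp as "now".

```typescript
// Hypothetical sketch: has a release aged past the cooldown window?
const HOUR_MS = 60 * 60 * 1000;

function passesCooldown(
  publishedAt: Date,
  now: Date,
  cooldownHours: number = 72,
): boolean {
  const ageMs = now.getTime() - publishedAt.getTime();
  return ageMs >= cooldownHours * HOUR_MS;
}

// Worst-case assumption: published at the very end of 2026-04-30 (UTC).
const published = new Date("2026-04-30T23:59:59Z");
// Timestamp of this PR, taken from the page header.
const prOpened = new Date("2026-05-05T17:18:45Z");

const cooledDown = passesCooldown(published, prOpened);
```

Even under the worst-case publish time, the release is more than four days old at the PR timestamp, comfortably past 72 hours.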

rawwerks and others added 5 commits May 5, 2026 11:42

ci(cli-live-smoke): temporarily fire on feat branch push

86d588b

rawwerks wants to merge 5 commits into main from feat/cli-robust-explicit-defaults
