Sync#2

Open
metehanozdev wants to merge 956 commits into emregucerr:main from browserbase:main
Conversation

@metehanozdev

Collaborator

why

what changed

test plan

pirate and others added 29 commits March 4, 2026 16:55
PSA potential hackers: don't get excited, we don't have any real secrets
in CI worth stealing, and our CI does not autodeploy anything to prod.
All important secrets and CD processes are kept in our closed-source
repos.

# why

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add a gating workflow that blocks CI until a maintainer approves running
secrets on forked PRs. CI now triggers from that gate, resolves labels
and path filters under workflow_run, removes same-repo guards so
integration/e2e/evals run on approved forks, and checks out the PR
commit consistently across jobs.

<sup>Written for commit c682847.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1782">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…ed" (#1786)

Reverts #1782

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Reverts the approval-based CI for external contributors. CI now runs on
pull_request and blocks secrets for forked PRs by skipping integration,
E2E, and eval jobs.

- **Refactors**
  - Removed the “Ensure Contributor Is Trusted to Run CI” workflow.
  - Switched CI trigger to pull_request; removed workflow_run logic.
  - Read labels from github.event.pull_request; removed API calls.
  - Simplified checkouts; dropped explicit head_sha refs.
  - Updated concurrency group to use github.ref.
  - Ignored docs-only changes in CI.

<sup>Written for commit d6ace82.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1786">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
Reverts #1780

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Reverts the change that skipped CI on forked PRs. Integration tests,
evals, and the Stainless preview now run for all PRs by removing the
head-repo equality checks in ci.yml and stainless.yml.

<sup>Written for commit 18480e8.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1787">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

cdpHeaders is already plumbed through packages/server correctly, it was
just missing from the spec.

- packages/core/lib/v3/types/public/api.ts:15 defines cdpHeaders on
LocalBrowserLaunchOptionsSchema.
- packages/server/src/routes/v1/sessions/start.ts:192 forwards
browser.launchOptions with a spread into localBrowserLaunchOptions, so
cdpHeaders is preserved.
- packages/server/src/lib/InMemorySessionStore.ts:240 passes
localBrowserLaunchOptions straight into new V3(...).
- packages/core/lib/v3/v3.ts:750 passes lbo.cdpHeaders into
V3Context.create(...).
- packages/core/lib/v3/understudy/context.ts:167 finally uses it in
CdpConnection.connect(wsUrl, { headers: opts?.cdpHeaders }).
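
The chain above depends on an object spread carrying optional fields along unchanged. A minimal sketch of that behavior, assuming a hypothetical `LaunchOptions` type and `forwardLaunchOptions` helper (neither is taken from the repository; only the `cdpHeaders` field name comes from the description):

```typescript
// Hypothetical stand-in for the launch options schema.
interface LaunchOptions {
  headless?: boolean;
  cdpHeaders?: Record<string, string>;
}

// A spread copies every own enumerable property, so optional fields such as
// cdpHeaders survive without being listed explicitly.
function forwardLaunchOptions(launchOptions: LaunchOptions): LaunchOptions {
  return { ...launchOptions };
}

const forwarded = forwardLaunchOptions({
  headless: true,
  cdpHeaders: { Authorization: "Bearer example-token" },
});
console.log(forwarded.cdpHeaders);
```

This is why the runtime path already worked and only the spec needed the field.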

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added the missing `cdpHeaders` field to the v3 server OpenAPI spec so
clients can send custom Chrome DevTools Protocol headers. This aligns
the spec with server launch options and prevents client
codegen/validation errors.

<sup>Written for commit 39ee737.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1797">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…and server-v4 dirs (#1796)

# Follow-up Tasks

- [ ] Update stainless SDK custom code for all languages to pull new
`stagehand-server-v3-darwin-x64` binary names (`-v3-` added)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Split the Stagehand API into `packages/server-v3` and
`packages/server-v4`, each with its own builds, tests, SEA binaries, and
release workflows. Delivers STG-1536 and lets us keep v3 stable while
iterating on v4; CI/test discovery and OpenAPI artifacts are versioned.

- **Refactors**
- Renamed the original server to `packages/server-v3`
(`@browserbasehq/stagehand-server-v3`); updated docs and runtime path
helpers (now synced across core/docs/evals and both servers), ESLint
globs/ignores, scripts/Turbo filters, tests, and Stainless to read
`packages/server-v3/openapi.v3.yaml`; v3 SEA binaries use
`stagehand-server-v3-*`.
- Added `packages/server-v4` (`@browserbasehq/stagehand-server-v4`) with
`/v4/**` routes, SSE streaming via `x-stream-response`, LRU/TTL
in-memory session store, health/readiness, logging/metrics,
`openapi.v4.yaml` + generator, SEA tooling, and v4 integration tests.
- CI: path filters, test discovery, and artifacts cover both versions;
added `stagehand-server-v4-release.yml` and
`stagehand-server-v4-sea-build.yml`; renamed v3 workflows; artifacts
include `packages/server-v3/**` and `packages/server-v4/**` dists and
OAS.

- **Migration**
- Replace `packages/server/**` refs with `packages/server-v3/**` or
`packages/server-v4/**`.
- Use new package filters and binary names:
`@browserbasehq/stagehand-server-v3` /
`@browserbasehq/stagehand-server-v4`; `stagehand-server-v3-*` /
`stagehand-server-v4-*`.
- Update OpenAPI consumers to `packages/server-v3/openapi.v3.yaml` or
`packages/server-v4/openapi.v4.yaml`.

<sup>Written for commit 2b9114c.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1796">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- Adds the `@browserbasehq/browse-cli` package (`packages/cli`) to the
stagehand monorepo, open-sourcing browser automation for AI agents
- CLI provides stateful browser control via a daemon architecture —
navigation, clicking, typing, screenshots, accessibility snapshots,
multi-tab, network capture, and env switching (local/remote)
- Uses `@browserbasehq/stagehand` as a workspace dependency (bundled
into the CLI binary via tsup)
- Includes full test suite and documentation

## Changes
- `packages/cli/` — all CLI source code, config, tests, and docs
- `pnpm-workspace.yaml` — added `packages/cli` to workspace
- `.github/workflows/ci.yml` — added CLI path filters and build artifact
uploads
- `.changeset/open-source-browse-cli.md` — changeset for initial release
- `pnpm-lock.yaml` — updated lockfile

## Test plan
- [x] CLI builds successfully (`pnpm --filter @browserbasehq/browse-cli
run build`)
- [x] Full monorepo build passes (`turbo run build` — 9/9 tasks)
- [x] `browse --help` and `browse --version` output correctly
- [x] `browse status` returns valid JSON
- [x] Lint passes clean (`pnpm --filter @browserbasehq/browse-cli run
lint`)
- [x] Source verified identical to stagent-cli (only import path
changed)
- [x] Empirically tested Browserbase credential requirements match core
- [ ] Run `pnpm --filter @browserbasehq/browse-cli run test` (requires
Chrome/browser environment)

## Known issues (pre-existing from stagent-cli, not introduced by this PR)
- Network capture `response.json` always writes `status: 0` — response
metadata from `responseReceived` CDP event is not persisted to
`loadingFinished` handler
- Ref-based `click` command silently ignores
`--button`/`--count`/`--force` flags (coordinate-based `click_xy`
handles them correctly)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g CI (#1801)

# why

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Corrects the changeset package reference from
`@browserbasehq/stagehand-server` to
`@browserbasehq/stagehand-server-v3` to unblock CI and ensure the
correct package receives the patch release.

<sup>Written for commit 177bc48.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1801">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- `browse env` showed stale "local" mode after `browse env remote`
- Root cause: `.mode` file was only written during lazy browser init
(`ensureBrowserInitialized`), not at daemon startup. Between daemon
start and first command, `readCurrentMode()` returned `null` and fell
back to hardcoded `"local"`
- Write `.mode` eagerly in `runDaemon()` at startup so it's immediately
available
- Fall back to `desiredMode` instead of `"local"` in the `env` display
handler as a safety net
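
The before/after behavior can be sketched as follows. This is an illustration only: the `.mode` file is modeled as an in-memory object (`ModeFile` is hypothetical), and the function signatures are assumptions rather than the CLI's real ones.

```typescript
type Mode = "local" | "remote";

// Hypothetical in-memory stand-in for the `.mode` file; null means "not written yet".
interface ModeFile {
  value: Mode | null;
}

function readCurrentMode(file: ModeFile): Mode | null {
  return file.value;
}

// Before the fix: a missing file fell back to a hardcoded "local".
function displayModeBefore(file: ModeFile): Mode {
  return readCurrentMode(file) ?? "local";
}

// After the fix: a missing file falls back to the mode the user asked for.
function displayModeAfter(file: ModeFile, desiredMode: Mode): Mode {
  return readCurrentMode(file) ?? desiredMode;
}

// Daemon startup now writes the mode eagerly instead of waiting for lazy browser init.
function runDaemonStartup(file: ModeFile, desiredMode: Mode): void {
  file.value = desiredMode;
}

const modeFile: ModeFile = { value: null };
console.log(displayModeBefore(modeFile));
console.log(displayModeAfter(modeFile, "remote"));
runDaemonStartup(modeFile, "remote");
console.log(readCurrentMode(modeFile));
```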

## Test plan
- [x] Reproduced bug: `browse env remote` → `browse env` showed
`"mode":"local"`
- [x] Verified fix: `browse env remote` → `browse env` now shows
`"mode":"remote"`
- [x] `mode.test.ts` passes (3/3)


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes `browse env` showing stale "local" after `browse env remote`
(STG-1547). The daemon now writes `.mode` at startup, the display falls
back to `desiredMode` until mode is written, and a patch changeset is
added for `@browserbasehq/browse-cli`.

<sup>Written for commit 9661d92.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1806">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Stacked on #1800
- Only `BROWSERBASE_API_KEY` is required for remote mode in the CLI
- `BROWSERBASE_PROJECT_ID` is still passed through if set, but no longer
checked

## Changes
- `packages/cli/src/index.ts` — `hasBrowserbaseCredentials()` only
checks for API key
- `packages/cli/tests/mode.test.ts` — Updated test to match new error
message
- `packages/cli/README.md` — Updated docs to reflect optional project ID
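
A rough sketch of the relaxed check, assuming the environment is passed in as a plain object. The `Env` type, the `remoteSessionParams` helper, and the error text are hypothetical; only `hasBrowserbaseCredentials` and the two variable names come from the description.

```typescript
type Env = Record<string, string | undefined>;

// Only the API key is required for remote mode.
function hasBrowserbaseCredentials(env: Env): boolean {
  return Boolean(env.BROWSERBASE_API_KEY);
}

// Hypothetical helper: the project ID is forwarded when present but never required.
function remoteSessionParams(env: Env): { apiKey: string; projectId?: string } {
  const apiKey = env.BROWSERBASE_API_KEY;
  if (!hasBrowserbaseCredentials(env) || !apiKey) {
    throw new Error("Remote mode requires BROWSERBASE_API_KEY");
  }
  const params: { apiKey: string; projectId?: string } = { apiKey };
  if (env.BROWSERBASE_PROJECT_ID) {
    params.projectId = env.BROWSERBASE_PROJECT_ID;
  }
  return params;
}

console.log(remoteSessionParams({ BROWSERBASE_API_KEY: "bb_example" }));
```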

## Test plan
- [x] Existing mode test updated
- [x] Manual: `browse env remote` with only `BROWSERBASE_API_KEY` set

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Make `BROWSERBASE_PROJECT_ID` optional in the CLI for remote mode, so
only `BROWSERBASE_API_KEY` is required. The project ID is still
forwarded when provided.

- **Bug Fixes**
- Updated remote mode check and error message to only require
`BROWSERBASE_API_KEY`.
- Autodetection now defaults to `remote` when the API key is set;
otherwise `local`.
  - Updated tests and `@browserbasehq/browse-cli` README to match.

<sup>Written for commit 99eb186.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1803">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r PRs to run CI with secrets (#1794)

# why

- External contributor PRs currently fail CI because they can't run with
secrets
- We don't want to allow them to run with secrets until a team member
"claims" them and reviews for any secrets exfiltration / sketchy code
- Once claimed, we want to run the full CI suite with secrets

# what changed

# test plan

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds two GitHub Actions that let maintainers claim external contributor
PRs by mirroring the approved head SHA to a maintainer-owned branch so
full CI can run with secrets. Claims come from an approving review by a
team member with write access on the latest commit and are
auto-invalidated on new commits (Linear STG-1518).

- **New Features**
- Detects forked PRs and posts claim instructions; manages labels:
`external-contributor`, `external-contributor:awaiting-approval`,
`external-contributor:mirrored`, `external-contributor:stale`,
`external-contributor:completed`.
- On approving review of the latest commit, verifies reviewer
permission, mirrors that exact SHA to
`external-contributor-pr-<PR#>-<12sha>`, and creates/reopens a “[Claimed
#X]” PR assigned to the approver.
- Closes and links the original PR with marker comments; keeps
labels/status in sync on both PRs.
- Auto-closes the mirror when new commits land on the external PR and
comments with next steps; if the mirror closes without merge, reopens
and relabels the original PR; if the external PR is reopened with the
same approved SHA while the mirror is open, it is closed again to keep
discussion on the mirror.
- Implemented via `external-contributor-pr-approval-handoff.yml`
(captures approved reviews, uploads artifact) and
`external-contributor-pr.yml` (consumes artifact, performs mirroring);
uses `actions/github-script@v7`, `actions/create-github-app-token@v1`,
`actions/checkout@v4`, `actions/download-artifact@v4`,
`actions/upload-artifact@v4`; concurrency scoped per PR/workflow run.
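
As a small illustration of the naming and invalidation rules above, assuming `<12sha>` means the first 12 characters of the approved head SHA (that reading, and both helper names, are assumptions rather than code from the workflows):

```typescript
// Hypothetical helper: builds the mirror branch name from the pattern
// external-contributor-pr-<PR#>-<12sha>.
function mirrorBranchName(prNumber: number, approvedSha: string): string {
  return `external-contributor-pr-${prNumber}-${approvedSha.slice(0, 12)}`;
}

// Hypothetical helper: a claim only holds while the approved SHA is still the PR head.
function isClaimValid(approvedSha: string, currentHeadSha: string): boolean {
  return approvedSha === currentHeadSha;
}

console.log(mirrorBranchName(1794, "4875e99abcdef0123456789"));
```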

- **Migration**
- Create a GitHub App with `contents:write`, `pull_requests:write`, and
`issues:write`; add `EXTERNAL_CONTRIBUTOR_PR_APP_ID` and
`EXTERNAL_CONTRIBUTOR_PR_APP_PRIVATE_KEY` secrets.
- To claim: submit an approving review on the latest commit of a forked
PR. If new commits are pushed, approve again to re-claim and rerun CI.

<sup>Written for commit 4875e99.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1794">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

bug in previous approach

# what changed

# test plan

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes the external PR approval flow by switching to the correct
`GITHUB_TOKEN`, stabilizing the mirror/refresh behavior, and ignoring
third‑party bot comments when parsing claim markers. Also improves the
`claude` workflow to build the repo before edits and allow rerunning
failed jobs.

- **Bug Fixes**
- Use `GITHUB_TOKEN` for branch pushes and API calls; remove the GitHub
App token path.
  - Enable `persist-credentials: true` during checkout to allow pushes.
- Keep the mirrored PR open and mark it stale when new commits land on
the external PR; relabel both PRs consistently.
- Auto-handle reopen/close transitions across external and mirrored PRs.
- Ignore comments from non-managed bots (e.g., Greptile, Cubic); only
parse claim markers from `github-actions[bot]` to avoid false triggers.

- **Refactors**
- Inline a small JS lib (`ECPR_LIB`) to manage labels, comments,
lifecycle, and claims; jobs run in clear phases (external lifecycle →
claim prep → branch refresh → claim finalize).
- Refresh internal branches by rebasing onto the approved external ref;
report conflicts cleanly for manual follow-up.
- Improve `claude.yml`: upgrade to `actions/checkout@v6`, set `actions:
write`, run `pnpm`/`turbo` build via `setup-node-pnpm-turbo`, enable
`track_progress`, and use an explicit tool allowlist for
`anthropics/claude-code-action@v1`.

<sup>Written for commit a46b159.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1812">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# Why

OpenAI organizations with Zero Data Retention (ZDR) reject stored
responses from the Responses API (`store: true` is the default when the
AI SDK auto-selects it). This causes agent runs to fail.

# What Changed

- Set `openai: { store: false }` in `providerOptions` across
`generateText` / `streamText` calls in `v3AgentHandler.ts` (execute +
stream) and `handleDoneToolCall.ts`.
- Simplified the existing Gemini `providerOptions` — removed the
conditional `modelId.includes("gemini-3")` check and always pass
`google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` since non-Google
providers ignore it.
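
The resulting options object can be sketched as below. It is a plain object with no SDK import; `optionsFor` is a hypothetical helper that only models the claim that each provider reads its own key and ignores the rest.

```typescript
// Shape of the provider options described above.
const providerOptions: Record<string, Record<string, unknown>> = {
  openai: { store: false },
  google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" },
};

// Hypothetical helper: a provider looks up only its own namespace, so the
// google entry has no effect when the active provider is something else.
function optionsFor(provider: string): Record<string, unknown> {
  return providerOptions[provider] ?? {};
}

console.log(optionsFor("openai"));
console.log(optionsFor("anthropic"));
```

Passing both keys unconditionally keeps the call sites free of per-model branching.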

# Test Plan

- [ ] Run agent in mode with an OpenAI model to confirm no breaking
changes


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Defaulted agent calls to OpenAI to not store responses, preventing
failures for Zero Data Retention orgs. Also simplified Gemini options by
always sending high media resolution.

- **Bug Fixes**
- Set `providerOptions.openai.store` to `false` for agent `generateText`
and `streamText` calls in `v3AgentHandler` (execute + stream) and
`handleDoneToolCall`, avoiding Responses API rejections in ZDR orgs.

- **Refactors**
- Always pass `google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` in
`providerOptions`; non-Google providers ignore it. Added a changeset for
a patch release of `@browserbasehq/stagehand`.

<sup>Written for commit a01d8c0.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1814">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- Adds `--context-id <id>` and `--persist` flags to `browse open` so
agents can load/persist browser state (cookies, localStorage, etc.)
across Browserbase sessions using Contexts
- Validates edge cases: `--persist` requires `--context-id`,
`--context-id` requires remote mode, context change triggers daemon
restart

## Usage
```bash
# Load a context (read-only — state not saved back)
browse open https://app.com --context-id ctx_abc123

# Load and persist changes back on session end
browse open https://app.com --context-id ctx_abc123 --persist
```

## How it works
1. `browse open --context-id` writes context config to
`/tmp/browse-{session}.context`
2. The daemon reads this file during browser initialization and passes
it through as `browserbaseSessionCreateParams.browserSettings.context`
3. If a second `browse open` is called with a different context ID, the
daemon is restarted (context is baked into the BB session at creation
time)

Context config uses a temp file (same pattern as `.mode`) because it's
needed at Browserbase session creation time, before the daemon's command
socket is up.
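
The flag validation and restart rule can be sketched like this. All names (`OpenFlags`, `resolveContextConfig`, `needsDaemonRestart`) and the error messages are hypothetical; the `{ id, persist }` shape is an assumption about what ends up under `browserSettings.context`.

```typescript
type Mode = "local" | "remote";

interface OpenFlags {
  contextId?: string;
  persist?: boolean;
}

interface ContextConfig {
  id: string;
  persist: boolean;
}

// Applies the two validation rules, then returns the config to write to the temp file.
function resolveContextConfig(flags: OpenFlags, mode: Mode): ContextConfig | null {
  if (flags.persist && !flags.contextId) {
    throw new Error("--persist requires --context-id");
  }
  if (flags.contextId && mode !== "remote") {
    throw new Error("--context-id requires remote mode");
  }
  return flags.contextId ? { id: flags.contextId, persist: Boolean(flags.persist) } : null;
}

// The context is fixed when the session is created, so a different ID forces a restart.
function needsDaemonRestart(currentId: string | undefined, requestedId: string): boolean {
  return currentId !== requestedId;
}

console.log(resolveContextConfig({ contextId: "ctx_abc123", persist: true }, "remote"));
```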

## Test plan
- [x] `browse open https://example.com --context-id <known-id>
--persist` on remote mode — verify session created with context in BB
dashboard
- [x] `browse stop` then reopen with same context — verify state
persists
- [x] Verify context mismatch triggers daemon restart (open with context
A, then open with context B)
- [x] Same context, second open — verify no unnecessary restart
- [x] `browse open https://example.com --context-id <id>` on local mode
— verify clear error
- [x] `browse open https://example.com --persist` without `--context-id`
— verify clear error
- [x] Plain `browse open` (no context flags) — verify no regression
- [x] `cleanupStaleFiles` removes `.context` file on shutdown
- [x] Stale `.context` file from crashed daemon is cleared on next
`browse open` without `--context-id`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# why

Running `pnpm format` reformats files that are unrelated to the
current changes, which is annoying.

# what changed

Formatted the previously unformatted files in the `cli` package.

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Standardized Prettier/ESLint formatting in `packages/cli` so `pnpm
format` runs are stable and don’t touch unrelated files. No functional
changes.

- **Refactors**
- Applied Prettier across `packages/cli/src` and tests (line breaks,
parens, quotes).
- Tidied lint/Prettier config formatting (`eslint.config.mjs`,
`.prettierrc` newline).
  - Adjusted test imports and one assertion to match formatter.

<sup>Written for commit 31570db.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1819">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

Allow users to pass custom headers in their LLM calls

# what changed

Add headers to the model.ts types 

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds `headers` support to `ClientOptions` so clients can send custom
HTTP headers with every provider request. Useful for auth tokens or
routing hints without changing global config.

- **New Features**
- Added `headers?: Record<string, string>` to `ClientOptions` in
`packages/core/lib/v3/types/public/model.ts`; headers are sent with each
request.
  - No breaking changes; default behavior is unchanged.
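
A minimal sketch of how such a field might be consumed. Only `headers?: Record<string, string>` comes from the description; the trimmed-down `ClientOptions` interface and the `buildRequestHeaders` helper are hypothetical.

```typescript
// Trimmed-down stand-in for ClientOptions; the real type has more fields.
interface ClientOptions {
  headers?: Record<string, string>;
}

// Hypothetical helper: custom headers are layered on top of defaults,
// and omitting them leaves the defaults untouched.
function buildRequestHeaders(options: ClientOptions): Record<string, string> {
  return { "content-type": "application/json", ...options.headers };
}

console.log(buildRequestHeaders({ headers: { "x-routing-hint": "eu" } }));
console.log(buildRequestHeaders({}));
```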

<sup>Written for commit 424dc1a.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1817">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why
Sync the Stagehand MCP docs with the Browserbase MCP docs for STG-1576.

# what changed
Copied the refreshed Browserbase MCP introduction and setup pages into
`packages/docs/v3/integrations/mcp`.

# test plan
`pnpm exec prettier --check packages/docs/docs.json
packages/docs/v3/integrations/mcp/introduction.mdx
packages/docs/v3/integrations/mcp/setup.mdx`; `pnpm --dir packages/docs
exec mint broken-links` (unrelated existing failures only); `pnpm lint`
fails in `packages/core` on an existing ESLint rule config issue.

---------

Co-authored-by: ci-test <ci-test@example.com>
…bservability (#1824)

Refactored flow logging to an event-based system with `FlowLogger` and a
pluggable `eventStore`, improving LLM/CDP traces, action and screenshot
events, and concise prompt/response summaries. `V3` now carries a
`sessionId` and `flowLoggerContext`. Split the server into `server-v3`
and `server-v4` with separate OpenAPI, routes, and SEA builds, and
updated CI.

- **New Features**
- Added `eventStore` with `FileEventStore`, queries, aggregate metrics,
and bus attachment; exported `getEventStore`, `setEventStore`,
`destroyEventStore`, and `getFlowLogConfigDir`.
- Replaced `SessionFileLogger` with `FlowLogger` across agents and
handlers; added `wrapWithLogging` for `Page` actions; standardized event
names via `toTitleCase`.
- Switched to concise LLM summary helpers (`extractLlmPromptSummary`,
`extractLlmCuaPromptSummary`, `extractLlmCuaResponseSummary`) and
`FlowLogger.createLlmLoggingMiddleware`.
- `V3Options.sessionId` now used to associate flows; CDP calls are
linked to flow events for better correlation.
- Exported `FlowLogger`, `FlowEvent`, and `toTitleCase` from
`@core/v3/index`.

- **Migration**
- Replace `SessionFileLogger` and `@logAction` with `FlowLogger` methods
(`log*`, `wrapWithLogging`).
- Use `setEventStore/getEventStore` to plug custom storage
(`FileEventStore` by default); optionally pass `sessionId` in
`V3Options`.
- Update paths/scripts from `packages/server` to `packages/server-v3`;
use new binaries `stagehand-server-v3-*`/`stagehand-server-v4-*`.

<sup>Written for commit c35fdbd.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1824">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
# Why

Add Browserbase Search API as the primary search tool for agents (`POST
/v1/search`), leveraging the Browserbase API key users already have
configured. Brave Search remains fully supported as a backwards
compatible fallback for existing users with `BRAVE_API_KEY` set.

# What Changed

- **Browserbase Search tool**: New search tool powered by the
Browserbase Search API, enabled via `useSearch: true` in
`agent.execute()`.
- **Brave Search (backwards-compatible)**: Existing users with
`BRAVE_API_KEY` in their environment continue to get the search tool
automatically with no code changes required.
- **Priority**: `useSearch: true` with a Browserbase API key takes
precedence. If not set, falls back to Brave if `BRAVE_API_KEY` is
present.
- **Validation**: If `useSearch: true` is set without a valid
Browserbase API key, a clear error is thrown at preparation time (before
the agent loop starts).
- **Separate files**: `browserbaseSearch.ts` and `braveSearch.ts` keep
the implementations cleanly isolated.


```typescript
const agent = stagehand.agent({ mode: "hybrid" });

const result = await agent.execute({
  instruction: "use the search tool to find me the capital of france",
  useSearch: true,
});
```
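
The priority and validation rules above can be sketched as a single resolver. `SearchInputs` and `resolveSearchProvider` are hypothetical names, and a plain `Error` stands in for `MissingEnvironmentVariableError`.

```typescript
type SearchProvider = "browserbase" | "brave" | null;

interface SearchInputs {
  useSearch?: boolean;
  browserbaseApiKey?: string;
  braveApiKey?: string;
}

function resolveSearchProvider(inputs: SearchInputs): SearchProvider {
  if (inputs.useSearch) {
    // Fail before the agent loop starts if the key is missing.
    if (!inputs.browserbaseApiKey) {
      throw new Error("useSearch requires a Browserbase API key");
    }
    return "browserbase";
  }
  // Backwards-compatible fallback for existing Brave users.
  return inputs.braveApiKey ? "brave" : null;
}

console.log(resolveSearchProvider({ useSearch: true, browserbaseApiKey: "bb", braveApiKey: "brave" }));
console.log(resolveSearchProvider({ braveApiKey: "brave" }));
```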

# Test Plan

- [x] Run agent with `useSearch: true` and a valid `BROWSERBASE_API_KEY`
— Browserbase search tool should return results
- [x] Run agent with `useSearch: true` and no Browserbase API key —
should throw `MissingEnvironmentVariableError`
…re for Observability" (#1837)

Reverts #1824

See #1836 for context

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Reverts the new flow-logging schema and event store, restoring the
previous session file–based logger. Updates agents, handlers, and evals
to use `SessionFileLogger` and removes the `sessionId` option from
`V3Options`. This undoes the STG-1566 observability changes to stabilize
logging.

- **Refactors**
  - Remove `packages/core/lib/v3/eventStore.ts` and related wiring.
- Restore `SessionFileLogger` in `flowLogger.ts` with helpers like
`formatCuaPromptPreview`, `formatLlmPromptPreview`, `logAction`, and
`logStagehandStep`.
- Replace `FlowLogger` usage with `SessionFileLogger` across
agents/handlers; update LLM clients (`OpenAICUAClient`,
`AnthropicCUAClient`, `GoogleCUAClient`, `AISdkClient`) to log
requests/responses with operation names.
- Simplify Understudy/CDP instrumentation; remove FlowLogger context in
`cdp.ts`; use `logAction` in `page.ts`.
- Remove `toTitleCase` and log explicit action names (e.g.,
`Understudy.<method>`, `Page.close`).
- Evals: switch to event-driven screenshot collection via the V3 event
bus; drop interval polling.
  - Remove `sessionId` handling and related code in `v3.ts`.

- **Migration**
  - Remove `sessionId` from `V3Options` if you were passing it.
- Any direct uses of the reverted `eventStore` or the new observability
schema are no longer available.

<sup>Written for commit f4b687f.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1837">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why
- there is a vulnerability in `@langchain/core` for versions lower than
`0.3.80` (more info
[here](https://www.cve.org/CVERecord?id=CVE-2025-68665))
- we have this package as an optional dep
# what changed
- bumps the optional `@langchain/core` dep to `^0.3.80`


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Bumps optional `@langchain/core` to ^0.3.80 to resolve a vulnerability
in earlier versions. Lockfile refreshed; no runtime changes.

<sup>Written for commit 5e5d812.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1841">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

This is the real, recovered FlowLogger PR + improvements based on
yesterday's review session.

I accidentally force-pushed an earlier, incomplete branch state onto the
original flow logger PR and dropped several days of work. This PR
replaces that broken state with the actual recovered branch and the
follow-up fixes needed to make the design correct again.

<img width="1569" height="1015" alt="image"
src="https://github.com/user-attachments/assets/f99171fe-27df-488c-a87a-75605c2a3c1b"
/>


## what this PR does

This restores the intended v3 FlowLogger/EventStore design:

- every `V3` instance owns exactly one FlowLogger session
- every `V3` instance owns exactly one `EventStore`
- FlowLogger produces structured `FlowEvent`s on the instance bus
- EventStore consumes those events and fans them out to sinks
- the default query history is shallow and bounded so logging does not
grow memory without limit

The goal is a stable, queryable execution tree for
Stagehand/agent/CDP/LLM work that preserves parent/child relationships,
survives ALS re-entry, and does not let sink failures break main
execution.

## architecture

### FlowLogger

`packages/core/lib/v3/flowlogger/FlowLogger.ts`

FlowLogger owns:

- the `FlowEvent` model
- the `FlowLoggerContext` stored in AsyncLocalStorage
- method/closure wrappers: `wrapWithLogging(...)`, `runWithLogging(...)`
- re-entry helpers: `withContext(...)`, `resolveContext(...)`
- CDP event helpers
- LLM request/response helpers and middleware

FlowLogger is the event producer. It is responsible for building the
execution tree and maintaining the active parent stack. It is not
responsible for persistence or output destinations.

### EventStore

`packages/core/lib/v3/flowlogger/EventStore.ts`

EventStore owns:

- per-session sink registration
- query routing
- file-backed session directory setup
- bounded in-memory ancestry retention
- store lifecycle and teardown

Sink implementations live in
`packages/core/lib/v3/flowlogger/EventSink.ts`, and
prettifying/sanitization lives in
`packages/core/lib/v3/flowlogger/prettify.ts`.

EventStore is the consumer/router layer. It is intentionally
per-instance and single-session.

### V3 wiring

`packages/core/lib/v3/v3.ts`

In the `V3` constructor:

- `this.eventStore = new EventStore(this.sessionId, opts)`
- `this.flowLoggerContext = FlowLogger.init(this.sessionId, this.bus)`
- `this.bus.on("*", this.eventStore.emit)`

So FlowLogger always has a per-instance context, and the bus is the
handoff point between FlowLogger and EventStore.

## lifecycle

### FlowLogger lifecycle

1. `V3` creates the per-instance `FlowLoggerContext`
2. instrumented methods enter through `wrapWithLogging(...)` or
`runWithLogging(...)`
3. a started event is emitted
4. that event is pushed onto `parentEvents`
5. nested work inherits that parent stack through ALS
6. completed/error events are emitted on unwind
7. `V3.close()` clears the in-memory flow context
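
Steps 2 to 6 can be sketched with Node's `AsyncLocalStorage`. This is a simplified illustration, not the code in `FlowLogger.ts`: `createFlowLogger`, the event field names, the event type strings, and the choice to parent completed events under their started event are all assumptions.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface FlowEvent {
  id: number;
  type: string;
  parentId: number | null;
}

interface FlowContext {
  parentEvents: FlowEvent[];
}

// Hypothetical factory so each logger has its own storage and event list.
function createFlowLogger() {
  const storage = new AsyncLocalStorage<FlowContext>();
  const emitted: FlowEvent[] = [];
  let nextId = 1;

  async function runWithLogging<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const parents = storage.getStore()?.parentEvents ?? [];
    const parent = parents.length > 0 ? parents[parents.length - 1] : null;
    const started: FlowEvent = { id: nextId++, type: `${name}:started`, parentId: parent ? parent.id : null };
    emitted.push(started);
    // Nested work sees a new stack with this event on top; the outer stack is untouched.
    const childContext: FlowContext = { parentEvents: [...parents, started] };
    try {
      const result = await storage.run(childContext, fn);
      emitted.push({ id: nextId++, type: `${name}:completed`, parentId: started.id });
      return result;
    } catch (error) {
      emitted.push({ id: nextId++, type: `${name}:error`, parentId: started.id });
      throw error;
    }
  }

  return { emitted, runWithLogging };
}

async function demo(): Promise<string[]> {
  const logger = createFlowLogger();
  await logger.runWithLogging("act", async () => {
    await logger.runWithLogging("click", async () => undefined);
  });
  return logger.emitted.map((e) => `${e.type} parent=${e.parentId}`);
}

demo().then((lines) => console.log(lines.join("\n")));
```

Because the stack lives in ALS rather than in a shared variable, concurrent flows do not overwrite each other's parent chain.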

### EventStore lifecycle

1. `V3` creates one `EventStore(sessionId, opts)`
2. the default shallow in-memory query sink is attached immediately
3. optional sinks are attached based on runtime config
4. `V3` forwards wildcard bus events into the store
5. `destroy()` tears down the configured sinks
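
The store side of this lifecycle can be sketched as a small fan-out router. The class body below is an assumption-laden illustration (the `EventSink` interface, `attach`, and the field names are hypothetical); it shows only the best-effort fan-out and teardown, not file setup or querying.

```typescript
interface FlowEvent {
  id: string;
  type: string;
}

// Hypothetical sink contract.
interface EventSink {
  emit(event: FlowEvent): void;
  destroy(): void;
}

class EventStore {
  private sinks: EventSink[] = [];

  attach(sink: EventSink): void {
    this.sinks.push(sink);
  }

  // Arrow property so the method can be handed to an event bus as a bare callback.
  emit = (event: FlowEvent): void => {
    for (const sink of this.sinks) {
      try {
        sink.emit(event);
      } catch {
        // Best-effort: a failing sink must not break the others or the caller.
      }
    }
  };

  destroy(): void {
    for (const sink of this.sinks) {
      sink.destroy();
    }
    this.sinks = [];
  }
}

const store = new EventStore();
const seen: FlowEvent[] = [];
store.attach({ emit: () => { throw new Error("disk full"); }, destroy: () => undefined });
store.attach({ emit: (event) => { seen.push(event); }, destroy: () => undefined });
store.emit({ id: "e1", type: "act:started" });
console.log(seen.length);
```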

## config / runtime options

### always-on pieces

Every `V3` instance always gets:

- FlowLogger context
- shared event bus
- one EventStore
- one default shallow in-memory query sink

So instrumentation exists even if no human-visible sink is active.

### stderr sink

Enabled automatically when:

- `verbose: 2`
- `BROWSERBASE_FLOW_LOGS=1`

Behavior:

- writes prettified logs to stderr
- suppresses CDP events to keep interactive output high-signal
- does not retain history
- is best-effort only

### file sinks

Enabled automatically when:

- `BROWSERBASE_CONFIG_DIR` is set

Behavior:

- `JsonlFileEventSink` writes full events to `session_events.jsonl`
- `PrettyLogFileEventSink` writes prettified lines to
`session_events.log`
- `session.json` is written with sanitized options
- `sessions/latest` is maintained best-effort
- file sinks are write-only; `query()` returns `[]`

### queryable in-memory history

Default behavior:

- EventStore attaches a `ShallowInMemoryEventSink`
- retention limit is `500` events per session
- retained events keep ancestry metadata only
- retained `data` is intentionally stripped in the default sink

That shallow history is what pretty formatting uses to recover recent
ancestry without retaining screenshots/base64 payloads forever.
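
A bounded, payload-stripping sink along these lines could look like the sketch below. The class name matches the description, but the body, the event field names, and the configurable `limit` constructor argument are assumptions made for illustration.

```typescript
// Hypothetical event shape; `data` may hold large payloads such as screenshots.
interface FlowEvent {
  eventId: string;
  eventType: string;
  parentIds: string[];
  sessionId: string;
  createdAt: string;
  data?: unknown;
}

type ShallowEvent = Omit<FlowEvent, "data">;

class ShallowInMemoryEventSink {
  private readonly limit: number;
  private events: ShallowEvent[] = [];

  constructor(limit: number = 500) {
    this.limit = limit;
  }

  emit(event: FlowEvent): void {
    // Keep ancestry metadata only; the payload is deliberately dropped.
    this.events.push({
      eventId: event.eventId,
      eventType: event.eventType,
      parentIds: [...event.parentIds],
      sessionId: event.sessionId,
      createdAt: event.createdAt,
    });
    // Evict the oldest entry once the retention limit is exceeded.
    if (this.events.length > this.limit) {
      this.events.shift();
    }
  }

  query(): ShallowEvent[] {
    return [...this.events];
  }
}

const sink = new ShallowInMemoryEventSink(2);
for (const id of ["a", "b", "c"]) {
  sink.emit({ eventId: id, eventType: "demo", parentIds: [], sessionId: "s1", createdAt: "t", data: "x".repeat(1000) });
}
console.log(sink.query().map((e) => e.eventId));
```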

## memory impact

The main memory constraint is the default query sink.

What it retains:

- up to 500 recent shallow events per `V3` instance
- ids, timestamps, event types, parent ids, session id

What it does not retain by default:

- full screenshots
- large base64 blobs
- full historical payloads

Implications:

- memory scales roughly with `number_of_sessions * 500 shallow events`
- stderr sink does not retain history
- file sinks do not materially retain history in process memory beyond
normal stream buffering
- attaching `InMemoryEventSink` explicitly is an opt-in to full-payload
retention

## usage patterns

### use `wrapWithLogging(...)`

For class methods that should emit their own started/completed/error
envelope:

- `V3.act`
- `V3.extract`
- `V3.observe`
- page/understudy methods
- agent execute entrypoints

### use `runWithLogging(...)`

For closures or non-decorator code paths that still need a lifecycle
envelope.

### use `withContext(...)` / `resolveContext(...)`

For callback-driven or later async work:

- websocket callbacks
- CDP response/message dispatch
- detached async continuations

`currentContext` is strict and may throw when ALS is missing.
`resolveContext(...)` is the non-throwing lookup for “active ALS if
present, otherwise instance-owned fallback”.
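
The strict versus non-throwing lookup can be sketched with Node's `AsyncLocalStorage`. The function names mirror the ones mentioned above, but their signatures and the `FlowContext` shape are assumptions for illustration, not the FlowLogger API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the real FlowLogger context may differ.
interface FlowContext {
  sessionId: string;
  parentIds: string[];
}

const als = new AsyncLocalStorage<FlowContext>();

// Run `fn` with `ctx` as the active ALS store (assumed signature).
function withContext<T>(ctx: FlowContext, fn: () => T): T {
  return als.run(ctx, fn);
}

// Strict lookup: throws when no ALS store is active.
function currentContext(): FlowContext {
  const ctx = als.getStore();
  if (!ctx) throw new Error("no active flow context");
  return ctx;
}

// Non-throwing lookup: active ALS store if present, otherwise the
// instance-owned fallback supplied by the caller.
function resolveContext(fallback: FlowContext): FlowContext {
  return als.getStore() ?? fallback;
}
```

A websocket or CDP callback that fires outside any `als.run(...)` scope would call `resolveContext(instanceContext)` and still land on the owning session instead of throwing.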

### LLM logging is best-effort

If no flow context is active, LLM logging should no-op. Missing context
must not break model execution.

## reviewer expectations / edge cases

Please review this PR with these expectations in mind:

- ALS can legitimately be missing in later callbacks or detached async
work; that should not break execution unless a caller explicitly uses
strict `currentContext`
- there should be no global or multi-session fallback store; logging
belongs to a real `V3` instance/session
- parent-chain correctness matters more than sink behavior; events must
land under the right parent chain
- sink failures should not break the main library; sinks are
intentionally best-effort
- completion/error pretty lines should resolve back to the original
started-event ancestry, not synthetic completion ids
- `sessionId` often matches `browserbaseSessionId` today, but code
should not rely on that as a permanent invariant
- this PR is about FlowLogger/EventStore architecture and correctness,
not a model-default migration

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
# why
Nick's PR had both server-v3 and server-v4 changes. I split it into two
PRs: just the v3 changes here, and just the v4 changes
[here](#1840) (WIP).

Then, once I rebased this PR, it's really just one small change to the
node SEA binary stuff.

# test plan
Verified the split with exact file manifests before creating the branch
and ran `pnpm install --lockfile-only --ignore-scripts`.




<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Renamed the Stagehand server to `packages/server-v3` and
`@browserbasehq/stagehand-server-v3`. Updated CI, release, tests, and
SEA build logic; no API or runtime changes.

- **Refactors**
- Moved `packages/server` to `packages/server-v3` and renamed the
package to `@browserbasehq/stagehand-server-v3`.
- Updated GitHub workflows (CI, SEA build, release) and artifacts to
`stagehand-server-v3-*`.
- Switched OpenAPI/Stainless references to
`packages/server-v3/openapi.v3.yaml`.
- Updated test discovery/commands and Turbo tasks to target
`@browserbasehq/stagehand-server-v3`; adjusted ESLint, workspace, and
scripts accordingly.
- Hardened SEA build: verify Node binary includes the required fuse,
fall back to the official Node distro when needed, enforce fuse
presence, use `stagehand-server-v3-sea` temp paths, centralize the fuse
value, and add a clear cache recovery hint when the cached Node binary
lacks the fuse.

- **Migration**
- Use `@browserbasehq/stagehand-server-v3` in `pnpm`/Turbo filters and
scripts.
  - Run local tasks from `packages/server-v3`.
- For SEA builds/tests, use binaries named
`stagehand-server-v3-<platform>-<arch>` and set `SEA_BINARY_NAME` if
needed.

<sup>Written for commit 645a2e6.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1839">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…1846)

# why

Monarch Money reported a bug in the Python SDK (stagehand-python
v3.6.0).

In src/stagehand/_streaming.py, both the sync Stream.__stream__() (lines
62-63) and the async AsyncStream.__stream__() (lines 147-148) have:

```python
if sse.data.startswith('{"data":{"status":"finished"'):
    break  # ← drops the event without yielding it
```

When the server sends the finished SSE event (which contains success,
message, actions, usage, and messages), the SDK immediately breaks out
of the loop without yielding the event to the caller. This means the
final result payload is silently dropped.

# what changed

The streaming on_event config had `handle: done` for the
`{"data":{"status":"finished"...}}` event. Stainless generates this as a
bare break, which exits the SSE loop without yielding the event –
silently dropping the final result payload (success status, message,
actions, usage, and messages) from every streaming response.

Fix – Changed to `handle: yield` so the finished event is delivered to
callers before the stream terminates. This is safe because the server
explicitly calls `reply.raw.end()` immediately after sending the
finished event (packages/server-v3/src/lib/stream.ts), so the SSE
connection closes right after the yield and the loop exits naturally on
EOF – no hang risk.
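
The difference between the two handlers can be sketched as two consumer loops. This is an illustration in TypeScript, not the Stainless-generated Python; the `SseEvent` type and both function names are hypothetical.

```typescript
// Hypothetical minimal SSE event type for illustration.
type SseEvent = { data: string };

const FINISHED_PREFIX = '{"data":{"status":"finished"';

// Models `handle: done`: the loop breaks before yielding, so the
// finished event (and its result payload) never reaches the caller.
function* breakOnFinished(events: Iterable<SseEvent>): Generator<SseEvent> {
  for (const sse of events) {
    if (sse.data.startsWith(FINISHED_PREFIX)) break;
    yield sse;
  }
}

// Models `handle: yield`: every event is delivered, and the loop ends
// when the underlying stream is exhausted (the server closes it).
function* yieldFinished(events: Iterable<SseEvent>): Generator<SseEvent> {
  for (const sse of events) {
    yield sse;
  }
}
```

The second form relies on the server ending the connection after the finished event, which is the behavior described above for `packages/server-v3/src/lib/stream.ts`.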

# test plan

Regenerate SDKs and confirm fix
# why
The intent is that we just want the v4 stubs and the OpenAPI spec (and a
few useless tests for now). This isn't all of the v4 stubs yet
(`/logs` and others are missing), but we'll add the rest of the stubs in
parallel.


# what changed
- Removes all "copied over" v3 *functionality* and /sessions routes etc.
- Adds v4 stubs like `/browsersession` which return simple JSON objects

# test plan
Lint + (pretty useless) tests pass, checked /documentation accessible
locally and included screenshot.
<img width="1663" height="971" alt="Screenshot 2026-03-18 at 10 36
59 AM"
src="https://github.com/user-attachments/assets/ede65965-f0b6-452e-a841-59ad25c2d456"
/>


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds `@browserbasehq/stagehand-server-v4`, a schema‑first, stubbed v4
API for BrowserSession and Page, with a regenerated OpenAPI v4 spec.
Replaces the legacy v4 sessions runtime with simple stubs and a minimal
server boot, leaving `packages/server` unchanged.

- **New Features**
- Full v4 routes: `/v4/browsersession` (create/get/end + actions) and
`/v4/page` (navigation, frames, input, CDP, screenshots, etc.), plus
`/healthz` and `/readyz`.
- Zod v4 schemas for BrowserSession/Page expose OpenAPI components;
generator builds from route registries (`browserSessionRoutes`,
`pageRoutes`); `openapi.v4.yaml` updated.
- Integration tests for v4 BrowserSession and Page; test utils updated
for v4 shapes and a default `x-model-api-key`.

- **Refactors**
- Removed legacy v4 runtime and infra (old `sessions/*` routes, session
store, SSE, auth/env/logging/response, CORS/metrics, SEA fuse) and
legacy tests; server registers only v4 routes/components.
- Simplified health/readiness handlers and error handling with a
lightweight `AppError` in `src/types/error.ts`.
- Fixed query parameter handling in stub GET handlers and OpenAPI so GET
endpoints use the correct querystring shapes.

<sup>Written for commit 8dede23.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1840">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

Browserbase can solve captchas asynchronously, but agents were still
trying to interact with the page while the solver was active. That led
to CUA and DOM/hybrid flows clicking solved captcha widgets again,
pausing on confirmation questions, or resuming with stale assumptions
instead of continuing the original task cleanly.

This PR pauses agent execution while Browserbase's captcha solver is
active and hardens the post-solve resume path so the agent keeps working
on the original task after Browserbase finishes.

# what changed

- added a shared `CaptchaSolver` utility that listens for Browserbase
`browserbase-solving-started/finished/errored` console events, supports
concurrent waiters, and disposes listeners cleanly
- paused DOM/hybrid `prepareStep` execution and CUA `prepareStep` /
action execution while Browserbase is solving a captcha
- enabled Browserbase captcha solving by default unless
`browserSettings.solveCaptchas: false`
- updated agent prompts and follow-up messages to tell the model that
captchas are handled automatically and should not be clicked again after
they are solved
- added OpenAI CUA recovery behavior so it can:
  - carry one-shot context notes into the next model turn
  - auto-continue when the model asks for confirmation instead of acting
- guard post-solve clicks that target the solved captcha widget and
restate the original instruction so the model re-anchors on the task
- added focused unit coverage for solver state, Browserbase session
accessors, CUA/regular agent hooks, and OpenAI CUA confirmation handling
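
A rough sketch of the "concurrent waiters" idea, assuming the solver state is driven by console message text. The class and method names here are hypothetical and this is not the shared `CaptchaSolver` utility; only the three console event names come from the description above.

```typescript
// Illustrative solver state; names are assumptions, not the real utility.
class CaptchaSolverState {
  private solving = false;
  private waiters: Array<() => void> = [];

  // Feed console message text from the page into the state machine.
  handleConsoleMessage(text: string): void {
    if (text === "browserbase-solving-started") {
      this.solving = true;
    } else if (
      text === "browserbase-solving-finished" ||
      text === "browserbase-solving-errored"
    ) {
      this.solving = false;
      // Release every pending waiter exactly once.
      const pending = this.waiters;
      this.waiters = [];
      for (const resolve of pending) resolve();
    }
  }

  isSolving(): boolean {
    return this.solving;
  }

  // Resolves immediately when idle, otherwise when solving ends.
  waitIfSolving(): Promise<void> {
    if (!this.solving) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}
```

An agent hook would `await solver.waitIfSolving()` before each step, so several concurrent callers all resume on the same finished or errored event. A production version would likely also need a timeout and listener disposal, which this sketch omits.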

# test plan

- `pnpm --filter @browserbasehq/stagehand lint`
- `pnpm --filter @browserbasehq/stagehand build:esm`
- `cd packages/core && pnpm exec vitest run --config
vitest.esm.config.mjs dist/esm/tests/unit/openai-cua-client.test.js
dist/esm/tests/unit/captcha-solver.test.js
dist/esm/tests/unit/agent-captcha-hooks.test.js
dist/esm/tests/unit/browserbase-session-accessors.test.js`

Browserbase smoke:

- OpenAI CUA + reCAPTCHA demo: strict pass on `Verification Success`
- Anthropic CUA + reCAPTCHA demo: strict pass on `Verification Success`
- Hybrid Gemini + reCAPTCHA demo: strict pass on `Verification Success`
- OpenAI CUA + `solveCaptchas: false`: solver stays disabled, no
wait/resume path is triggered, and the agent stops at the captcha
instead of bypassing it

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shrey Pandya <shrey@browserbase.com>
shrey150 and others added 30 commits June 17, 2026 10:05
…2249)

## Summary

Linear:
https://linear.app/browserbase/issue/STG-2278/add-did-you-mean-suggestions-and-telemetry-for-unknown-browse-commands

Adds a `command_not_found` oclif hook to the browse CLI that prints a
did-you-mean suggestion for unknown commands and emits a new
`cli.command_not_found` telemetry event, while preserving oclif's
standard "command not found" error and exit code 2.

## Impact if merged

Unknown commands (`browse sessions` / `search` / `contexts` / `auth
status` — the old Commander-era syntax that agents were trained on, plus
plain typos) currently exit 2 with no suggestion and emit NO telemetry
event, so this failure class is invisible by construction. Old-binary
telemetry shows the pattern is real (1,310 commander-error events from
`sessions.list` alone in 30d; 115 from `search`). It's a
failed-first-command class, and a failed first command cuts 7-day
retention 12.4x (0.42% vs 5.21%). Did-you-mean turns an agent guess-loop
into a one-turn recovery, and the new `cli.command_not_found` event
finally lets us size and rank the dead ends. No new dependency —
deliberately avoids `@oclif/plugin-not-found`, which prompts
interactively (agent-hostile).

## Implementation notes

- **New hook** `src/hooks/command-not-found.ts`, registered in the
`oclif.hooks` config. Suggestion order: explicit alias table first
(old-CLI syntax → current tree, e.g. `sessions` → `cloud sessions list`,
`auth status` → `doctor`, `search` → `cloud search`), then nearest match
by Levenshtein over `config.commandIDs` with a distance threshold (the
did-you-mean clause is omitted when nothing decent matches). Alias
targets are validated against the live command tree at runtime and
against `oclif.manifest.json` in tests, so they can't silently drift.
- **Privacy: id + suggestion only, never argv.** oclif's spaced-topic
parsing glues unknown leading argv tokens into the attempted id (e.g.
`browse opne https://example.com` arrives as
`opne:https://example.com`), so the hook sanitizes down to leading
command-shaped tokens and reports only the matched prefix (or the first
token when nothing matches). The telemetry payload carries exactly
`attempted_command` and `suggested_command` — URLs, selectors, queries,
and secrets never leave the machine. Covered by a dedicated test
asserting argv values are absent from captured payloads.
- **Exit semantics preserved.** A `command_not_found` hook that returns
normally makes oclif treat the invocation as handled (exit 0), silently
swallowing the failure. The hook therefore re-throws oclif's standard
`CLIError("command <id> not found")` after printing the suggestion,
keeping stderr output and exit code 2 byte-identical to current
behavior.
- **Telemetry can't hang or get lost.** The event reuses the existing
PostHog transport (400ms abort timeout, best-effort catch) and is
awaited inside the hook before the error is thrown, so it is delivered
before process exit but cannot delay it beyond the transport timeout.
The `finally`-hook completion path early-returns for unknown commands
(prerun never fires), so there is no double counting.
- No new runtime dependency; ~140 LOC of source plus tests.
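
The suggestion order described above (alias table first, then nearest match under a distance threshold) can be sketched as follows. This is an illustrative reconstruction, not the hook's source: the `suggest` function, the space-separated command ids, and the threshold of 2 are assumptions, and the Levenshtein routine is hand-written to keep the sketch dependency-free.

```typescript
// Old-CLI syntax mapped to the current command tree (entries from the PR text).
const ALIASES: Record<string, string> = {
  sessions: "cloud sessions list",
  search: "cloud search",
};

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns a suggestion, or undefined when nothing decent matches.
// `maxDistance` is an assumed threshold, not the value used by the hook.
function suggest(
  attempted: string,
  commands: string[],
  maxDistance = 2,
): string | undefined {
  const alias = ALIASES[attempted];
  // Only trust an alias whose target still exists in the live command list.
  if (alias !== undefined && commands.includes(alias)) return alias;

  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const cmd of commands) {
    const d = levenshtein(attempted, cmd);
    if (d < bestDistance) {
      best = cmd;
      bestDistance = d;
    }
  }
  return best;
}
```

Validating the alias target against the command list at lookup time is what keeps a renamed command from producing a stale suggestion.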

## E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `node bin/run.js sessions` (local build) | stderr: `"browse sessions"
is not a browse command. Did you mean "browse cloud sessions list"? Run
browse --help for all commands.` then `Error: command sessions not
found`; `echo $?` → `2` | Proves alias suggestion + preserved exit code
on the highest-volume old-syntax pattern |
| `node bin/run.js auth status` | stderr: `"browse auth status" is not a
browse command. Did you mean "browse doctor"? ...`; exit `2` | Proves
multi-token alias matching (`auth:status` → `doctor`) |
| `node bin/run.js search "test"` | stderr: `"browse search" is not a
browse command. Did you mean "browse cloud search"? ...`; exit `2`; the
query token is not shown as part of the attempted command | Proves alias
prefix matching strips trailing user args from messaging |
| `node bin/run.js opne https://example.com` | stderr: `"browse opne" is
not a browse command. Did you mean "browse open"? ...`; exit `2` |
Proves Levenshtein typo fallback; URL excluded from the attempted
command |
| `node bin/run.js open https://example.com --local` | JSON result
`{"mode": "managed-local", ..., "title": "Example Domain", "url":
"https://example.com/"}`; exit `0`; no suggestion output | Proves valid
commands are completely unaffected (hook never fires) |
| Live telemetry capture: `BROWSERBASE_TELEMETRY_HOST=<local capture
server>` + `node bin/run.js auth status` | Capture server logged `POST
/i/v0/e/` with `"event": "cli.command_not_found"`, `"attempted_command":
"auth.status"`, `"suggested_command": "doctor"` plus standard
env/version props; payload received before CLI exit `2`; no argv content
in payload | Proves the event actually sends, flushes before process
exit, and carries only id + suggestion |
| `pnpm test` (builds then vitest) | `Test Files 16 passed (16), Tests
229 passed (229)` — includes 13 new unit/integration tests (alias table
validity vs manifest, Levenshtein, thresholds, token sanitization,
built-CLI suggestion/exit-code/telemetry/privacy) | Full regression
sweep; existing telemetry suite still green |
| `pnpm lint` | prettier + eslint + `tsc --noEmit` all pass | Supporting
evidence only |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds did-you-mean suggestions and privacy-safe telemetry for unknown
`browse` CLI commands, while keeping the standard error output and exit
code 2. Addresses Linear STG-2278 by helping users recover from old
syntax and typos; typo matching now uses `fastest-levenshtein`.

- **New Features**
- Added a `command_not_found` hook that prints a suggestion using an
alias table for old syntax, with segment-aligned Levenshtein fallback
for typos; omitted when no good match.
- Sends `cli.command_not_found` telemetry with strict privacy: only the
sanitized attempted command id and the suggested command, never raw
argv.
- Preserves default behavior by rethrowing the not-found error (stderr
unchanged, exit code 2) and avoids `@oclif/plugin-not-found`.
  - Removed misleading `auth`/`login` → `doctor` suggestions.

<sup>Written for commit bcee6ed.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
…-failure backoff (#2248)

## Summary

Makes browse driver (browser session) failures actionable, classified,
and self-correcting. Today an invalid `BROWSERBASE_API_KEY` surfaces a
bare `Error: 401 Unauthorized` with no remediation, the 5s init-failure
cache makes every retry instant and identical, and most driver failures
reach telemetry as `unexpected`.

Linear:
[STG-2277](https://linear.app/browserbase/issue/STG-2277/make-browse-driver-errors-actionable-with-result-codes-and-init)

## Impact if merged

This targets the browse CLI's largest failure mode by volume and by user
pain. 71 installs stuck in get/screenshot retry loops generate 92.3% of
ALL CLI telemetry (~5.5M events/30d); ~375k of those events come from
tagged claude-code and codex agents on current versions — exactly the
target ICP (coding agents driving browsers). Root cause (smoke-tested):
any `BROWSERBASE_API_KEY` forces remote mode; an invalid key surfaces a
bare `Error: 401 Unauthorized` with no remediation, and the 5s
init-failure cache makes every retry instant, so agents can't
self-correct and loop forever. Separately, 2,337 distinct users hit
missing_api_key/auth_401 in 30d, and `open` — whose failures are 94%
unclassifiable today (result_code `unexpected`) — gates activation: only
28.5% of real users reach an activated session, and a failed first
command cuts 7-day retention 12.4x. This PR makes auth/driver failures
actionable (agents recover in one turn) and classified (we can finally
measure why open fails).

## Implementation notes

- **Remote init classification** (`remote.ts`): new
`classifyRemoteInitError()` duck-types the SDK error's `status` — 401 →
`remote_auth_401` (invalid-key message with settings link, `--local`,
`browse doctor`), 403 → `remote_auth_403` (permissions/plan wording,
same escape hatches), other → `remote_session_create_failed` (original
message preserved + `browse doctor` pointer). Wired through the
`RemoteCapability` interface so local-only builds compile.
- **Chrome-not-found** (`session-manager.ts`): chrome-launcher's
`ERR_LAUNCHER_NOT_INSTALLED` / `ERR_LAUNCHER_PATH_NOT_SET` failures in
managed-local mode get install/`--cdp`/remote guidance instead of
leaking launcher internals.
- **Init-failure backoff**: cached init failures now back off
exponentially — `min(5s * 2^(n-1), 5min)` — reset on success and
`close()`. After ≥3 consecutive failures the cached message gains a
`(failing repeatedly — fix BROWSERBASE_API_KEY, use --local, or run
browse doctor)` suffix (deduped on rethrow).
- **Result codes over the daemon protocol**: `ErrorResponseSchema` gains
optional `code`/`httpStatus` (backward compatible — old daemons omit
them); the daemon's `formatError` surfaces them from typed
`DriverError`s; the client rethrows as `CommandFailure` with
`resultCode`/`httpStatus` so the existing #2210 telemetry plumbing
records them. Client-side fail sites tagged: `daemon_lock_timeout`,
`daemon_unresponsive`, `daemon_socket_timeout`, `daemon_spawn_failed`.
Already-authored driver errors tagged: `stale_ref` (unknown ref),
`no_active_page`.
- **Local-only build contract preserved**: remediation strings that
mention `BROWSERBASE_API_KEY` live behind the remote capability
(`driverInitHints()`), so the `build:local-only` artifact stays key-free
(guarded by the existing `local-only-build.test.ts`, which caught the
first draft).
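
A small sketch of the backoff schedule and the status-based classification from the notes above. The function signatures and constants are assumptions for illustration; the formula `min(5s * 2^(n-1), 5min)` and the three result codes are taken from the implementation notes.

```typescript
const BASE_BACKOFF_MS = 5_000;
const MAX_BACKOFF_MS = 5 * 60_000; // cap taken from the formula quoted above

// Backoff window after the n-th consecutive init failure (n starts at 1).
function initFailureBackoffMs(consecutiveFailures: number): number {
  const n = Math.max(1, consecutiveFailures);
  return Math.min(BASE_BACKOFF_MS * 2 ** (n - 1), MAX_BACKOFF_MS);
}

type RemoteInitCode =
  | "remote_auth_401"
  | "remote_auth_403"
  | "remote_session_create_failed";

// Duck-types a `status` field on an unknown error (assumed signature;
// the real classifier also builds the remediation message).
function classifyRemoteInitError(err: unknown): RemoteInitCode {
  const status =
    typeof err === "object" && err !== null && "status" in err
      ? (err as { status?: unknown }).status
      : undefined;
  if (status === 401) return "remote_auth_401";
  if (status === 403) return "remote_auth_403";
  return "remote_session_create_failed";
}
```

Under these constants the windows run 5s, 10s, 20s, 40s, 80s, 160s, and then stay at the 5-minute cap from the seventh consecutive failure onward.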

## E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `BROWSERBASE_API_KEY=bb_invalid_test <local build> get url` |
`Browserbase rejected your BROWSERBASE_API_KEY (401 Unauthorized). A set
key makes browse default to remote mode. Check the key at
https://browserbase.com/settings, run without one using --local (browse
open <url> --local), or diagnose with browse doctor.` exit=1 | Proves
the new 401 classification flows daemon → protocol → client → stderr
end-to-end against the real Browserbase API. |
| Same command 4x rapidly (cached failure window) | Identical actionable
message each time, ~400ms per run (no remote round-trip) | Proves cached
failures keep the actionable message and stay instant; does not by
itself prove backoff growth. |
| Same command after 6s, then after 11s more (real failures #2, #3) |
Message gains ` (failing repeatedly — fix BROWSERBASE_API_KEY, use
--local, or run browse doctor)` suffix, exactly once, exit=1 | Proves
the ≥3-consecutive-failures hint and suffix dedupe on the live failure
path. |
| Valid key: `open https://example.com` → `get title` → `stop` |
`"mode": "remote" ... "title": "Example Domain"`, then `{"title":
"Example Domain"}`, then `{"stopped": true}` | Proves the remote happy
path is unchanged (no regression in outputs or exit codes). |
| `env -u BROWSERBASE_API_KEY <local build> open https://example.com
--local` → `get url` | `"mode": "managed-local" ... "url":
"https://example.com/"`, then `{"url": "https://example.com/"}` exit=0 |
Proves keyless managed-local mode is unaffected. |
| `get text @9-99` on the local session | `Unknown ref "9-99" - run
browse snapshot first to populate refs (have 0 refs).` exit=1 | Proves
the stale-ref message is unchanged while now carrying `stale_ref`
through the protocol (round-trip unit-tested). |
| `browse doctor` with and without key | `Status: ok` in both; `target
remote` with key, `target managed-local` without | Proves doctor
behavior unchanged. |
| `pnpm build` + `pnpm lint` (prettier, eslint, tsc) | All pass |
Supporting only. |
| `pnpm test:cli` | 16 files / 228 tests pass, incl. new
`driver-errors.test.ts` (classification, backoff schedule,
chrome-not-found detection, protocol round-trip, key-free local-only
hints) and the `local-only-build` artifact guard | Supporting; covers
mappings and the local-only security contract not exercised by live
smokes. |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Makes browse driver failures actionable and self-correcting with
classified result codes and exponential init backoff. Addresses Linear
STG-2277 by giving clear fixes for bad `BROWSERBASE_API_KEY`, missing
Chrome/Chromium, and daemon issues, while improving telemetry.

- **New Features**
- Classify remote init errors into actionable messages with codes:
`remote_auth_401`, `remote_auth_403`, `remote_session_create_failed`
(with links to settings, `--local`, and `browse doctor`).
- Add error result codes to the daemon protocol (`code`, `httpStatus`)
and propagate to the client for telemetry.
- Exponential backoff for cached init failures (5s doubling, capped at 1
minute) with a “failing repeatedly” hint after 3 failures.
- Tag common failures with stable codes: `daemon_lock_timeout`,
`daemon_unresponsive`, `daemon_socket_timeout`, `daemon_spawn_failed`,
`stale_ref`, `no_active_page`, `no_chrome_found`.
- Use `http-status-codes` for status mapping and extract chrome-launcher
error codes to a constant (no behavior change).

- **Bug Fixes**
- Chrome-not-found now gives Chromium-first guidance: Linux `apt install
chromium`; macOS `brew install --cask google-chrome` or set
`CHROME_PATH` for Chromium, plus `--cdp` or remote as options.
- Keep the local-only build key-free by moving `BROWSERBASE_API_KEY`
remediation strings behind the remote capability.

<sup>Written for commit b7a3f7e.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
…s on every command (#2258)

## Summary

In **headed managed-local** mode, the `browse` CLI stole macOS keyboard
focus on **every** subcommand, making it nearly unusable next to a
coding agent and impossible to parallelize.

**Root cause:** the browse daemon resolves the active page on every
subcommand via `ensurePage()`, which called `context.setActivePage()`
unconditionally. In core, `setActivePage` ends in a CDP
`Target.activateTarget` (`packages/core/lib/v3/understudy/context.ts`),
and on macOS `Target.activateTarget` raises the whole Chrome app to the
OS foreground — yanking focus away from your editor/terminal on each
`browse navigate / snapshot / get / …`.

**Fix:** route the three `ensurePage()` activation sites through a new
`activateIfNeeded()` helper that only re-activates when the target page
isn't already the active one. Redundant re-activation (the common
single-tab case) is skipped, so focus stays put. Explicit tab switches
(`tabs.ts`) still call `setActivePage` directly, so intentional
foregrounding (`tab new` / `tab select`) is preserved.

Scoped entirely to `packages/cli`; no core changes.
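
The guard can be sketched roughly as below. The `PageLike` and `ContextLike` interfaces are hypothetical stand-ins for the real page and context types, and the boolean return value is an assumption added so the sketch is easy to test.

```typescript
// Hypothetical minimal interfaces; the real context type has more surface.
interface PageLike {
  id: string;
}

interface ContextLike {
  activePage(): PageLike | undefined;
  setActivePage(page: PageLike): void; // ends in Target.activateTarget
}

// Re-activate only when the resolved page is not already the active one.
// Returns true when an activation was actually issued.
function activateIfNeeded(context: ContextLike, page: PageLike): boolean {
  if (context.activePage() === page) return false;
  context.setActivePage(page);
  return true;
}
```

In the single-tab case the resolved page is always the active one, so every call after the first is a no-op and no activation is sent.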

## E2E Test Matrix

Run against **real headed Chrome** (managed-local) on macOS.
Instrumentation (env-gated, not committed) counted `setActivePage` →
`Target.activateTarget` sends across a 5-command sequence: `open` →
`open` (navigate) → `snapshot` → `get url` → `screenshot`.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| 5-command headed session, **old** unconditional behavior (env toggle)
| `setActivePage (activateTarget → FOCUS STEAL)` × **5**, skips: 0 |
Reproduces the bug: every command re-activates → macOS focus steal. |
| 5-command headed session, **with fix** | `SKIP (no focus steal)` ×
**5**, activations: **0** | Proves the fix eliminates per-command focus
theft for the normal single-tab flow. |
| `browse open https://example.com --local --headed` then navigate to
`example.org`, then `screenshot` (clean build, no instrumentation) |
`"url": "https://example.org/"`, screenshot saved (`16125` bytes) | Real
headed Chrome still navigates/snapshots/screenshots correctly after the
change — no functional regression. |
| `pnpm check` (tsc), `pnpm eslint`, `pnpm format:check` | All pass on
`session-manager.ts` | Type-safe, lint-clean, formatted. |

**Not changed:** explicit tab switching still activates the target tab
(verified the change only touches `ensurePage()` resolution; `tabs.ts`
calls `setActivePage` directly). The one-time activation from
`chrome-launcher` at browser launch is unchanged (expected — opening the
browser once is fine).

Closes STG-2333

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Stop headed managed-local sessions from stealing macOS keyboard focus on
every `browse` subcommand. We now activate a page only when it isn’t
already active, addressing STG-2333.

- **Bug Fixes**
- Added `activateIfNeeded()` and routed three `ensurePage()` calls
through it to skip redundant `Target.activateTarget`.
- Kept intentional tab foregrounding (`tab new`, `tab select`) by
leaving direct `setActivePage` calls.
- Scoped to `packages/cli`; released as a `browse` patch via changeset;
verified on real headed Chrome: 5-command flow went from 5 focus steals
to 0 with no regressions.

<sup>Written for commit 70b0ffb.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.8.5`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish
`browse@0.8.5` from that exact commit using `pnpm pack` + `npm publish
--provenance`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Release `browse@0.8.5` by bumping the package version and updating the
changelog.
This patch fixes focus stealing in headed local sessions, adds
suggestions and telemetry for unknown commands, improves driver errors
with retry backoff, adds Chrome launch-arg flags for managed-local
sessions, and emits a `skill_id` on command-completed telemetry.

<sup>Written for commit b405ea9.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@3.6.0

### Minor Changes

- [#2178](#2178)
[`c49a3fc`](c49a3fc)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add support
for WebMCP

### Patch Changes

- [#2217](#2217)
[`147e310`](147e310)
Thanks [@monadoid](https://github.com/monadoid)! - Add Azure OpenAI
Microsoft Entra ID model auth support.

- [#2231](#2231)
[`cf3603d`](cf3603d)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add
claude-fable-5 support: native structured outputs via the
@ai-sdk/anthropic bump, adaptive thinking (including the new "xhigh"
effort) on the agent path, the API's built-in server-side refusal
fallback to claude-opus-4-8, and auto tool choice for the final done
call on models that reject forced tool use.

- [#2233](#2233)
[`8d7d414`](8d7d414)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - Normalize
URLs in `ActCache` key derivation by sorting query parameters before
hashing. Semantically equivalent URLs that differ only in parameter
order (e.g. `?utm_source=email&id=42` vs `?id=42&utm_source=email`) now
hit the cache instead of silently missing. Fragments and duplicate keys
are preserved.

- [#2229](#2229)
[`fd42e65`](fd42e65)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - launch
local browser with --enable-features=WebMCPTesting,DevToolsWebMCPSupport
by default

- [#2220](#2220)
[`a64c6b7`](a64c6b7)
Thanks [@monadoid](https://github.com/monadoid)! - Fix
Stagehand-generated shadow-root XPath resolution so deterministic
actions can target elements inside web components.

- [#2132](#2132)
[`ed3e566`](ed3e566)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add canonical
verifier evidence normalization for screenshots and text signals without
requiring image dependencies in core installs.

- [#2133](#2133)
[`840aac8`](840aac8)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add the
rubric-based verifier engine with normalized public rubric output and
bounded failure-step parsing.
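
The `ActCache` URL normalization from #2233 can be sketched as follows. This is an illustration only, not Stagehand's actual code: `normalizeUrlForCacheKey` is a hypothetical name, and the sketch assumes the standard WHATWG `URL` API plus a stable sort so that duplicate keys keep their relative order.

```typescript
// Hypothetical helper for illustration; not Stagehand's actual ActCache code.
function normalizeUrlForCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Copy the [key, value] pairs so they can be reordered.
  const pairs: [string, string][] = Array.from(url.searchParams.entries());
  // Sort by key only. Array.prototype.sort is stable, so repeated keys
  // keep their original relative order and no value is dropped.
  pairs.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  url.search = new URLSearchParams(pairs).toString();
  // The fragment (url.hash) is left untouched.
  return url.toString();
}

const keyA = normalizeUrlForCacheKey("https://example.com/p?utm_source=email&id=42#top");
const keyB = normalizeUrlForCacheKey("https://example.com/p?id=42&utm_source=email#top");
const keyDup = normalizeUrlForCacheKey("https://example.com/?b=2&a=1&b=1");
console.log(keyA === keyB, keyDup);
```

Both parameter orders produce the same string, so a hash of that string would be the same cache key.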

## @browserbasehq/stagehand-evals@2.0.3

### Patch Changes

- Updated dependencies
\[[`147e310`](147e310),
[`cf3603d`](cf3603d),
[`8d7d414`](8d7d414),
[`fd42e65`](fd42e65),
[`a64c6b7`](a64c6b7),
[`c49a3fc`](c49a3fc),
[`ed3e566`](ed3e566),
[`840aac8`](840aac8)]:
    -   @browserbasehq/stagehand@3.6.0

## @browserbasehq/stagehand-server-v3@3.7.1

### Patch Changes

- [#2217](#2217)
[`147e310`](147e310)
Thanks [@monadoid](https://github.com/monadoid)! - Add Azure OpenAI
Microsoft Entra ID model auth support.

- Updated dependencies
\[[`147e310`](147e310),
[`cf3603d`](cf3603d),
[`8d7d414`](8d7d414),
[`fd42e65`](fd42e65),
[`a64c6b7`](a64c6b7),
[`c49a3fc`](c49a3fc),
[`ed3e566`](ed3e566),
[`840aac8`](840aac8)]:
    -   @browserbasehq/stagehand@3.6.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
# why
- to add documentation for the new WebMCP-related functions and behaviour


<img width="1330" height="782" alt="Screenshot 2026-06-18 at 10 57
58 AM"
src="https://github.com/user-attachments/assets/2952f136-4d01-445c-95e4-c4a81e4ec289"
/>

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds WebMCP docs to v3. Introduces a new Basics page and expands the
`Page` API reference so users can list and invoke page-registered tools
in Chrome.

- **New Features**
- Added a Basics page: overview; Chrome/Chromium 149+ and flags
`--enable-features=WebMCPTesting,DevToolsWebMCPSupport`; how to
list/invoke tools; frameId targeting; results, cancel, and timeout
defaults; examples with `@browserbasehq/stagehand`.
- Updated the `Page` reference with `listWebMCPTools()` and
`invokeWebMCPTool()` signatures, options (`timeoutMs`, `frameId`),
return shape (`result`, `cancel()`), examples, types (`WebMCPTool`,
`WebMCPToolInvocationStatus`, `WebMCPToolResult`,
`WebMCPToolInvocation`), and error cases (unsupported browser, ambiguous
tool names, result timeouts, disposed invocations).
  - Included WebMCP in the docs sidebar navigation.

<sup>Written for commit 43188ab.
Summary will update on new commits.</sup>


<!-- End of auto-generated description by cubic. -->
… legacy stdout) (#2246)

## Summary

Bare `browse screenshot` now writes a file by default —
`screenshot-<yyyymmdd-hhmmss>.<type>` in the current directory, never
overwriting (atomic collision counter) — and prints the small `{
"saved": "<path>" }` JSON. A new `--base64` flag preserves the legacy
stdout contract (`{ "base64": "..." }`); it is mutually exclusive with
`--path`. Explicit `--path` behavior is unchanged.

Linear:
[STG-2275](https://linear.app/browserbase/issue/STG-2275/default-browse-screenshot-to-file-output-with-base64-legacy-flag)

## Impact if merged

screenshot is one of the two commands in the runaway agent retry loops
(the loop population generates 92.3% of all CLI telemetry) and is used
by 1,337 users/30d (89.9% success last-7d). Every bare invocation today
prints ~22KB of base64 JSON directly into the calling agent's context
window — a per-call token tax on exactly the ICP population
(claude-code/codex agents driving the CLI; agent-tagged usage is
90%-successful and growing). Defaulting to a file write makes the common
case agent-safe at zero cost to `--path` users; `--base64` preserves the
old contract for scripts. Stdout contract change for bare invocations is
called out in the changeset with a `--base64` migration note (minor bump
— the bare-invocation stdout contract changes; `browse` is `<1.0.0` so
this is advisory).

## Implementation notes

- **Breaking change (stdout contract):** bare `browse screenshot` now
prints `{ "saved": "<path>" }` instead of `{ "base64": "..." }`.
**Migration:** pass `--base64` to restore the old output.
- Command-layer only: the driver handler (`runtime.ts`) already
supported both branches (`{ saved }` when a path is given, `{ base64 }`
otherwise), so the change is confined to `src/commands/screenshot.ts`.
- Default filename respects `--type` (`.jpeg` vs `.png`) and resolves
against the invoking shell's cwd (absolute path passed to the driver so
a daemon with a different cwd can't misplace the file).
- The default filename is **atomically reserved** via exclusive create
(`openSync(path, "wx")`), advancing a `-2`, `-3`, ... counter on
`EEXIST` — concurrent same-second invocations can never claim the same
file (addresses cubic's race-condition review). If the command fails
afterward, the empty placeholder is removed best-effort.
- `--base64` is mutually exclusive with `--path` via oclif's `exclusive`
option.
- Changeset: `browse` **minor** (per review) — the bare-invocation
stdout contract change is called out in the changeset body with the
`--base64` migration note.
- Bundled skill doc: `skills/browse/SKILL.md` screenshot snippet updated
in this PR (moved from #2245 per review) to document the new default:
bare saves a file, `--path` chooses it, `--base64` is the legacy stdout
form.
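
The atomic filename reservation described in the notes above could look roughly like this. It is a sketch under assumptions: `reserveScreenshotPath` and the 1000-attempt cap are hypothetical, and the timestamp is passed in rather than computed. Only the exclusive-create flag (`"wx"`) and the `-2`, `-3`, ... counter on `EEXIST` come from the description.

```typescript
import { closeSync, mkdtempSync, openSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper; the real logic lives in src/commands/screenshot.ts.
function reserveScreenshotPath(dir: string, stamp: string, ext: string): string {
  for (let attempt = 1; attempt <= 1000; attempt++) {
    const suffix = attempt === 1 ? "" : `-${attempt}`;
    const candidate = join(dir, `screenshot-${stamp}${suffix}.${ext}`);
    try {
      // "wx" creates the file exclusively and fails with EEXIST if it already
      // exists, so two processes can never claim the same name.
      closeSync(openSync(candidate, "wx"));
      return candidate;
    } catch (err) {
      // Any error other than "already exists" is a real failure.
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
  }
  throw new Error("could not reserve a screenshot filename");
}

const shotDir = mkdtempSync(join(tmpdir(), "shots-"));
const shotA = reserveScreenshotPath(shotDir, "20260612-123741", "png");
const shotB = reserveScreenshotPath(shotDir, "20260612-123741", "png");
console.log(shotA, shotB);
```

The reservation leaves an empty placeholder file, which is why the PR removes it on a best-effort basis if the command fails afterward.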

## E2E Test Matrix

All rows ran against the local build (`pnpm build` in `packages/cli`,
invoked as `node bin/run.js`) from a `<scratch dir>`.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `node bin/run.js open https://example.com` | `{ "title": "Example
Domain", "url": "https://example.com/", ... }`, exit 0 | Session setup
for the runs below; proves the local build is functional end-to-end. |
| `node bin/run.js screenshot` (bare) | `{ "saved": "<scratch
dir>/screenshot-20260612-123732.png" }`, exit 0; `ls -la` shows the file
at 15,988 bytes; stdout is the small JSON only | Proves the new default:
file written to cwd with timestamped name, no base64 on stdout. |
| `node bin/run.js screenshot --base64 \| head -c 200` | `{ "base64":
"iVBORw0KGgoAAAANSUhEUgAABQgAAALH..."` (PNG magic in base64), exit 0 |
Proves the legacy stdout contract is fully preserved behind `--base64`.
|
| `node bin/run.js screenshot --path /tmp/custom.png` | `{ "saved":
"/tmp/custom.png" }`, exit 0; file exists at 15,238 bytes | Proves
explicit `--path` behavior is unchanged. |
| Two **concurrent** bare runs (same second, shell `&` + `wait`) | `{
"saved": ".../screenshot-20260612-123741.png" }` and `{ "saved":
".../screenshot-20260612-123741-2.png" }`; both files present and
non-empty (4,202 bytes each) | Proves the atomic no-overwrite
reservation under true same-second concurrency — the exact race cubic
flagged. |
| `node bin/run.js screenshot --cdp http://127.0.0.1:1` (forced failure)
| `TypeError: fetch failed`, nonzero exit; directory left empty — no
placeholder file | Proves failed runs do not leave empty
`screenshot-*.png` placeholders behind. |
| `node bin/run.js screenshot --base64 --path /tmp/x.png` | `Error:
--path=/tmp/x.png cannot also be provided when using --base64`, exit 2 |
Proves the oclif mutual exclusion works. |
| `node bin/run.js stop` | `{ "stopped": true, "session": "default" }`,
exit 0 | Clean session teardown. |
| `pnpm test` (packages/cli) | `Test Files 15 passed (15), Tests 213
passed (213)` — re-run after the race fix | Supporting: no regressions
in the existing CLI suite (includes the screenshot `--help` surface
test). |
| `pnpm lint` (packages/cli) | tsc/eslint/prettier clean | Supporting
only. |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
…tages (#2250)

## Summary

Linear:
https://linear.app/browserbase/issue/STG-2279/fix-windows-skills-add-npx-quoting-and-bound-installer-timeouts

Fixes `browse skills add` on Windows (cmd.exe spawn quoting) and bounds
the two unbounded skills-installer stages (the `npx skills add` child
and the catalog/file fetches).

## Impact if merged

skills.add succeeds for 3.2% of Windows users (54 in the last 30d) —
effectively broken for every default `C:\Program Files\nodejs` Node
install, because the npx child is spawned through cmd.exe with an
unquoted path. Windows devs are a smaller slice of the CLI base, but
skill installers are the single highest-value cohort in our telemetry:
7x the engagement (median 12.5 vs 2 commands) and 19x the multi-day
retention (28.4% vs 1.5%) of non-installers, and skills.find→add is an
agent-facing funnel (51% of finders attempt an install within an hour).
This also bounds the two unbounded installer stages (npx child: 180s;
catalog fetches: 10s) — today they can hang forever, feeding the
slow-failure retry loops that dominate telemetry volume.

## Implementation notes

**Root cause.** `findExecutable` resolves `npx` via PATH+PATHEXT to
`npx.cmd` on Windows, and `spawnPassthrough` spawns it with `shell:
true` (required for `.cmd`/`.bat` shims). Node's `shell: true` joins
command+args **unquoted** into `cmd.exe /d /s /c "..."`, so `C:\Program
Files\nodejs\npx.cmd` splits at the space and cmd executes `C:\Program`
→ `'C:\Program' is not recognized` → exit 1. Install-path args under
`C:\Users\<First Last>\...` break the same way.

**Why not `shell: false`.** Spawning a `.cmd` directly with `shell:
false` throws `EINVAL` on all current Node versions — the CVE-2024-27980
hardening (Node 18.20.x / 20.12.x / 21.7.x+) forbids spawning batch
files without a shell because cmd.exe argument splitting cannot be made
injection-safe generically. So the shell path is mandatory for `.cmd`
shims, and the args must be quoted for cmd.

**Quoting semantics.** `quoteForCmdShell` wraps tokens containing
whitespace, quotes, or cmd metacharacters (`^ & | < >`) in double
quotes, doubling embedded quotes. Node wraps the joined string in outer
quotes after `/d /s /c`; with `/s`, cmd strips only those outer quotes
and executes the inner, correctly-quoted line:

```
before:  cmd.exe /d /s /c "C:\Program Files\nodejs\npx.cmd --yes skills add C:\Users\First Last\..."
after:   cmd.exe /d /s /c ""C:\Program Files\nodejs\npx.cmd" --yes skills add "C:\Users\First Last\...""
```
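
A minimal sketch of the quoting rule described above, assuming the empty token is emitted as `""` (the description only says the empty token is covered by tests). `quoteForCmdShellSketch` and `buildCmdLine` are illustrative names, not the PR's actual helpers, and the install path is made up.

```typescript
// Sketch of the described rule; the real quoteForCmdShell may differ in detail.
function quoteForCmdShellSketch(token: string): string {
  // Assumption: an empty argument is passed as an explicit empty pair of quotes.
  if (token === "") return '""';
  // Tokens without whitespace, quotes, or cmd metacharacters pass through as-is.
  if (!/[\s"^&|<>]/.test(token)) return token;
  // Wrap in double quotes and double any embedded quotes.
  return `"${token.replace(/"/g, '""')}"`;
}

// Join the command and its arguments into the line that cmd.exe will parse.
function buildCmdLine(command: string, args: string[]): string {
  return [command, ...args].map(quoteForCmdShellSketch).join(" ");
}

const cmdLine = buildCmdLine("C:\\Program Files\\nodejs\\npx.cmd", [
  "--yes",
  "skills",
  "add",
  "C:\\Users\\First Last\\skill", // hypothetical install path
]);
console.log(cmdLine);
```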

**Alternative considered.** Resolving `npx-cli.js` next to `npx.cmd` and
spawning `process.execPath` with `shell: false` would avoid cmd quoting
entirely, but the `npx.cmd` → `npx-cli.js` relative layout differs
across npm versions and Node distribution channels (nvm-windows, Volta,
Scoop shims, fnm), so it trades a well-understood quoting rule for
fragile path archaeology. The quoting approach is smaller and matches
what cross-platform tools (e.g. `cross-spawn`) do.

**Bounding the installer stages.**
- `spawnPassthrough` now enforces a 180s deadline: SIGTERM, then SIGKILL
after 5s if the child ignores it. A timed-out install fails with a clear
message and a distinct `skill_install_timeout` result code through the
existing `fail`/`resultCode` plumbing from #2210.
- The catalog file-list fetch, the direct-Blob HEAD probe, and
skill-file downloads now use `AbortSignal.timeout(10s)`. An aborted
catalog fetch is classified exactly like a network failure
(`unavailable`), preserving the existing fallback semantics.
- Both deadlines are env-overridable
(`BROWSE_SKILLS_INSTALL_TIMEOUT_MS`, `BROWSE_SKILLS_FETCH_TIMEOUT_MS`),
following the module's existing `BROWSE_SKILLS_*` override pattern; this
is also what makes the deadlines provable end-to-end in tests.
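
The child-process deadline can be sketched as below. `runWithDeadline` and its result shape are assumptions for illustration; only the SIGTERM-then-SIGKILL escalation and the 5s grace period are taken from the description, and the real `spawnPassthrough` may be structured differently.

```typescript
import { spawn } from "node:child_process";

interface DeadlineResult {
  code: number | null;
  timedOut: boolean;
}

// Hypothetical helper: run a child, ask it to stop at the deadline, and
// force-kill it if it ignores the request.
function runWithDeadline(
  command: string,
  args: string[],
  timeoutMs: number,
  killGraceMs = 5_000,
): Promise<DeadlineResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "inherit" });
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const deadline = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
      // Escalate only if the child is still alive after the grace period.
      killTimer = setTimeout(() => child.kill("SIGKILL"), killGraceMs);
    }, timeoutMs);

    const cleanup = () => {
      clearTimeout(deadline);
      if (killTimer) clearTimeout(killTimer);
    };
    child.once("error", (err) => {
      cleanup();
      reject(err);
    });
    child.once("close", (code) => {
      cleanup();
      resolve({ code, timedOut });
    });
  });
}

// Demo: a child that would run for 60s is stopped after 200ms.
const deadlineDemo = runWithDeadline(process.execPath, ["-e", "setTimeout(() => {}, 60000)"], 200);
deadlineDemo.then((result) => console.log(result.timedOut));
```

A caller can map `timedOut` to a distinct result code, in the way the PR maps it to `skill_install_timeout`.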

## E2E Test Matrix

All commands ran against the locally built CLI (`<local
build>/bin/run.js`) on macOS (darwin/arm64).

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| **Windows execution** (one-time `windows-latest` before/after run:
[actions/runs/27448084696](https://github.com/browserbase/stagehand/actions/runs/27448084696))
| System Node at `C:\Program Files\nodejs` (no setup-node), `where npx`
→ `C:\Program Files\nodejs\npx.cmd`. **main:** `browse skills install` →
exit 1, verbatim `'C:\Program' is not recognized as an internal or
external command`. **PR head d2e9098:** exit 0, `Installed 1 skill`,
`~\.agents\skills\browse\SKILL.md` present; win32 vitest gate (quoting +
shim + spawnPassthrough timeout) 10/10 passed. | Closes the Windows gap
with a real before/after on the same runner layout (win25-vs2026): the
exact predicted failure reproduces on main and the PR build installs
end-to-end. Full evidence in the [validation
comment](#2250 (comment)).
|
| `browse skills find flights` (real catalog) | exit 0; returned
`google.com/search-flights-ts4g1f` with full metadata | Proves catalog
discovery is unaffected. |
| `browse skills add google.com/search-flights-ts4g1f` (real catalog,
real `npx`) | exit 0; `Downloaded 2 skill files to <config dir>`; `npx
skills add` installed the skill ("Installed 1 skill ... Done!") | Proves
the darwin install path (quoting branch not taken) still works
end-to-end with the new deadline code in place — no regression. |
| Quoting before/after for `C:\Program Files\nodejs\npx.cmd` (unit tests
+ helper output) | before: `C:\Program Files\nodejs\npx.cmd --yes skills
add C:\Users\First Last\...` (unquoted → cmd runs `C:\Program`); after:
`"C:\Program Files\nodejs\npx.cmd" --yes skills add "C:\Users\First
Last\..."` | Reproduces the bug shape and asserts the exact corrected
command line, incl. embedded-quote doubling, `& \| ^ < >` metachars, and
the empty token. Static proof only — see Windows row. |
| Hung `npx` stub (`exec /bin/sleep 600`) +
`BROWSE_SKILLS_INSTALL_TIMEOUT_MS=2000` → `browse skills install` | exit
1 after 2s elapsed (timed): ``Skill install timed out after 2s waiting
for `npx skills add`...`` | Proves the deadline kills a hung child and
surfaces the timeout failure (`skill_install_timeout` flows through the
same `fail` plumbing verified in #2210). Also covered by
`spawnPassthrough` unit tests (timeout + non-timeout control). |
| Hung catalog server (accepts, never responds) at **default timeouts**
→ `browse skills add google.com/search-flights-ts4g1f` with stubbed
`npx` | exit 0 after 21s (10s API fetch abort + 10s Blob HEAD abort);
npx stub invoked with `--yes skills add browserbase/browse.sh --skill
google.com/search-flights-ts4g1f` | Proves a hung catalog aborts at the
10s default and the catalog-unavailable fallback semantics are
preserved. Previously this hung forever. Also covered by a fast
CLI-level test with `BROWSE_SKILLS_FETCH_TIMEOUT_MS=500`. |
| `npx vitest run` (packages/cli) | 15 files, 224 tests passed (incl. 13
in skills-install.test.ts) | Full CLI suite green; supporting evidence
only. |
| `pnpm lint` (packages/cli) | exit 0 (prettier + eslint + tsc) |
Supporting evidence only. |

**Windows gap closed:** a one-time `windows-latest` before/after run
([actions/runs/27448084696](https://github.com/browserbase/stagehand/actions/runs/27448084696),
details in the [validation
comment](#2250 (comment)))
reproduced the exact `'C:\Program' is not recognized` failure on main
and verified `browse skills install` succeeds end-to-end on this PR's
build with the default `C:\Program Files\nodejs` system Node. Full
Windows vitest: 205/224 passed; all 17 failures are pre-existing POSIX
test-harness assumptions (`#!/bin/sh` npx stubs etc.), identical by
construction on main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes Windows failures in `browse skills add` by quoting the `npx`
command when spawned via cmd.exe and bounding installer/fetch stages to
prevent hangs. Also runs the full CLI test suite on Linux and Windows
via a matrix job. Addresses Linear: STG-2279.

- **Bug Fixes**
- Quote command and args when spawning `.cmd`/`.bat` through the shell,
so `C:\Program Files\nodejs\npx.cmd` and paths with spaces work.
- Add a 180s deadline to `npx skills add` (SIGTERM, then SIGKILL) and a
10s abort for catalog/file fetches; both overridable via
`BROWSE_SKILLS_INSTALL_TIMEOUT_MS` and `BROWSE_SKILLS_FETCH_TIMEOUT_MS`;
install timeouts surface `skill_install_timeout`.
- Run the full CLI suite on `ubuntu-latest` and `windows-latest` via a
matrix; POSIX-only tests are guarded via `itPosix`/`describePosix` so
Windows gets full coverage without brittle CI filters.

<sup>Written for commit 9fe60b7.
Summary will update on new commits.</sup>


<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.9.0`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish
`browse@0.9.0` from that exact commit using `pnpm pack` + `npm publish
--provenance`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Release `browse@0.9.0` by bumping the package and changelog; includes a
new default for `browse screenshot` (saves to a file) and a Windows
reliability fix for `browse skills add` with bounded timeouts. If your
scripts parsed base64 from stdout, pass `--base64` to keep the old
behavior.

<sup>Written for commit 8178729.
Summary will update on new commits.</sup>


<!-- End of auto-generated description by cubic. -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…cular-structure JSON (#2278)

## What & why

`stagehand.agent({ integrations })` is unusable with a local/stdio MCP
server. `agent()` logs its config as auxiliary metadata via
`JSON.stringify(options.integrations)`, but an MCP `Client` instance
(what `connectToMCPServer({ command, args })` returns) is a **circular
object** — so the call throws **`TypeError: Converting circular
structure to JSON`** *before the agent ever runs*.

This blocks the documented stdio-MCP path entirely (URL-string
integrations are unaffected, since a string serializes fine).

## The fix

`packages/core/lib/v3/v3.ts` — in the agent-creation log, serialize a
**safe descriptor** instead of the raw array: keep URL strings,
summarize `Client` instances as `"[mcp client]"`. One site; it runs for
all agent modes (dom/hybrid/cua). No public API change. Patch changeset
added.

```ts
// before
value: JSON.stringify(options.integrations),
// after
value: JSON.stringify(
  options.integrations.map((i) => (typeof i === "string" ? i : "[mcp client]")),
),
```
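
The failure and the fix can be reproduced without an MCP server. In this sketch, `fakeClient` is a stand-in for a real MCP `Client` (any self-referencing object triggers the same error), and `describeIntegrations` and the URL are hypothetical.

```typescript
type Integration = string | object;

// Same mapping as the fix above: keep URL strings, summarize everything else.
function describeIntegrations(integrations: Integration[]): string {
  return JSON.stringify(
    integrations.map((i) => (typeof i === "string" ? i : "[mcp client]")),
  );
}

// Stand-in for an MCP Client instance: an object that references itself.
const fakeClient: { self?: unknown } = {};
fakeClient.self = fakeClient;

const integrations: Integration[] = ["https://mcp.example.com", fakeClient];

// Serializing the raw array throws, as in the reported crash.
let rawSerializationFailed = false;
try {
  JSON.stringify(integrations);
} catch (err) {
  rawSerializationFailed = err instanceof TypeError;
}

// Serializing the safe descriptor succeeds.
const safeDescriptor = describeIntegrations(integrations);
console.log(rawSerializationFailed, safeDescriptor);
```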

## E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| **Before fix** (published `3.6.0`): `agent({ integrations:
[connectToMCPServer({command:"npx", args:[…filesystem MCP…]})] })` run
on live Browserbase | `TypeError: Converting circular structure to JSON
… at V3.agent` — crashes **before** the agent starts. Reproduced on a
**second** MCP server (`@modelcontextprotocol/server-everything`) → not
server-specific. | Proves the bug and that it's general to any `Client`
integration. |
| **After fix** (local build copied into a scratch project), same script
on live Browserbase | exit 0, **no circular error**; agent connected to
the MCP server (`Secure MCP Filesystem Server running on stdio`) and
**invoked its tools** (got a real tool response). | Proves the fix
unblocks `agent({ integrations: [client] })` end-to-end. |
| `pnpm build:esm` (core) + grep built `v3.js` | build exit 0; built
output contains the safe descriptor, no
`JSON.stringify(options.integrations)`. | Fix compiles and is present in
the artifact under test. |

> Note: server-side ingestion is irrelevant here — this is a pure
client-side serialization crash; the matrix exercises the exact throwing
call path before and after.

Closes STG-2405.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes a circular-JSON crash when creating an agent with MCP `Client`
integrations (e.g., from `connectToMCPServer`). We now log a safe
descriptor for `integrations`, so `agent({ integrations: [client] })`
works with local/stdio MCP servers across all modes. Addresses STG-2405.

<sup>Written for commit 01323a5.
Summary will update on new commits.</sup>


<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# why
New model dropped

# what changed
Added support for the updated Gemini 3.5 Flash Computer Use toolset in
`GoogleCUAClient.ts`, with all new tool formats mapped.

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds support for the `google/gemini-3.5-flash` computer-use agent.
Normalizes Gemini 3.x tool names/args to 2.5 handlers, preserves click
semantics, validates coordinates (rejects missing/NaN/Infinity), always
returns a fresh screenshot, and surfaces reasoning/cached tokens.

- **New Features**
- Enable `google/gemini-3.5-flash` in agent/LLM provider maps and public
types; update tests.
- Map 3.x functions to 2.5 handlers and accept new arg shapes:
coordinate-less `type`, `keys` array or single `key`,
`magnitude_in_pixels` for `scroll`, drag start/end pairs; recognize
`screenshot`/`take_screenshot`; coordinate-less `scroll` falls back to
PageUp/PageDown; alias `wait` to `wait_5_seconds`.
- Always return a screenshot function response even when no executable
actions are produced.

- **Bug Fixes**
- Track `reasoning_tokens` and `cached_input_tokens` in Google CUA usage
(per step and aggregated).
- Preserve 3.x click-family semantics (`double_click`, `triple_click`,
`right_click`, `middle_click`, `move`) and drop calls with missing or
non‑finite coordinates; add explicit `click_at` guard and a shared
finite-number check; add unit tests for conversion/guards.
- Guard required args and log custom‑tool collisions: reject `navigate`
without `url` and `type`/`type_text_at` without `text` (empty allowed);
log when a custom tool name conflicts with a predefined function
(predefined wins).
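
The coordinate guard mentioned above can be sketched like this. The names `isFiniteNumber` and `toClickPoint` are hypothetical, and the sketch assumes a dropped call is represented as `null`; the actual handling in `GoogleCUAClient.ts` may differ.

```typescript
// Shared finite-number check: rejects missing values, NaN, and +/-Infinity.
function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

interface RawClickArgs {
  x?: unknown;
  y?: unknown;
}

// Hypothetical converter: returns null when the model's call should be dropped.
function toClickPoint(args: RawClickArgs): { x: number; y: number } | null {
  const { x, y } = args;
  if (!isFiniteNumber(x) || !isFiniteNumber(y)) return null;
  return { x, y };
}

console.log(toClickPoint({ x: 120, y: 48 }), toClickPoint({ x: NaN, y: 48 }));
```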

<sup>Written for commit eb1e3a7.
Summary will update on new commits.</sup>


<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
… sessions and cloud API headers (#2277)

## What & why

CLI-driven Browserbase usage isn't fully attributable today:
- Remote **browser sessions** the CLI creates are tagged
`userMetadata.browse_cli:"true"`, but carry no install/version, so we
can't tie usage to an install or correlate with the CLI's anonymous
PostHog telemetry.
- `browse cloud search` (`/v1/search`) and `browse cloud fetch` create
**no session at all**, so they're invisible in session metadata.

This PR stamps a stable anonymous **`install_id`** + **`cli_version`**
onto both paths.

## Changes (`packages/cli` only)

- **New `src/lib/identity.ts`** — single source for install identity:
`resolveInstallId` (async, memoized, **atomic** write via
exclusive-create + EEXIST re-read), `peekInstallId` (sync, never
blocks), `getCliVersion`, and `toMetadataValue` (sanitizes
session-metadata values). install-id logic moved verbatim out of
`telemetry.ts`; its tests pass unchanged.
- **Sessions** — `driver/remote.ts` `remoteStagehandOptions()` adds
sanitized `install_id` + `cli_version` to `userMetadata` (made async;
resolver awaited with a safe fallback so telemetry never throws).
Interface, local-only stub (now properly async so `.catch` works), and
the call site updated.
- **Cloud headers** — `lib/cloud/api.ts` sends `x-bb-client:
browse-cli/<version>` (+ `x-bb-install-id` when resolved) on **both**
transports: the raw `requestBrowserbaseJson` helper (covers `search`,
sessions, contexts, projects, extensions) and
`createBrowserbaseClient()` `defaultHeaders` (covers `fetch`,
functions). Never emits empty-value headers.
- Patch **changeset**.
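
The exclusive-create plus re-read pattern used for the install id might look like the following. This is a synchronous, unmemoized sketch with hypothetical names (`readId`, `resolveInstallIdSketch`); the real `resolveInstallId` is async and memoized, and also handles legacy-path migration, which is omitted here.

```typescript
import { randomUUID } from "node:crypto";
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Read a stored id, treating a missing, unreadable, or empty file as "no id".
function readId(file: string): string | undefined {
  try {
    const value = readFileSync(file, "utf8").trim();
    return value || undefined;
  } catch {
    return undefined;
  }
}

function resolveInstallIdSketch(file: string): string {
  const existing = readId(file);
  if (existing) return existing;

  const fresh = randomUUID();
  try {
    mkdirSync(dirname(file), { recursive: true });
    // "wx" fails with EEXIST if another process created the file first.
    writeFileSync(file, fresh, { flag: "wx" });
    return fresh;
  } catch {
    // Lost the race: use the winner's id. Unwritable location: fall back to
    // the in-memory id without throwing.
    return readId(file) ?? fresh;
  }
}

const idFile = join(mkdtempSync(join(tmpdir(), "bb-")), "browserbase", "install-id");
const idFirst = resolveInstallIdSketch(idFile);
const idSecond = resolveInstallIdSketch(idFile);
console.log(idFirst === idSecond);
```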

## Why sanitize values

Browserbase session-create runs `validateMetadataObject` — values must
match `[\w\-_,;:.()&$%#@!?~]` and total ≤512 chars. A `+build` semver
would otherwise **400 every remote session**, so
`cli_version`/`install_id` are passed through `toMetadataValue()` before
reaching `userMetadata`. (HTTP headers are unconstrained, so the full
version stays in `x-bb-client`.)
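
A rough sketch of this kind of sanitizer, assuming it simply drops disallowed characters and truncates. The 64-character per-value cap is an invented parameter (the source only states a 512-character total), and whether the real `toMetadataValue` removes the whole `+build` suffix or only the `+` is not stated here.

```typescript
// Anything outside the allowed set quoted above: [\w\-_,;:.()&$%#@!?~]
const DISALLOWED_METADATA_CHARS = /[^\w\-,;:.()&$%#@!?~]/g;

// Hypothetical sanitizer; maxLength is an assumed per-value cap.
function toMetadataValueSketch(value: string, maxLength = 64): string {
  return value.replace(DISALLOWED_METADATA_CHARS, "").slice(0, maxLength);
}

const cleanVersion = toMetadataValueSketch("0.9.0+build.5");
const cleanId = toMetadataValueSketch("123e4567-e89b-42d3-a456-426614174000");
console.log(cleanVersion, cleanId);
```

The `+` is removed because it is not in the allowed set, while a UUID passes through unchanged.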

## E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `<local build> open https://example.com --remote` → `cloud sessions
list --status RUNNING` | session `userMetadata` = `{stagehand:"true",
browse_cli:"true", install_id:"<uuid>", cli_version:"0.9.0"}` | All 3
attribution keys land on a driver-created remote session; `install_id`
equals the on-disk marker. (Server-side ingestion out of scope.) |
| `cloud search "..."` and `cloud fetch https://example.com` against a
local capture server (`--base-url`) | `/v1/search` and `/v1/fetch` each
received `x-bb-client: browse-cli/0.9.0` + `x-bb-install-id: <uuid>` |
Exact outgoing headers confirmed on **both** the raw-helper (search) and
SDK (fetch) paths. |
| `cloud search "..."` and `cloud fetch https://example.com` (live API)
| real results JSON; fetch `200` + markdown | New headers don't break
live calls. |
| Migration smoke against built `dist/lib/identity.js`: seed legacy
`~/…/cli/telemetry-id` = `1111…5555`, then `resolveInstallId` with
`XDG_CONFIG_HOME` pointed at a temp dir | returned `1111…5555`; new
`<tmp>/browserbase/install-id` contains `1111…5555` (fresh-dir case
mints a new uuid instead) | Legacy id is carried forward to the new
canonical path, not reset; first-run still mints. Also confirmed on real
disk: existing `~/Library/.../telemetry-id` id copied to
`~/.config/browserbase/install-id`, legacy file intact. |
| Read-only-FS / Lambda resilience against built `dist`: (a)
`HOME=/tmp/...` writable, (b) read-only `0555` dir, (c) `ENOTDIR` under
`/dev/null`, (d) unwritable `HOME=/var/empty`, plus real `browse
--version` and `browse cloud sessions list` under unwritable `HOME` |
(a) persists to `/tmp/.../.config/browserbase/install-id`; (b)(c)(d)
return a valid in-memory UUID, **nothing written, no throw**; real CLI
exits cleanly (version prints; cloud cmd returns a clean `401`, not an
FS crash); no illegal writes under `/var/empty` | install-id resolution
+ migration are best-effort: every read/mkdir/write is guarded and all 4
callers wrap in `.catch`. On Lambda (`HOME=/tmp`) it persists; if the
dir/file is unwritable it degrades to a per-invocation in-memory id
without failing the command. |
| `turbo run build --filter=browse` · `pnpm lint` · `pnpm test:cli` |
build + lint clean · **299/299** tests pass (+7 path-resolution /
migration tests over the prior 292) | No regressions; path change +
migration covered. |

## Review follow-ups (addressed)

All 5 Cubic threads resolved: async local-only stub (so `.catch` works),
atomic install-id write (race-safe on concurrent first runs —
pre-existing behavior, hardened), and 12 focused unit tests for
`toMetadataValue` (allowed-char filtering, `+build` stripping,
truncation, UUID round-trip), the attribution headers, and the
`remoteStagehandOptions` success + fallback paths.

## Dependency / follow-up (not in this PR)

Session `userMetadata` keys are queryable in Snowflake today
(`STG_SESSIONS.SESSION_METADATA`). The **search/fetch headers** only
become useful once Platform logs `x-bb-client` / `x-bb-install-id` on
those endpoints (`/v1/fetch` has an unpopulated `fetch_tasks.headers`
column not yet in the Estuary mirror; `/v1/search` writes no DB row) —
tracked as a server-side follow-up.

## Update — also tags `cloud sessions create`

Extended attribution to the `browse cloud sessions create` path too
(previously only the driver `open --remote` path carried it): its
`userMetadata` now includes `browse_cli` + `install_id` + `cli_version`
(sanitized via `toMetadataValue`), merged with any user-supplied
`--body` metadata while keeping the attribution keys authoritative — a
user can't spoof `browse_cli` to `"false"`. **So every CLI-created
session is attributable — driver *and* `cloud sessions create`.**
Verified live: `cloud sessions create` → readback `userMetadata: {
browse_cli:"true", install_id:"…", cli_version:"0.9.0" }` → released. +2
tests (292 total cli tests).
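
Keeping the attribution keys authoritative comes down to merge order. A minimal sketch, with a hypothetical `withAttribution` helper and made-up values:

```typescript
type Metadata = Record<string, string>;

// Spread user-supplied metadata first so attribution keys always win.
function withAttribution(userSupplied: Metadata, attribution: Metadata): Metadata {
  return { ...userSupplied, ...attribution };
}

const mergedMetadata = withAttribution(
  { team: "qa", browse_cli: "false" }, // attempted spoof from --body
  {
    browse_cli: "true",
    install_id: "00000000-0000-4000-8000-000000000000", // placeholder id
    cli_version: "0.9.0",
  },
);
console.log(mergedMetadata);
```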

## Update — standardized install-id path (review follow-up)

Per review (thanks @pirate), moved the anonymous install-id marker off
the bespoke per-OS path (`~/Library/Application
Support/Browserbase/cli/telemetry-id`,
`%APPDATA%/Browserbase/cli/telemetry-id`,
`<xdg>/browserbase/cli/telemetry-id`) to the **standardized
`~/.config/browserbase/install-id`** — consistent with core
(`BROWSERBASE_CONFIG_DIR`) and the CLI's own
`~/.config/browserbase/skills`. Honors `BROWSERBASE_CONFIG_DIR`, falls
back to `XDG_CONFIG_HOME`/`~/.config` on every platform; the
`BROWSERBASE_TELEMETRY_INSTALL_ID_FILE` override still short-circuits
everything (incl. migration).

**Backwards-compatible:** if the canonical file is absent but a legacy
marker exists, its UUID is copied forward so existing installs keep
their stable id — no attribution reset; the legacy file is left intact.
Renamed `telemetry-id` → `install-id` since it's no longer
telemetry-only (and `install-id`, not `device-id`, because it's a
per-install id, not a hardware fingerprint). Considered and declined
`node-machine-id`: a cross-app hardware id doesn't fix ephemeral-fleet
counting (unpredictable across customer image strategies) and conflicts
with the anonymous, install-scoped intent.
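
The lookup order and the copy-forward rule can be sketched roughly as below. This is an illustration under assumptions, not the shipped code: `resolveInstallIdPath` and `pickInstallId` are hypothetical names, and placing the file directly under `BROWSERBASE_CONFIG_DIR` is inferred from the `~/.config/browserbase/install-id` default.

```typescript
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper: resolve where the install-id marker lives.
function resolveInstallIdPath(env: Env, homeDir: string): string {
  // The explicit file override short-circuits everything else.
  const override = env["BROWSERBASE_TELEMETRY_INSTALL_ID_FILE"];
  if (override) return override;

  // Otherwise: BROWSERBASE_CONFIG_DIR, then XDG_CONFIG_HOME, then ~/.config.
  const xdgBase = env["XDG_CONFIG_HOME"] || join(homeDir, ".config");
  const configDir = env["BROWSERBASE_CONFIG_DIR"] || join(xdgBase, "browserbase");
  return join(configDir, "install-id");
}

// Hypothetical helper: prefer the canonical id, else copy a legacy id
// forward, else mint a new one. The legacy file itself is never modified.
function pickInstallId(
  canonical: string | undefined,
  legacy: string | undefined,
  generate: () => string,
): string {
  return canonical || legacy || generate();
}
```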

Closes STG-2404.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# why

users need a way to block known domains during browser sessions

# what changed

- users can now call `stagehand.context.setDomainPolicy({ blockedDomains
})` to block requests to specific domains across the whole context
- stagehand turns those domains into cdp `Fetch` request patterns, so
only matching blocklist requests are paused instead of intercepting all
traffic
- blocked requests are failed with `BlockedByClient`, which surfaces in
chrome as a client-side network block
- the policy is applied to already-open pages & automatically applied to
new pages, popups, & attached frame targets
- users can clear the policy with `setDomainPolicy(null)` or `{
blockedDomains: [] }`; clearing disables policy interception & removes
stagehand's request listener
- invalid domain inputs like full urls, paths, ports, queries, or
malformed wildcards throw a `StagehandInvalidArgumentError`


### fast follow:
- will follow up with a PR to add a domain allowlist, eg
`stagehand.context.setDomainPolicy({ allowedDomains })`

### behavioural notes:

- `setDomainPolicy({ blockedDomains: [...] })` applies to active context
sessions & future pages/targets
- exact domains like `ads.example.com` block only that hostname
- wildcard domains like `*.example.com` block subdomains, but not the
apex domain
- when no policy is set, stagehand does not enable `Fetch` interception
for this feature
- clearing with `setDomainPolicy(null)` or `{ blockedDomains: [] }`
disables the policy & removes stagehand's `Fetch.requestPaused` listener
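
as a rough illustration of the exact vs wildcard rules above, here is a self-contained matcher. it is a sketch, not stagehand's implementation: `normalizeHost`, `matchesPattern`, and `isBlocked` are hypothetical names, and the lowercasing / trailing-dot handling is an assumption about how normalization works.

```typescript
// Lowercase the hostname and drop a single trailing dot (assumed normalization).
function normalizeHost(host: string): string {
  const lower = host.trim().toLowerCase();
  return lower.endsWith(".") ? lower.slice(0, -1) : lower;
}

// Hypothetical matcher: exact hosts match only themselves; "*.example.com"
// matches any subdomain depth but not the apex "example.com".
function matchesPattern(hostname: string, pattern: string): boolean {
  const host = normalizeHost(hostname);
  const rule = normalizeHost(pattern);
  if (rule.startsWith("*.")) {
    const suffix = rule.slice(1); // ".example.com"
    return host.length > suffix.length && host.endsWith(suffix);
  }
  return host === rule;
}

// Hypothetical convenience wrapper: test a full URL against a blocklist.
function isBlocked(url: string, blockedDomains: string[]): boolean {
  const { hostname } = new URL(url);
  return blockedDomains.some((pattern) => matchesPattern(hostname, pattern));
}
```

keeping the leading dot in the suffix comparison is what stops `notexample.com` from matching `*.example.com`.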

# test plan

- `packages/core/tests/unit/domain-policy.test.ts` validates domain
normalization, exact/wildcard matching, invalid domain rejection &
generated `Fetch.RequestPattern` values
- `packages/core/tests/unit/context-domain-policy.test.ts` validates
`context.setDomainPolicy()` enables/disables `Fetch`, removes only its
own `Fetch.requestPaused` listener on clear, fails blocked requests &
continues unexpected non-blocked paused requests
- `packages/core/tests/integration/context-domain-policy.spec.ts`
validates blocked requests fail on an existing page & on a page created
after the policy is set
- `packages/core/tests/unit/public-api/public-error-types.test.ts`
validates `StagehandSetDomainPolicyError` is exported as a public error
type




<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a context-wide domain blocklist that intercepts and blocks outgoing
HTTP(S) requests by domain using CDP `Fetch`. Includes a small API,
strict validation, and clearer error messages.

- **New Features**
- API: `context.setDomainPolicy(policy | null)` and
`context.getDomainPolicy()`.
- Patterns: exact hosts and leading wildcards only (`example.com`,
`*.example.com`); HTTP/HTTPS on any port; case-insensitive; trailing
dots handled.
- Scope: applies to existing and new pages/targets; clearing with `null`
or `[]` disables and removes our handler on success.
- Validation: domain-only strings; invalid inputs throw
`StagehandInvalidArgumentError`.
  - Behavior: non-matches continue; matches fail with `BlockedByClient`.
- Errors: if `Fetch.enable` fails, we uninstall the handler for that
session, close new targets, and `newPage()` fails fast with
`StagehandSetDomainPolicyError` that includes per-session details and
CDP error text; if `Fetch.disable` fails, the handler stays installed
and the same error is thrown.

<sup>Written for commit 54c22df.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.9.1`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish
`browse@0.9.1` from that exact commit using `pnpm pack` + `npm publish
--provenance`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Prepare `browse@0.9.1` by bumping the CLI version and updating the
changelog; merging will trigger the Release workflow to publish.

- **New Features**
- Attribute CLI-driven Browserbase usage to an anonymous install.
Sessions stamp `install_id` and `cli_version` in `userMetadata`. Cloud
Search/Fetch send `x-bb-client` and `x-bb-install-id`. Best-effort and
non-blocking.

<sup>Written for commit f2eca53.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…ons (#2282)

## Summary

Adds `--verified` and `--proxies` to remote driver sessions so a
Verified and/or proxied Browserbase session opens in **one command**:

```bash
browse open <url> --remote --verified --proxies
```

Before this, the `browse` **driver** (`open`) had no way to request
Verified/proxies — only `browse cloud sessions create --verified
--proxies` (session **management**) could make such a session. So
*driving* one took two steps: create it, then attach the driver via raw
`--cdp`. That raw attach loses session identity for the whole lifetime
(`browse status` reports `mode: cdp` with no Browserbase session ID;
`doctor` can't reason about it) and, bypassing the normal remote path,
never gets the `browse_cli` attribution tag — so Verified/proxied power
users were invisible to browse-CLI telemetry. (Plain `browse open
--remote` already worked in one step; it just couldn't ask for
Verified/proxies.) Closes
[STG-2265](https://linear.app/browserbase/issue/STG-2265/add-verified-proxies-to-browse-open-remote-and-attach-by-session-id)
(Tier 1).

## What changed (CLI-only)

- **New flags** `--verified` / `--proxies` on the shared driver flag set
(so `open`, `doctor`, etc. accept them). Valid **only with `--remote`**
— never implied, because implying it would silently switch the user to
billed cloud sessions. Without `--remote` they hard-error with a hint.
- **Threaded into session creation**: the remote `ConnectionTarget`
carries the settings, and `remoteStagehandOptions` maps them onto
`browserbaseSessionCreateParams` (`proxies: true`,
`browserSettings.verified: true`) while keeping
`userMetadata.browse_cli`.
- **Sticky per session** like `--headed`/`--headless`: the settings join
the mode-equality check, so a re-open requesting different settings
fails with the usual stop-and-reopen error.
- **`status` / `doctor` surface identity**: Browserbase session ID,
dashboard URL, live-view (debug) URL, and verified/proxies state — read
from the existing Stagehand getters, no extra API call.
- Bundled browse skill updated to document the one-command form.
- `--verified` requires a Browserbase Scale plan.
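
A rough sketch of the guard and the create-params mapping described above. The function names, the `DriverFlags` shape, and the exact error wording are assumptions for illustration; only the `proxies: true`, `browserSettings.verified: true`, and `userMetadata.browse_cli` fields come from this PR.

```typescript
interface DriverFlags {
  remote: boolean;
  verified: boolean;
  proxies: boolean;
}

interface SessionCreateParams {
  proxies?: boolean;
  browserSettings?: { verified: boolean };
  userMetadata: Record<string, string>;
}

// Hypothetical guard: returns an error message when cloud-only flags are
// used without --remote, or null when the combination is valid.
function remoteOnlyFlagError(flags: DriverFlags): string | null {
  const requested: string[] = [];
  if (flags.verified) requested.push("--verified");
  if (flags.proxies) requested.push("--proxies");
  if (flags.remote || requested.length === 0) return null;

  // Singular/plural agreement for one flag vs two.
  const verb = requested.length === 1 ? "requires" : "require";
  const hint = `browse open <url> --remote ${requested.join(" ")}`;
  return `${requested.join(" and ")} ${verb} --remote. Try: ${hint}`;
}

// Hypothetical mapping onto session create params; the attribution tag is
// always present regardless of which flags were requested.
function sessionCreateParams(flags: DriverFlags): SessionCreateParams {
  const params: SessionCreateParams = { userMetadata: { browse_cli: "true" } };
  if (flags.proxies) params.proxies = true;
  if (flags.verified) params.browserSettings = { verified: true };
  return params;
}
```

Returning an error instead of silently enabling `--remote` matches the intent stated above: the user never ends up on billed cloud sessions without asking for them.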

## Tier 2 (follow-up, intentionally not in this PR)

`browse open <url> --remote --session-id <id>` (attach-by-ID for the
create-then-attach long tail: regions, keep-alive, contexts, `--stdin`
body). Kept separate to keep this PR atomic; the skill already points
there as the bridge.

## E2E Test Matrix

Run against live Browserbase with a local build (`node bin/run.js …`).
Session IDs are ephemeral; the live-view URL is signed and redacted.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `open <url> --verified` (no `--remote`) | `--verified require
--remote. Try: browse open <url> --remote --verified` | Guard works;
flag is never silently implied. |
| `open <url> --proxies` (no `--remote`) | `--proxies require --remote.
Try: browse open <url> --remote --proxies` | Same, for `--proxies`. Also
verified it errors even with `BROWSERBASE_API_KEY` set (no auto-implied
remote). |
| `open "https://api.ipify.org?format=json" --remote --proxies` | `{
mode: "remote", browserbaseSessionId: "c64ca54b…",
browserbaseSessionUrl: "https://www.browserbase.com/sessions/c64ca54b…",
hasDebugUrl: true }` | Session identity (id + dashboard + live-view URL)
is surfaced right on `open`, not lost like `--cdp`. |
| `eval` egress IP: proxied vs non-proxied | proxied `8.28.99.210` vs
non-proxied `44.248.86.34` → **different routes** | `--proxies` actually
routes egress through Browserbase proxies (not just accepted). |
| `status -s <proxied>` | `{ mode: "remote", browserbaseSessionId:
"c64ca54b…", target: { kind: "remote", proxies: true } }` | `status`
surfaces the session ID and proxies/verified state. |
| `doctor --json -s <proxied>` | browserbase check: `session c64ca54b… —
https://www.browserbase.com/sessions/c64ca54b…`; target: `reusing remote
(proxies)` | `doctor` reasons about the live session + settings. |
| Sticky: plain remote running, re-open `--remote --proxies` (and
`--remote --verified`) | `Session "s" is already running in remote mode.
Run browse stop --session s before changing modes.` | Settings are
sticky; conflicting re-open fails like `--headed`. |
| `open <url> --remote --verified` (live) | `{ mode: "remote",
browserbaseSessionId: "e2f0ce9f…", target: { kind: "remote", verified:
true } }` | Verified session is actually created (this key is on Scale).
|
| `pnpm test` | `Test Files 19 passed (19) · Tests 278 passed (278)` |
Unit coverage incl. new resolution/guard/sticky tests + `remote-options`
create-params threading. |
| `pnpm lint` | prettier + eslint + `tsc --noEmit` clean |
Format/lint/types green. |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add `--verified` and `--proxies` to remote driver sessions so you can
open a Verified and/or proxied Browserbase session in one command:
`browse open <url> --remote --verified --proxies`. This preserves
session identity and makes `browse status` and `browse doctor` show the
Browserbase session ID and links. Closes STG-2265.

- **New Features**
- `--verified` and `--proxies` are valid only with `--remote`; they are
never implied. `--verified` requires a Browserbase Scale plan.
- Settings are sticky for the session; changing them requires stopping
and reopening the session.
- `status` and `doctor` now show the Browserbase session ID, dashboard
URL, live-view URL, and whether verified/proxies are enabled. `doctor`
also suggests the correct `open` command with `--verified/--proxies`
when relevant.
- Flags are threaded into Browserbase session create params while
keeping the `browse_cli` attribution tag.

- **Bug Fixes**
- The `--remote` guard message uses correct singular/plural grammar
(“requires”/“require”).

<sup>Written for commit d1806c9.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# why

domain policy enforcement currently relies on `Fetch.requestPaused`, but
popups opened with `window.open()` can reach their destination before
Fetch interception is installed on the new target. in that race, the
blocked popup may successfully open, & therefore break the domain policy

# what changed

this PR adds a fallback close path for popup targets whose URL violates
the active domain policy:

- listens for `Target.targetCreated`,
`Target.targetInfoChanged`, and attach-time target metadata for popup
targets
- if a popup reaches a blocked/disallowed domain before request
interception catches it, the popup is closed via `Target.closeTarget`

# test plan

- added unit coverage for closing popup targets that already reached a
blocked domain, including popups whose opener target is not locally
tracked
- added unit coverage for duplicate and late target events so a
successfully closed popup is not closed/logged repeatedly
- added unit coverage for the attach race where a `targetCreated` close
is still in flight when `attached` handling runs; attach now continues
if the close fails
- added unit coverage for close failures, including treating `No target
with given id found` as a successful already-closed outcome for this
domain-policy fallback only
- added integration coverage using the existing external popup fixture
to verify a `window.open()` popup that reaches `news.ycombinator.com` is
closed and not retained in `context.pages()`

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes a race where `window.open()` popups could reach blocked domains
before interception by closing them immediately when their URL violates
the active domain policy. Prevents blocked popups from appearing or
lingering in `context.pages()`.

- **Bug Fixes**
- Listen to `Target.targetCreated`, `Target.targetInfoChanged`, and at
attach-time; close popup via `Target.closeTarget` if its URL is
disallowed.
- Deduplicate close attempts across events and let attach wait for an
in-flight close; continue attach if the close fails.
- Treat “No target with given id found” as a successful already-closed
outcome for this fallback; improve logging with rule reason and source.
- Skip non-popup targets and persist successful close dedupe across late
events.

- **Dependencies**
  - Add changeset to publish a patch for `@browserbasehq/stagehand`.

<sup>Written for commit a0dae3b.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
…unning daemon (#2280)

## TL;DR

`BROWSERBASE_API_KEY=xxx browse open <url> --remote` was **silently
ignored** when the driver daemon was already running, so the CLI kept
printing `Missing BROWSERBASE_API_KEY`. The daemon froze a copy of
`process.env` at spawn time and never saw the late key. Fix: the client
now **forwards the key with every command**, and the daemon **threads it
straight into the Stagehand session at init** — no restart, no `browse
stop`, warm sessions untouched.

Closes STG-2407. Reported via the partner AX update (partner-2027dev,
Jun 22); confirmed still live on `main`.

---

## Symptom

```console
$ browse open https://example.com --remote          # no key set → starts the daemon, fails
Missing BROWSERBASE_API_KEY

$ BROWSERBASE_API_KEY=bb_live_… browse open https://example.com --remote   # key set, SAME daemon
Missing BROWSERBASE_API_KEY                          # ❌ ignored — only `browse stop` + retry worked
```

This burns 3–4 retries per session and blocks the documented
recover-after-interruption flow.

## Root cause

The CLI is a thin client that talks to a long-lived background
**daemon** (which holds the warm browser session). The key never reached
that daemon:

1. **Frozen env.** The daemon is spawned `detached` with `env:
process.env` captured **once** at spawn time (`daemon/client.ts`). A key
exported/inlined in a *later* shell never propagates to it.
2. **Key read daemon-side.** The remote session is created inside the
daemon, which read `process.env.BROWSERBASE_API_KEY` *there*
(`remote.ts` → `session-manager.ts`).
3. **Protocol carried no credentials.** Requests had no way to deliver a
key to an already-running daemon (`daemon/protocol.ts`). Since
`--remote` is explicit, the client never blocked on the key — so it
happily started a doomed key-less daemon.
4. **Stale backoff.** A cached init-failure backoff (5s→60s) replayed
the *old* "missing key" error even on an immediate retry.

## The fix

**Make the client the source of truth for the key; never trust the
daemon's frozen env.**

```
 client (fresh env each call)                      daemon (long-lived)
 ────────────────────────────                      ───────────────────
 collectForwardedEnv()       ──{API_KEY}──▶        stash on session manager
   reads caller's env, over the owner-only socket    │
                                                     ▼  at init only:
                                                   remoteStagehandOptions(forwardedEnv)
                                                     └▶ new Stagehand({ apiKey })
                                                          (key's only home = live session;
                                                           never written back to process.env)
```

- **`daemon/forwarded-env.ts`** (new) — `collectForwardedEnv()` reads
the caller's env; `forwardedEnvSignature()` is a secret-free `sha256`
fingerprint used only to detect key changes.
- **`protocol.ts` / `client.ts`** — every `open`/`command` request
carries the caller's `forwardedEnv`.
- **`server.ts` / `session-manager.ts`** — the daemon threads the
forwarded key **into the Stagehand constructor at init**. It is **never
written into the daemon's `process.env`**. When the fingerprint changes
on a *cold* session, the stale init backoff is cleared so the retry runs
immediately.
- **Warm sessions are untouched** — an already-initialized session
returns early and keeps its browser; the key only matters at init.
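
To make the allowlist and fingerprint idea concrete, here is a minimal sketch. The function names mirror the ones listed above, but the signatures and bodies are assumptions: in particular, passing the allowlist in as a parameter and the exact bytes fed to the hash are illustrative choices, not the shipped implementation.

```typescript
import { createHash } from "node:crypto";

type ForwardedEnv = Record<string, string>;

// Copy only allowlisted, non-empty variables from the caller's env.
// Assumption: the allowlist is supplied by the caller of this function.
function collectForwardedEnv(
  env: Record<string, string | undefined>,
  allowlist: readonly string[],
): ForwardedEnv {
  const out: ForwardedEnv = {};
  for (const key of allowlist) {
    const value = env[key];
    if (value) out[key] = value;
  }
  return out;
}

// Secret-free fingerprint: a sha256 over sorted key/value pairs, used only
// to notice that the forwarded values changed between requests.
function forwardedEnvSignature(forwarded: ForwardedEnv): string {
  const hash = createHash("sha256");
  const entries = Object.entries(forwarded).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  for (const [key, value] of entries) {
    hash.update(`${key}\0${value}\0`);
  }
  return hash.digest("hex");
}
```

Because both helpers iterate whatever keys they are handed, neither needs to contain the key name as a literal. Comparing digests rather than raw values also means the daemon can detect a key change without keeping a second copy of the secret.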

> **Only the API key is forwarded.** The Browserbase backend infers the
project from the key, so `BROWSERBASE_PROJECT_ID` isn't needed for
session creation — verified end-to-end with no project id set anywhere.
(A multi-project key pinning a non-default project via
`BROWSERBASE_PROJECT_ID` is a rare edge; that still resolves from the
daemon's own env, exactly as before.)

### Naming + drift guard (per review)

The mechanism is named generically as **forwarded env vars**, not
"credentials" (`ForwardedEnv`, `collectForwardedEnv` /
`applyForwardedEnv` / `forwardedEnvKeys`, request field `forwardedEnv`,
module `daemon/forwarded-env.ts`). It stays a **curated allowlist**
(today just `BROWSERBASE_API_KEY`) rather than the whole `process.env`,
because: (1) `Object.assign`-ing the full env into an already-running
daemon silently no-ops for anything its modules read at import; (2) the
caller's env also holds the daemon's own operational vars
(`PATH`/`HOME`/`BROWSE_DAEMON_DIR`), so forwarding it wholesale risks
clobbering the daemon and can't represent an unset; (3) the driver path
has no AI ops, so no model keys are ever read — the API key is the only
env-delivered session input. A new test,
`tests/daemon-forwarded-env-drift.test.ts`, fails if any new daemon-path
`process.env` read is left uncategorized (forward vs daemon-local), so
the allowlist can't silently drift.

<details>
<summary>Why thread into the constructor instead of writing
<code>process.env</code>?</summary>

The key is read exactly once per session (at init); afterward the live
`Stagehand` instance holds it. Writing it into the daemon's global env
would leave a stray secret with no reader. Threading keeps the
credential scoped to the session, and the change-detector is hashed so
the raw key isn't kept in a second field. There's no perf cost —
forwarding ~100 bytes is free next to cloud session creation; the only
thing cached for speed is the warm session, which is independent of the
key value.

**Rejected alternatives:** writing the key into the daemon's env (stray
secret, one-way env accumulation); auto-restarting the daemon (kills
warm sessions, racy); a mere actionable error (the ask is for it to
*work*, not just guide).
</details>

## Security contract (local-only build)

`BROWSERBASE_API_KEY` must not appear in the CDP-only artifact. The
forwardable-key *list* is capability-gated: `forwardedEnvKeys()` returns
the key in the full build (`remote.ts`) and `[]` in
`remote.disabled.ts`. `collectForwardedEnv` / `forwardedEnvSignature`
iterate the received object's own keys, so they stay key-name-free. Net:
the literal lives only in `dist/lib/driver/remote.js`, and
`tests/local-only-build.test.ts` still passes.

## Testing

Local full build (`pnpm build`) of the code under review, real
Browserbase key, exercising the exact repro against an already-running
key-less daemon.

| Step | Command / flow | Result | Proves |
| --- | --- | --- | --- |
| 1 | Key-less `open … --remote` (spawns daemon) | `Missing
BROWSERBASE_API_KEY` (daemon stays up) | Reproduces the stranded
key-less daemon |
| 2 | Inline `BROWSERBASE_API_KEY=… open … --remote`, **same** daemon,
**no project id anywhere** | ✅ `SUCCESS` — `"title": "Example Domain"`,
`"url": "https://example.com/"` | The inline key reaches the running
daemon (the fix); project inferred from the key |
| 3 | Warm reuse: `open https://www.iana.org --remote` | ✅ `SUCCESS` —
`"Internet Assigned Numbers Authority"`, same `targetId` | Warm-session
fast path preserved |
| 4 | `vitest` driver-foundation + remote-disabled + local-only + drift
| ✅ 45 pass (4 files) | Unit coverage; asserts the key is **not**
written to `process.env` |
| 5 | `tsc -p tsconfig.local-only.json` + local-only-build test | ✅
typecheck clean | Security contract held (no key name in CDP-only build)
|
| 6 | `pnpm lint` | ✅ format + eslint + tsc clean | No regressions |
| 7 | Drift guard: inject an uncategorized `process.env.X` on the daemon
path | ✅ test goes red with "forward vs daemon-local" guidance, green
after revert | The guard is non-vacuous — a new daemon-path env read
can't slip through |

## Follow-ups (not in this PR)

- **`browse open` ECONNREFUSED** (also AX-flagged): `sendDriverRequest`
(`client.ts`) has no connect-retry, so a transient
`ECONNREFUSED`/`ENOENT` (stale socket / daemon mid-shutdown) propagates
raw. Couldn't reproduce under load — tracking separately rather than
shipping unverified.
- **Fail-fast `--remote` guard** (defense in depth): make explicit
`--remote` resolve the key client-side like `autoSelectRemoteTarget`
already does, so a key-less first call fails fast instead of spawning a
doomed daemon. Forwarding alone fixes the reported bug; the guard would
just improve the first-call error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

Removes the `browse refs` command. It only re-printed the
`xpathMap`/`urlMap` cached from the last `browse snapshot` — which
`browse snapshot` already returns — so it was redundant. It was also a
footgun: it returned **stale** maps if the page had changed since that
snapshot.

`refs` was introduced in the CLI's oclif rewrite as one of the driver
commands and never pruned; nothing relies on it that `browse snapshot`
doesn't already cover.

## What's removed

- The `browse refs` command (`src/commands/refs.ts`)
- Its driver handler + the `"refs"` entry in `DriverCommandName`
- The now-unused `getRefMaps()` accessor on the session manager
- `browse refs` references in `README.md` and the browse `SKILL.md`

Ref-based commands (`click`, `fill`, `select`, …) are **unaffected** —
they resolve from the cached maps via `resolveSelector`, which is
untouched. `browse snapshot` continues to return the ref maps by
default.

## E2E Test Matrix

| Command / flow | Observed output | Confidence |
| --- | --- | --- |
| `browse refs` | `"browse refs" is not a browse command … Error:
command refs not found` | Command removed |
| `browse --help` | no `refs` entry | Removed from the surface |
| `browse snapshot` | unchanged (still returns `{ tree, urlMap, xpathMap
}`) | Snapshot behavior untouched |
| `driver-commands` unit tests | 14/14 pass | No test regressions |
| `pnpm --dir packages/cli build` (`tsc`) | success | Typechecks (incl.
narrowed `DriverCommandName`) |

Linear: [STG-2453](https://linear.app/browserbase/issue/STG-2453)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2298)

thanks @yawbtng for the contribution here!!

## why

CUA `keypress` actions describe a single key **chord** (modifiers held
down while the main key is pressed), but
`V3CuaAgentHandler.executeAction` pressed each key in the array
**separately**. `page.keyPress(modifier)` presses and *releases* the
modifier, so by the time the main key was pressed the modifier was
already up.

The concrete failure: a `["Control", "A"]` keypress sends `Control` on
its own (a no-op) and then `A` through the plain typing path — so
instead of select-all, the agent **types a literal `a` into the focused
field**. Any select-all / copy / paste / cut / shortcut pattern silently
fails *and* corrupts input. Because the agent-replay cache recorded the
broken per-key sequence, replays reproduced the bug too.

This is provider-dependent, based on the shape each client emits:

| Provider | emits for a combo | old behavior | status |
| --- | --- | --- | --- |
| OpenAI | `keys: ["CTRL", "A"]` | `Ctrl` then literal `a` | ❌ broken |
| Google (`key_combination`) | `.split("+")` → `["Control", "A"]` |
`Ctrl` then literal `a` | ❌ broken |
| Microsoft (`fara-7b`) | `keys: string[]` (per-key) | `Ctrl` then
literal `a` | ❌ broken |
| Anthropic | `keys: ["ctrl+s"]` (single `+`-joined string) | chorded
correctly | ✅ unaffected |

Anthropic only worked by accident — it pre-joins with `+`, which
`page.keyPress` already chords internally.

## what changed

`packages/core/lib/v3/handlers/v3CuaAgentHandler.ts` — in the `keypress`
case, map each key and **join into one `+`-delimited combination**, then
call `page.keyPress` once. `page.keyPress` already holds modifiers down
for the final key and already special-cases the literal `+` key, so
single keys, already-combined strings, and `Ctrl++`-style inputs all
stay correct. `mapKeyToPlaywright` is idempotent (`CTRL`/`Control` →
`Control`), so Google's pre-mapped arrays and Anthropic's combined
string are unchanged. The recorded replay step is now a single `press
Control+A` instead of the broken `press Control, press A`.
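
A minimal sketch of the join step, for illustration only. `normalizeKey` is a hypothetical stand-in for `mapKeyToPlaywright` with a deliberately tiny alias table, and `toChord` is an assumed name; the real handler passes the result to `page.keyPress`.

```typescript
// Hypothetical alias table; the real mapping covers many more keys.
const KEY_ALIASES: Record<string, string> = {
  CTRL: "Control",
  CONTROL: "Control",
  CMD: "Meta",
  ALT: "Alt",
  SHIFT: "Shift",
};

// Idempotent by construction: "CTRL" and "Control" both map to "Control",
// and anything not in the table (including "ctrl+s") passes through as-is.
function normalizeKey(key: string): string {
  return KEY_ALIASES[key.toUpperCase()] ?? key;
}

// Build one "+"-delimited chord from the action's key array, or null when
// there is nothing to press.
function toChord(keys: string[]): string | null {
  if (keys.length === 0) return null;
  return keys.map(normalizeKey).join("+");
}
```

The caller would then issue a single key press with the returned string, so modifiers stay held while the final key goes down.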

## test plan

New `packages/core/tests/unit/cua-keypress-chord.test.ts` (5 cases, all
passing):
- `["Control", "A"]` → single `keyPress("Control+A")`
- alias normalization: `["CTRL", "A"]` → `keyPress("Control+A")`
- single key `["Enter"]` → `keyPress("Enter")` (unchanged)
- already-combined `["ctrl+s"]` → `keyPress("ctrl+s")` (Anthropic shape,
unchanged)
- empty `[]` → no `keyPress` call

Existing CUA suites (`anthropic-cua-triple-click`, `openai-cua-client`,
`microsoft-cua-client`, `anthropic-cua-adaptive-thinking`) — 25 tests
still green.

---

Related: this is exactly the class of provider-specific CUA regression
that #2188 proposes catching with a deterministic bench task.



<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes CUA keypress combos by pressing them as one chord. Shortcuts like
Ctrl+A now work across OpenAI, Google `key_combination`, and Microsoft
clients instead of typing letters.

- **Bug Fixes**
- Map keys, join with "+", and call `page.keyPress` once; supports
arrays, already-joined strings, and the literal "+" key.
- Normalize aliases (`CTRL` → `Control`) and record a single `press
Control+A` step for replays.
- Added unit tests for combos, alias normalization, single key,
already-combined, and empty input.

<sup>Written for commit c966475.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: yawbtng <154343001+yawbtng@users.noreply.github.com>
…rors (#2269)

# why

Edit: pulling in the description from @filip-michalsky on
#2270

Root-cause fix for STG-2335: a non-CUA `agent.execute()` reports a
successful run as `{ success: false }`, with a red `Invalid prompt:
messages must be a ModelMessage[]` error logged after all the work has
already completed.

This replaces the symptom patch (wrap the finalization in try/catch and
force `state.completed = true`) with a fix for the actual defect.

## Root cause

After the main agent loop finishes, `ensureDone()` runs a forced "done"
finalization (`handleDoneToolCall`) that re-submits the accumulated run
history into a fresh `generateText` call to produce the structured
su[…]. Nothing earlier in the run re-validates accumulated tool results,
but this re-submission does.

When a custom tool returns an object with an optional field left
undefined (e.g. PermitFlow's `captureField` returning `{ matchedExpected:
undefined }` when no `expectedText` is passed), that `undefined` lands
inside the re-submitted messages. The AI SDK's `ModelMessage` validation
(`standardizePrompt`) rejects it, because its JSON-value schema disallows
`undefined` (only null/string/number/boolean/object/array). The
finalization throws, flipping the result to `{ success: false }` even
though every action succeeded.

> Note: the original "reasoning traces" hypothesis was ruled out:
reasoning parts come back with a valid `text: ""` and pass validation.
The undefined tool-result field is the trigger.

# what changed

`sanitizeMessagesForResubmission()` deep-strips `undefined` from the run
history before the forced "done" call, keeping all real content. It only
traverses plain objects/arrays, so class instances (URL, ty[…]ata, Date)
pass through untouched.
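
The deep-strip can be sketched as below. This is an illustration of the technique, not the body of `sanitizeMessagesForResubmission`: `isPlainObject` and `stripUndefinedDeep` are hypothetical names, and dropping `undefined` array elements (rather than keeping them) is an assumption.

```typescript
// A "plain" object is one created by a literal or Object.create(null).
// Class instances such as Date or URL have a different prototype.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Recursively remove `undefined` from plain objects and arrays, returning
// new containers. Anything else is returned by reference, untouched.
function stripUndefinedDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value
      .filter((item) => item !== undefined)
      .map((item) => stripUndefinedDeep(item));
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out;
  }
  return value;
}
```

Checking the prototype rather than `typeof value === "object"` is what lets class instances pass through by reference instead of being flattened into bare key/value bags.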

# test plan

- 4 unit tests in `agent-finalization-resilience.test.ts`: reproduces
`InvalidPromptError` with an undefined tool-result field → fixed by
sanitize → real content (reasoning/tool-call/text) preserved → class
instances untouched. All pass.
- End-to-end repro (`openai/gpt-5.5` + custom tool, mirrors P[…]): fails
on main (success=false, red error), succeeds on this branch
(success=true, completed=true, no error).


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Prevents non-CUA `agent.execute()` from reporting a completed run as
failed by sanitizing the run history before the forced "done" call and
making finalization best-effort. Fixes STG-2335 for reasoning models
like `openai/gpt-5.x` by stripping nested `undefined` values that break
SDK prompt validation.

- **Bug Fixes**
- Deep-strip `undefined` from re-submitted messages via
`sanitizeMessagesForResubmission`; traverse only plain objects/arrays to
preserve real content and class instances; apply in `handleDoneToolCall`
and null-guard `result.toolCalls`.
- If the forced "done" call throws, log a warning and synthesize
completion from the run instead of failing it.
- Add unit tests for the InvalidPromptError repro (including
`providerOptions` in `gpt-5.x`), sanitizer behavior, class-instance
pass-through, and the finalization-failure fallback.

<sup>Written for commit 4d1c904.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2269?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Filip Michalsky <31483888+filip-michalsky@users.noreply.github.com>
Co-authored-by: Filip Michalsky <filip-michalsky@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## What & why

Browserbase contexts are identified only by an opaque ID and the
platform has **no server-side list endpoint**, so reusing a context
today means remembering (or copy-pasting) a UUID. This is also the
single most-used feature of the popular community Browserbase skill on
ClawHub
([jamesfincher/browserbase](https://clawhub.ai/jamesfincher/skills/browserbase)),
which keeps a local map of named contexts.

This PR ports that ergonomic into the CLI as a thin, **client-side**
name→ID map — no API change.

- `browse cloud contexts create --name github` → creates the context and
saves a local alias
- `browse cloud contexts list` → shows your saved names (new command;
the API has no list, so this reflects names saved on this device)
- `contexts get|update|delete` and `sessions create --context-id` now
accept a **saved name or a raw ID** (a resolver passes unknown refs
through unchanged, so raw IDs still work everywhere)
- `contexts delete` prunes the local alias for the deleted context

The map lives at
`(XDG_CONFIG_HOME||~/.config)/browserbase/contexts.json` (honoring
`BROWSERBASE_CONFIG_DIR`), next to the CLI's existing state via the
shared `resolveConfigDir()` helper. The file is written `0600`. It
stores only the same IDs the API already returns, and a missing or
corrupt file degrades to "no saved contexts" rather than erroring.
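
A rough sketch of how such a local name→ID store could behave, under stated assumptions: the function names (`contextsFilePath`, `loadContexts`, `saveContexts`, `resolveContextRef`) are hypothetical, and treating `BROWSERBASE_CONFIG_DIR` as a full replacement for the config directory is a guess. The CLI's real `resolveConfigDir()` helper is not reproduced here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface ContextEntry { id: string; createdAt: string }
type ContextMap = Record<string, ContextEntry>;

// Assumption: BROWSERBASE_CONFIG_DIR, when set, replaces the whole directory.
function contextsFilePath(env: Record<string, string | undefined> = process.env): string {
  const dir =
    env.BROWSERBASE_CONFIG_DIR ||
    path.join(env.XDG_CONFIG_HOME || path.join(os.homedir(), ".config"), "browserbase");
  return path.join(dir, "contexts.json");
}

// A missing or corrupt file yields an empty map instead of throwing.
function loadContexts(file: string): ContextMap {
  const out: ContextMap = Object.create(null); // no inherited keys
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
    const contexts = (parsed as { contexts?: unknown } | null)?.contexts;
    if (contexts === null || typeof contexts !== "object" || Array.isArray(contexts)) return out;
    for (const [name, entry] of Object.entries(contexts as Record<string, unknown>)) {
      const id = (entry as { id?: unknown } | null)?.id;
      const createdAt = (entry as { createdAt?: unknown } | null)?.createdAt;
      if (typeof id === "string" && id.length > 0) {
        out[name] = { id, createdAt: typeof createdAt === "string" ? createdAt : "" };
      }
    }
  } catch {
    // Unreadable or invalid JSON: behave as "no saved contexts".
  }
  return out;
}

// Write to a temp file with mode 0600, then rename it into place.
function saveContexts(file: string, contexts: ContextMap): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify({ version: 1, contexts }, null, 2), { mode: 0o600 });
  fs.renameSync(tmp, file);
}

// Saved names resolve to their ID; anything else passes through unchanged.
function resolveContextRef(ref: string, contexts: ContextMap): string {
  const entry = Object.prototype.hasOwnProperty.call(contexts, ref) ? contexts[ref] : undefined;
  return entry ? entry.id : ref;
}
```

With a store like this, `resolveContextRef("github", loadContexts(contextsFilePath()))` would return the saved ID if the alias exists and the literal string `"github"` otherwise, which is why raw IDs keep working.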

Linear:
[STG-2422](https://linear.app/browserbase/issue/STG-2422/named-contexts-for-the-cli-local-nameid-map)

## E2E Test Matrix

Run against **live Browserbase** with the local build (`node
bin/run.js`), `BROWSERBASE_CONFIG_DIR` pointed at a throwaway dir.
Signed URLs redacted.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `contexts create --name e2e-smoke` | `{"id":"45ed525f-…","uploadUrl":"<redacted>",…,"name":"e2e-smoke"}` | Proves a real context is created and the name is echoed back. |
| `cat contexts.json` | `{"version":1,"contexts":{"e2e-smoke":{"id":"45ed525f-…","createdAt":"2026-06-27T00:40:51Z"}}}` | Proves the local name→ID map is persisted with the API-returned ID. |
| `contexts get e2e-smoke` (by name) | `{"id":"45ed525f-…","projectId":"2d228d57-…",…}` | Proves name→ID resolution on `get` hits the real `/v1/contexts/<id>`. |
| `contexts list --format table` | `Name ID Created`<br>`prod-login d98a30da 2026-06-27 00:41Z` | Proves the new list command renders the saved map. |
| `sessions create --context-id prod-login --persist` (by name) | session `27f9087a-…` returned with `contextId == d98a30da-…` → **`MATCHES name→id: True`** | Key flow: a real session attaches to the right context purely from the **name**. |
| `contexts delete prod-login` (by name) | `{"ok":true,"id":"d98a30da-…","removedAliases":["prod-login"]}` then `contexts list` → `No saved contexts.` | Proves API delete + local alias prune. |
| `contexts delete <raw-uuid>` | `{"ok":true,"id":"45ed525f-…"}` (no `removedAliases`) | Proves raw IDs still work and orphan cleanup; no alias pruned when none matched. |
| `pnpm test:cli` | `Test Files 21 passed (21) · Tests 310 passed (310)` | Full suite green, incl. new `contexts-store` unit tests + `contexts-named` CLI-level e2e (fake server) + updated surface test. |
| `pnpm lint` | format + eslint + `tsc --noEmit` all clean | Types, style, lint. |

## Notes

- Bump: `browse: patch` (matches how the CLI bumps; `browse` is in the
changeset `ignore` list but still gets release-impacting patches by
convention).
- No new dependencies. Pure client-side; composes with the existing
config-dir convention.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add named contexts to the CLI so you can reuse a Browserbase context by
name instead of copying IDs. Implements Linear STG-2422 with a local
name→ID map, typo hints, raw‑ID compatibility, and a new `contexts add`
command; no API changes.

- **New Features**
- `browse cloud contexts create --name <name>` saves a local alias and
returns the name; `browse cloud contexts add <name> <id>` names an
existing context (trims the ID, rejects empty; use `--force` to
overwrite).
- `browse cloud contexts list` shows saved names on this device
(`--json` returns `{ "contexts": [...] }`).
- `browse cloud contexts get|update|delete` and `browse cloud sessions
create --context-id` accept a saved name or a raw ID. Unknown names
close to a saved one fail early with a “did you mean?” hint; unknown
refs otherwise pass through so non-UUID IDs still work. `contexts
delete` prunes aliases for the deleted ID and includes `removedAliases`
(best effort).

- **Notes**
- Aliases live at
`(XDG_CONFIG_HOME||~/.config)/browserbase/contexts.json` (honors
`BROWSERBASE_CONFIG_DIR`), written 0600 via atomic write;
missing/corrupt files behave as empty. The map is prototype-safe,
rejects UUID-shaped names, and sanitizes malformed entries on read;
names must start alphanumeric, allow letters/digits/._-, max 64, and
duplicates are blocked unless `--force`.
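
The naming rules and the "did you mean?" hint summarized above could look roughly like the sketch below. The function names are hypothetical, and both the Levenshtein metric and the distance threshold of 2 are assumptions; the summary does not say how closeness is measured.

```typescript
// Rules from the summary: start alphanumeric, then letters/digits/._-, max 64.
const NAME_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$/;
// UUID-shaped names are rejected so a name can never be mistaken for a raw ID.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidContextName(name: string): boolean {
  return NAME_PATTERN.test(name) && !UUID_PATTERN.test(name);
}

// Classic Levenshtein edit distance, computed row by row.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest saved name within maxDistance edits, or undefined.
// Assumption: a threshold of 2 is illustrative, not the CLI's actual value.
function suggestName(ref: string, savedNames: string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const name of savedNames) {
    const distance = editDistance(ref, name);
    if (distance < bestDistance) {
      best = name;
      bestDistance = distance;
    }
  }
  return best;
}
```

A resolver built on this would first check for an exact saved name, then fail early when `suggestName` returns a candidate, and otherwise pass the reference through as a raw ID.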

<sup>Written for commit 00fdd03.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2284?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.9.2`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish
`browse@0.9.2` from that exact commit using `pnpm pack` + `npm publish
--provenance`.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
# why
- node 22.17.0 introduced a compatibility issue with node-fetch which
broke the browserbase sdk when running on 22.17.0
- addresses #2291 
# what changed
- bumped the browserbase sdk version to ^2.14.0 which fixes the issue
# test plan
- tested locally by running node 22.17.0, observing the issue. the issue
is resolved after bumping the browserbase sdk version

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Upgrade `@browserbasehq/sdk` to ^2.14.0 to restore compatibility with
Node 22.17.0 by resolving the `node-fetch` issue. Fixes #2291.

- **Bug Fixes**
- Bumped `@browserbasehq/sdk` to ^2.14.0 in `packages/cli`,
`packages/core`, and `packages/server-v3`; updated `pnpm-lock.yaml`.
  - Prevents runtime failures when running on Node 22.17.0.

<sup>Written for commit 78ffe34.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2307?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->
…ge (#2310)

## What

Changeset-only patch bump (`browse` 0.9.2 → 0.9.3) to trigger the
**first browse release that publishes the Docker image** added in #2295.

`browse@0.9.2` shipped *before* the GHCR publish step existed, so no
image is on the registry yet. The Docker step only runs when browse's
version changes (`should_publish`), so a release is required to produce
it.

This PR contains **only** a changeset — no code changes.

## Release path (after this merges)

1. Merge this PR → the `browse` changeset lands on `main`.
2. Run the **Prepare CLI Release** workflow → it opens a `Release
browse@0.9.3` PR.
3. Merge that PR → the **Release** workflow publishes `browse@0.9.3` to
npm **and** builds/pushes `ghcr.io/browserbase/browse` (multi-arch,
pinned).

## One-time follow-up

GHCR packages default to private — after the first push, set
`ghcr.io/browserbase/browse` visibility to **Public** so sandboxes can
pull it anonymously.

---
Linear:
[STG-2468](https://linear.app/browserbase/issue/STG-2468/release-browse-093-to-publish-the-docker-image-ghcriobrowserbasebrowse)


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Patch bump for `browse` 0.9.3 to trigger the first Docker image publish
to `ghcr.io/browserbase/browse` (multi-arch, pinned per release). No
code changes; adds a changeset so the release builds and pushes the
image. Addresses Linear STG-2468.

- **Migration**
- After the first publish, set `ghcr.io/browserbase/browse` visibility
to Public so sandboxes can pull without auth.

<sup>Written for commit 09b8f9d.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2310?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.9.3`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish
`browse@0.9.3` from that exact commit using `pnpm pack` + `npm publish
--provenance`.


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Prepare the `browse@0.9.3` release by bumping the package version,
updating the changelog, and consuming the pending changeset. On merge,
the Release workflow on `main` will publish via `pnpm pack` + `npm
publish --provenance`; this release also notes the new Docker image at
`ghcr.io/browserbase/browse`.

<sup>Written for commit 32f3a51.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2311?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>