Sync#2
Open
metehanozdev wants to merge 956 commits into
PSA potential hackers: don't get excited, we don't have any real secrets in CI worth stealing, and our CI does not autodeploy anything to prod. All important secrets and CD processes are kept in our closed-source repos. # why # what changed # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Add a gating workflow that blocks CI until a maintainer approves running secrets on forked PRs. CI now triggers from that gate, resolves labels and path filters under workflow_run, removes same-repo guards so integration/e2e/evals run on approved forks, and checks out the PR commit consistently across jobs. <sup>Written for commit c682847. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1782">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
…ed" (#1786) Reverts #1782 <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Reverts the approval-based CI for external contributors. CI now runs on pull_request and blocks secrets for forked PRs by skipping integration, E2E, and eval jobs. - **Refactors** - Removed the “Ensure Contributor Is Trusted to Run CI” workflow. - Switched CI trigger to pull_request; removed workflow_run logic. - Read labels from github.event.pull_request; removed API calls. - Simplified checkouts; dropped explicit head_sha refs. - Updated concurrency group to use github.ref. - Ignored docs-only changes in CI. <sup>Written for commit d6ace82. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1786">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
Reverts #1780 <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Reverts the change that skipped CI on forked PRs. Integration tests, evals, and the Stainless preview now run for all PRs by removing the head-repo equality checks in ci.yml and stainless.yml. <sup>Written for commit 18480e8. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1787">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why
cdpHeaders is already plumbed through packages/server correctly; it was
just missing from the spec.
- packages/core/lib/v3/types/public/api.ts:15 defines cdpHeaders on
LocalBrowserLaunchOptionsSchema.
- packages/server/src/routes/v1/sessions/start.ts:192 forwards
browser.launchOptions with a spread into localBrowserLaunchOptions, so
cdpHeaders is preserved.
- packages/server/src/lib/InMemorySessionStore.ts:240 passes
localBrowserLaunchOptions straight into new V3(...).
- packages/core/lib/v3/v3.ts:750 passes lbo.cdpHeaders into
V3Context.create(...).
- packages/core/lib/v3/understudy/context.ts:167 finally uses it in
CdpConnection.connect(wsUrl, { headers: opts?.cdpHeaders }).
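A minimal sketch of why the spread in the start route preserves `cdpHeaders`. The types and the `toLocalBrowserLaunchOptions` helper are hypothetical simplifications for illustration, not the actual code in `packages/server`.

```typescript
// Hypothetical, simplified shapes; the real schemas live in packages/core and packages/server.
type LocalBrowserLaunchOptions = {
  headless?: boolean;
  cdpHeaders?: Record<string, string>;
};

type StartRequestBody = {
  browser?: { launchOptions?: LocalBrowserLaunchOptions };
};

// Forwarding with a spread copies every provided field, so optional
// fields such as cdpHeaders survive without being listed explicitly.
function toLocalBrowserLaunchOptions(body: StartRequestBody): LocalBrowserLaunchOptions {
  return { ...(body.browser?.launchOptions ?? {}) };
}

const forwarded = toLocalBrowserLaunchOptions({
  browser: { launchOptions: { headless: true, cdpHeaders: { "x-trace-id": "abc" } } },
});
console.log(forwarded.cdpHeaders);
```

Because the server never enumerated the fields, only the spec needed the new property.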
# what changed
# test plan
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added the missing `cdpHeaders` field to the v3 server OpenAPI spec so
clients can send custom Chrome DevTools Protocol headers. This aligns
the spec with server launch options and prevents client
codegen/validation errors.
<sup>Written for commit 39ee737.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1797">Review in
cubic</a></sup>
<!-- End of auto-generated description by cubic. -->
…and server-v4 dirs (#1796) # Follow-up Tasks - [ ] Update stainless SDK custom code for all languages to pull new `stagehand-server-v3-darwin-x64` binary names (`-v3-` added) <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Split the Stagehand API into `packages/server-v3` and `packages/server-v4`, each with its own builds, tests, SEA binaries, and release workflows. Delivers STG-1536 and lets us keep v3 stable while iterating on v4; CI/test discovery and OpenAPI artifacts are versioned. - **Refactors** - Renamed the original server to `packages/server-v3` (`@browserbasehq/stagehand-server-v3`); updated docs and runtime path helpers (now synced across core/docs/evals and both servers), ESLint globs/ignores, scripts/Turbo filters, tests, and Stainless to read `packages/server-v3/openapi.v3.yaml`; v3 SEA binaries use `stagehand-server-v3-*`. - Added `packages/server-v4` (`@browserbasehq/stagehand-server-v4`) with `/v4/**` routes, SSE streaming via `x-stream-response`, LRU/TTL in-memory session store, health/readiness, logging/metrics, `openapi.v4.yaml` + generator, SEA tooling, and v4 integration tests. - CI: path filters, test discovery, and artifacts cover both versions; added `stagehand-server-v4-release.yml` and `stagehand-server-v4-sea-build.yml`; renamed v3 workflows; artifacts include `packages/server-v3/**` and `packages/server-v4/**` dists and OAS. - **Migration** - Replace `packages/server/**` refs with `packages/server-v3/**` or `packages/server-v4/**`. - Use new package filters and binary names: `@browserbasehq/stagehand-server-v3` / `@browserbasehq/stagehand-server-v4`; `stagehand-server-v3-*` / `stagehand-server-v4-*`. - Update OpenAPI consumers to `packages/server-v3/openapi.v3.yaml` or `packages/server-v4/openapi.v4.yaml`. <sup>Written for commit 2b9114c. Summary will update on new commits. 
<a href="https://cubic.dev/pr/browserbase/stagehand/pull/1796">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
## Summary - Adds the `@browserbasehq/browse-cli` package (`packages/cli`) to the stagehand monorepo, open-sourcing browser automation for AI agents - CLI provides stateful browser control via a daemon architecture — navigation, clicking, typing, screenshots, accessibility snapshots, multi-tab, network capture, and env switching (local/remote) - Uses `@browserbasehq/stagehand` as a workspace dependency (bundled into the CLI binary via tsup) - Includes full test suite and documentation ## Changes - `packages/cli/` — all CLI source code, config, tests, and docs - `pnpm-workspace.yaml` — added `packages/cli` to workspace - `.github/workflows/ci.yml` — added CLI path filters and build artifact uploads - `.changeset/open-source-browse-cli.md` — changeset for initial release - `pnpm-lock.yaml` — updated lockfile ## Test plan - [x] CLI builds successfully (`pnpm --filter @browserbasehq/browse-cli run build`) - [x] Full monorepo build passes (`turbo run build` — 9/9 tasks) - [x] `browse --help` and `browse --version` output correctly - [x] `browse status` returns valid JSON - [x] Lint passes clean (`pnpm --filter @browserbasehq/browse-cli run lint`) - [x] Source verified identical to stagent-cli (only import path changed) - [x] Empirically tested Browserbase credential requirements match core - [ ] Run `pnpm --filter @browserbasehq/browse-cli run test` (requires Chrome/browser environment) ## Known issues (pre-existing from stagent-cli, not introduced by this PR) - Network capture `response.json` always writes `status: 0` — response metadata from `responseReceived` CDP event is not persisted to `loadingFinished` handler - Ref-based `click` command silently ignores `--button`/`--count`/`--force` flags (coordinate-based `click_xy` handles them correctly) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g CI (#1801) # why # what changed # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Corrects the changeset package reference from `@browserbasehq/stagehand-server` to `@browserbasehq/stagehand-server-v3` to unblock CI and ensure the correct package receives the patch release. <sup>Written for commit 177bc48. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1801">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
## Summary - `browse env` showed stale "local" mode after `browse env remote` - Root cause: `.mode` file was only written during lazy browser init (`ensureBrowserInitialized`), not at daemon startup. Between daemon start and first command, `readCurrentMode()` returned `null` and fell back to hardcoded `"local"` - Write `.mode` eagerly in `runDaemon()` at startup so it's immediately available - Fall back to `desiredMode` instead of `"local"` in the `env` display handler as a safety net ## Test plan - [x] Reproduced bug: `browse env remote` → `browse env` showed `"mode":"local"` - [x] Verified fix: `browse env remote` → `browse env` now shows `"mode":"remote"` - [x] `mode.test.ts` passes (3/3) <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Fixes `browse env` showing stale "local" after `browse env remote` (STG-1547). The daemon now writes `.mode` at startup, the display falls back to `desiredMode` until mode is written, and a patch changeset is added for `@browserbasehq/browse-cli`. <sup>Written for commit 9661d92. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1806">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
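The display fallback described above can be sketched roughly as follows. `resolveDisplayedMode` is a hypothetical name; only `readCurrentMode()` returning `null` and the `desiredMode` fallback come from the description.

```typescript
type Mode = "local" | "remote";

// currentMode stands in for the result of readCurrentMode(), which is null
// until the .mode file has been written. Falling back to desiredMode rather
// than a hardcoded "local" avoids reporting a stale mode.
function resolveDisplayedMode(currentMode: Mode | null, desiredMode: Mode): Mode {
  return currentMode ?? desiredMode;
}

console.log(resolveDisplayedMode(null, "remote"));
```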
## Summary - Stacked on #1800 - Only `BROWSERBASE_API_KEY` is required for remote mode in the CLI - `BROWSERBASE_PROJECT_ID` is still passed through if set, but no longer checked ## Changes - `packages/cli/src/index.ts` — `hasBrowserbaseCredentials()` only checks for API key - `packages/cli/tests/mode.test.ts` — Updated test to match new error message - `packages/cli/README.md` — Updated docs to reflect optional project ID ## Test plan - [x] Existing mode test updated - [x] Manual: `browse env remote` with only `BROWSERBASE_API_KEY` set 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Make `BROWSERBASE_PROJECT_ID` optional in the CLI for remote mode, so only `BROWSERBASE_API_KEY` is required. The project ID is still forwarded when provided. - **Bug Fixes** - Updated remote mode check and error message to only require `BROWSERBASE_API_KEY`. - Autodetection now defaults to `remote` when the API key is set; otherwise `local`. - Updated tests and `@browserbasehq/browse-cli` README to match. <sup>Written for commit 99eb186. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1803">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
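A rough sketch of the relaxed credential check and autodetection, assuming the environment is passed in as a plain object. `detectMode` and `sessionParams` are hypothetical helpers; the body of `hasBrowserbaseCredentials` here is an assumption, not the code in `packages/cli/src/index.ts`.

```typescript
type Env = Record<string, string | undefined>;
type Mode = "local" | "remote";

// Only the API key is required; the project ID is no longer checked.
function hasBrowserbaseCredentials(env: Env): boolean {
  return Boolean(env["BROWSERBASE_API_KEY"]);
}

// Hypothetical autodetection: remote when the API key is set, otherwise local.
function detectMode(env: Env): Mode {
  return hasBrowserbaseCredentials(env) ? "remote" : "local";
}

// Hypothetical pass-through: forward the project ID only when it is provided.
function sessionParams(env: Env): { projectId?: string } {
  const projectId = env["BROWSERBASE_PROJECT_ID"];
  return projectId ? { projectId } : {};
}

console.log(detectMode({ BROWSERBASE_API_KEY: "bb_key" }));
```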
…r PRs to run CI with secrets (#1794) # why - External contributor PRs currently fail CI because they can't run with secrets - We don't want to allow them to run with secrets until a team member "claims" them and reviews for any secrets exfiltration / sketchy code - Once claimed, we want to run the full CI suite with secrets # what changed # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Adds two GitHub Actions that let maintainers claim external contributor PRs by mirroring the approved head SHA to a maintainer-owned branch so full CI can run with secrets. Claims come from an approving review by a team member with write access on the latest commit and are auto-invalidated on new commits (Linear STG-1518). - **New Features** - Detects forked PRs and posts claim instructions; manages labels: `external-contributor`, `external-contributor:awaiting-approval`, `external-contributor:mirrored`, `external-contributor:stale`, `external-contributor:completed`. - On approving review of the latest commit, verifies reviewer permission, mirrors that exact SHA to `external-contributor-pr-<PR#>-<12sha>`, and creates/reopens a “[Claimed #X]” PR assigned to the approver. - Closes and links the original PR with marker comments; keeps labels/status in sync on both PRs. - Auto-closes the mirror when new commits land on the external PR and comments with next steps; if the mirror closes without merge, reopens and relabels the original PR; if the external PR is reopened with the same approved SHA while the mirror is open, it is closed again to keep discussion on the mirror. 
- Implemented via `external-contributor-pr-approval-handoff.yml` (captures approved reviews, uploads artifact) and `external-contributor-pr.yml` (consumes artifact, performs mirroring); uses `actions/github-script@v7`, `actions/create-github-app-token@v1`, `actions/checkout@v4`, `actions/download-artifact@v4`, `actions/upload-artifact@v4`; concurrency scoped per PR/workflow run. - **Migration** - Create a GitHub App with `contents:write`, `pull_requests:write`, and `issues:write`; add `EXTERNAL_CONTRIBUTOR_PR_APP_ID` and `EXTERNAL_CONTRIBUTOR_PR_APP_PRIVATE_KEY` secrets. - To claim: submit an approving review on the latest commit of a forked PR. If new commits are pushed, approve again to re-claim and rerun CI. <sup>Written for commit 4875e99. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1794">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
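The mirror branch naming scheme `external-contributor-pr-<PR#>-<12sha>` can be illustrated with a small helper. `mirrorBranchName` is a hypothetical function, and taking the first 12 characters of the head SHA is an assumption about what `<12sha>` means.

```typescript
// Builds the mirror branch name from the PR number and the approved head SHA.
// Assumes <12sha> is the first 12 characters of the full commit SHA.
function mirrorBranchName(prNumber: number, headSha: string): string {
  return `external-contributor-pr-${prNumber}-${headSha.slice(0, 12)}`;
}

console.log(mirrorBranchName(1794, "0123456789abcdef0123456789abcdef01234567"));
```

Embedding the SHA in the branch name ties each mirror to one approved commit, which fits the rule that new commits invalidate the claim.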
# why bug in previous approach # what changed # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Fixes the external PR approval flow by switching to the correct `GITHUB_TOKEN`, stabilizing the mirror/refresh behavior, and ignoring third‑party bot comments when parsing claim markers. Also improves the `claude` workflow to build the repo before edits and allow rerunning failed jobs. - **Bug Fixes** - Use `GITHUB_TOKEN` for branch pushes and API calls; remove the GitHub App token path. - Enable `persist-credentials: true` during checkout to allow pushes. - Keep the mirrored PR open and mark it stale when new commits land on the external PR; relabel both PRs consistently. - Auto-handle reopen/close transitions across external and mirrored PRs. - Ignore comments from non-managed bots (e.g., Greptile, Cubic); only parse claim markers from `github-actions[bot]` to avoid false triggers. - **Refactors** - Inline a small JS lib (`ECPR_LIB`) to manage labels, comments, lifecycle, and claims; jobs run in clear phases (external lifecycle → claim prep → branch refresh → claim finalize). - Refresh internal branches by rebasing onto the approved external ref; report conflicts cleanly for manual follow-up. - Improve `claude.yml`: upgrade to `actions/checkout@v6`, set `actions: write`, run `pnpm`/`turbo` build via `setup-node-pnpm-turbo`, enable `track_progress`, and use an explicit tool allowlist for `anthropics/claude-code-action@v1`. <sup>Written for commit a46b159. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1812">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# Why
OpenAI organizations with Zero Data Retention (ZDR) reject stored
responses from the Responses API (`store: true` is the default when the
AI SDK auto-selects it). This causes agent runs to fail.
# What Changed
- Set `openai: { store: false }` in `providerOptions` across
`generateText` / `streamText` calls: `v3AgentHandler.ts` (execute +
stream) and `handleDoneToolCall.ts`.
- Simplified the existing Gemini `providerOptions` — removed the
conditional `modelId.includes("gemini-3")` check and always pass
`google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` since non-Google
providers ignore it.
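As a sketch, the combined options could look like the object below. `buildAgentProviderOptions` is a hypothetical helper; the description only states that these two keys are passed as `providerOptions` to `generateText` / `streamText`.

```typescript
// Hypothetical helper returning the provider options described above.
function buildAgentProviderOptions() {
  return {
    // Opt out of stored responses so ZDR organizations do not reject the call.
    openai: { store: false },
    // Sent unconditionally; non-Google providers ignore this key.
    google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" },
  };
}

// Intended use (not executed here, since it needs a model and network access):
//   generateText({ model, prompt, providerOptions: buildAgentProviderOptions() })
console.log(JSON.stringify(buildAgentProviderOptions()));
```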
# Test Plan
- [ ] Run agent in mode with an OpenAI model to confirm no breaking
changes
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Defaulted agent calls to OpenAI to not store responses, preventing
failures for Zero Data Retention orgs. Also simplified Gemini options by
always sending high media resolution.
- **Bug Fixes**
- Set `providerOptions.openai.store` to `false` for agent `generateText`
and `streamText` calls in `v3AgentHandler` (execute + stream) and
`handleDoneToolCall`, avoiding Responses API rejections in ZDR orgs.
- **Refactors**
- Always pass `google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` in
`providerOptions`; non-Google providers ignore it. Added a changeset for
a patch release of `@browserbasehq/stagehand`.
<sup>Written for commit a01d8c0.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1814">Review in
cubic</a></sup>
<!-- End of auto-generated description by cubic. -->
## Summary
- Adds `--context-id <id>` and `--persist` flags to `browse open` so agents can load/persist browser state (cookies, localStorage, etc.) across Browserbase sessions using Contexts
- Validates edge cases: `--persist` requires `--context-id`, `--context-id` requires remote mode, context change triggers daemon restart

## Usage
```bash
# Load a context (read-only — state not saved back)
browse open https://app.com --context-id ctx_abc123
# Load and persist changes back on session end
browse open https://app.com --context-id ctx_abc123 --persist
```

## How it works
1. `browse open --context-id` writes context config to `/tmp/browse-{session}.context`
2. The daemon reads this file during browser initialization and passes it through as `browserbaseSessionCreateParams.browserSettings.context`
3. If a second `browse open` is called with a different context ID, the daemon is restarted (context is baked into the BB session at creation time)

Context config uses a temp file (same pattern as `.mode`) because it's needed at Browserbase session creation time, before the daemon's command socket is up.
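The flag validation and restart rule can be sketched as below. All names and error strings are hypothetical, and restarting only when a requested context ID differs from the running one is an assumption drawn from step 3.

```typescript
type Mode = "local" | "remote";

interface OpenFlags {
  contextId?: string;
  persist?: boolean;
}

// Returns an error message for invalid flag combinations, or null when valid.
function validateContextFlags(flags: OpenFlags, mode: Mode): string | null {
  if (flags.persist && !flags.contextId) return "--persist requires --context-id";
  if (flags.contextId && mode !== "remote") return "--context-id requires remote mode";
  return null;
}

// The context is fixed when the session is created, so a different requested
// ID means the daemon has to be restarted. Same ID: no restart.
function needsDaemonRestart(runningContextId: string | undefined, requestedContextId: string | undefined): boolean {
  return requestedContextId !== undefined && requestedContextId !== runningContextId;
}

console.log(validateContextFlags({ persist: true }, "remote"));
```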
## Test plan - [x] `browse open https://example.com --context-id <known-id> --persist` on remote mode — verify session created with context in BB dashboard - [x] `browse stop` then reopen with same context — verify state persists - [x] Verify context mismatch triggers daemon restart (open with context A, then open with context B) - [x] Same context, second open — verify no unnecessary restart - [x] `browse open https://example.com --context-id <id>` on local mode — verify clear error - [x] `browse open https://example.com --persist` without `--context-id` — verify clear error - [x] Plain `browse open` (no context flags) — verify no regression - [x] `cleanupStaleFiles` removes `.context` file on shutdown - [x] Stale `.context` file from crashed daemon is cleared on next `browse open` without `--context-id` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# why when running pnpm format, it formats files that are not relevant to current changes which is annoying # what changed formatted the unformatted files in cli package # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Standardized Prettier/ESLint formatting in `packages/cli` so `pnpm format` runs are stable and don’t touch unrelated files. No functional changes. - **Refactors** - Applied Prettier across `packages/cli/src` and tests (line breaks, parens, quotes). - Tidied lint/Prettier config formatting (`eslint.config.mjs`, `.prettierrc` newline). - Adjusted test imports and one assertion to match formatter. <sup>Written for commit 31570db. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1819">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why Allow users to pass custom headers in their LLM calls # what changed Add headers to the model.ts types # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Adds `headers` support to `ClientOptions` so clients can send custom HTTP headers with every provider request. Useful for auth tokens or routing hints without changing global config. - **New Features** - Added `headers?: Record<string, string>` to `ClientOptions` in `packages/core/lib/v3/types/public/model.ts`; headers are sent with each request. - No breaking changes; default behavior is unchanged. <sup>Written for commit 424dc1a. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1817">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why Sync the Stagehand MCP docs with the Browserbase MCP docs for STG-1576. # what changed Copied the refreshed Browserbase MCP introduction and setup pages into `packages/docs/v3/integrations/mcp`. # test plan `pnpm exec prettier --check packages/docs/docs.json packages/docs/v3/integrations/mcp/introduction.mdx packages/docs/v3/integrations/mcp/setup.mdx`; `pnpm --dir packages/docs exec mint broken-links` (unrelated existing failures only); `pnpm lint` fails in `packages/core` on an existing ESLint rule config issue. --------- Co-authored-by: ci-test <ci-test@example.com>
…bservability (#1824) Refactored flow logging to an event-based system with `FlowLogger` and a pluggable `eventStore`, improving LLM/CDP traces, action and screenshot events, and concise prompt/response summaries. `V3` now carries a `sessionId` and `flowLoggerContext`. Split the server into `server-v3` and `server-v4` with separate OpenAPI, routes, and SEA builds, and updated CI. - **New Features** - Added `eventStore` with `FileEventStore`, queries, aggregate metrics, and bus attachment; exported `getEventStore`, `setEventStore`, `destroyEventStore`, and `getFlowLogConfigDir`. - Replaced `SessionFileLogger` with `FlowLogger` across agents and handlers; added `wrapWithLogging` for `Page` actions; standardized event names via `toTitleCase`. - Switched to concise LLM summary helpers (`extractLlmPromptSummary`, `extractLlmCuaPromptSummary`, `extractLlmCuaResponseSummary`) and `FlowLogger.createLlmLoggingMiddleware`. - `V3Options.sessionId` now used to associate flows; CDP calls are linked to flow events for better correlation. - Exported `FlowLogger`, `FlowEvent`, and `toTitleCase` from `@core/v3/index`. - **Migration** - Replace `SessionFileLogger` and `@logAction` with `FlowLogger` methods (`log*`, `wrapWithLogging`). - Use `setEventStore/getEventStore` to plug custom storage (`FileEventStore` by default); optionally pass `sessionId` in `V3Options`. - Update paths/scripts from `packages/server` to `packages/server-v3`; use new binaries `stagehand-server-v3-*`/`stagehand-server-v4-*`. <sup>Written for commit c35fdbd. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1824">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
# Why
Add Browserbase Search API as the primary search tool for agents (`POST
/v1/search`), leveraging the Browserbase API key users already have
configured. Brave Search remains fully supported as a backwards
compatible fallback for existing users with `BRAVE_API_KEY` set.
# What Changed
- **Browserbase Search tool**: New search tool powered by the
Browserbase Search API, enabled via `useSearch: true` in
`agent.execute()`.
- **Brave Search (backwards-compatible)**: Existing users with
`BRAVE_API_KEY` in their environment continue to get the search tool
automatically with no code changes required.
- **Priority**: `useSearch: true` with a Browserbase API key takes
precedence. If not set, falls back to Brave if `BRAVE_API_KEY` is
present.
- **Validation**: If `useSearch: true` is set without a valid
Browserbase API key, a clear error is thrown at preparation time (before
the agent loop starts).
- **Separate files**: `browserbaseSearch.ts` and `braveSearch.ts` keep
the implementations cleanly isolated.
```typescript
const agent = stagehand.agent({ mode: "hybrid" });
const result = await agent.execute({
instruction: "use the search tool to find me the capital of france",
useSearch: true,
});
```
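The priority and validation rules above can be sketched as a small resolver. `resolveSearchProvider` is a hypothetical name, and a plain `Error` stands in for whatever error class the implementation actually throws.

```typescript
type Env = Record<string, string | undefined>;
type SearchProvider = "browserbase" | "brave" | null;

// Hypothetical resolver for which search tool, if any, the agent receives.
function resolveSearchProvider(options: { useSearch?: boolean }, env: Env): SearchProvider {
  if (options.useSearch) {
    // Validation happens up front, before the agent loop would start.
    if (!env["BROWSERBASE_API_KEY"]) {
      throw new Error("useSearch: true requires a Browserbase API key");
    }
    return "browserbase";
  }
  // Backwards-compatible fallback for existing Brave users.
  return env["BRAVE_API_KEY"] ? "brave" : null;
}

console.log(resolveSearchProvider({ useSearch: true }, { BROWSERBASE_API_KEY: "bb_key" }));
```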
# Test Plan
- [x] Run agent with `useSearch: true` and a valid `BROWSERBASE_API_KEY`
— Browserbase search tool should return results
- [x] Run agent with `useSearch: true` and no Browserbase API key —
should throw `MissingEnvironmentVariableError`
…re for Observability" (#1837) Reverts #1824 See #1836 for context <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Reverts the new flow-logging schema and event store, restoring the previous session file–based logger. Updates agents, handlers, and evals to use `SessionFileLogger` and removes the `sessionId` option from `V3Options`. This undoes the STG-1566 observability changes to stabilize logging. - **Refactors** - Remove `packages/core/lib/v3/eventStore.ts` and related wiring. - Restore `SessionFileLogger` in `flowLogger.ts` with helpers like `formatCuaPromptPreview`, `formatLlmPromptPreview`, `logAction`, and `logStagehandStep`. - Replace `FlowLogger` usage with `SessionFileLogger` across agents/handlers; update LLM clients (`OpenAICUAClient`, `AnthropicCUAClient`, `GoogleCUAClient`, `AISdkClient`) to log requests/responses with operation names. - Simplify Understudy/CDP instrumentation; remove FlowLogger context in `cdp.ts`; use `logAction` in `page.ts`. - Remove `toTitleCase` and log explicit action names (e.g., `Understudy.<method>`, `Page.close`). - Evals: switch to event-driven screenshot collection via the V3 event bus; drop interval polling. - Remove `sessionId` handling and related code in `v3.ts`. - **Migration** - Remove `sessionId` from `V3Options` if you were passing it. - Any direct uses of the reverted `eventStore` or the new observability schema are no longer available. <sup>Written for commit f4b687f. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1837">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why - there is a vulnerability in `@langchain/core` for versions lower than `0.3.80` (more info [here](https://www.cve.org/CVERecord?id=CVE-2025-68665)) - we have this package as an optional dep # what changed - bumps the optional `@langchain/core` dep to `^0.3.80` <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Bumps optional `@langchain/core` to ^0.3.80 to resolve a vulnerability in earlier versions. Lockfile refreshed; no runtime changes. <sup>Written for commit 5e5d812. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1841">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why This is the real, recovered FlowLogger PR + improvements based on yesterday's review session. I accidentally force-pushed an earlier, incomplete branch state onto the original flow logger PR and dropped several days of work. This PR replaces that broken state with the actual recovered branch and the follow-up fixes needed to make the design correct again. <img width="1569" height="1015" alt="image" src="https://github.com/user-attachments/assets/f99171fe-27df-488c-a87a-75605c2a3c1b" /> ## what this PR does This restores the intended v3 FlowLogger/EventStore design: - every `V3` instance owns exactly one FlowLogger session - every `V3` instance owns exactly one `EventStore` - FlowLogger produces structured `FlowEvent`s on the instance bus - EventStore consumes those events and fans them out to sinks - the default query history is shallow and bounded so logging does not grow memory without limit The goal is a stable, queryable execution tree for Stagehand/agent/CDP/LLM work that preserves parent/child relationships, survives ALS re-entry, and does not let sink failures break main execution. ## architecture ### FlowLogger `packages/core/lib/v3/flowlogger/FlowLogger.ts` FlowLogger owns: - the `FlowEvent` model - the `FlowLoggerContext` stored in AsyncLocalStorage - method/closure wrappers: `wrapWithLogging(...)`, `runWithLogging(...)` - re-entry helpers: `withContext(...)`, `resolveContext(...)` - CDP event helpers - LLM request/response helpers and middleware FlowLogger is the event producer. It is responsible for building the execution tree and maintaining the active parent stack. It is not responsible for persistence or output destinations. 
### EventStore `packages/core/lib/v3/flowlogger/EventStore.ts` EventStore owns: - per-session sink registration - query routing - file-backed session directory setup - bounded in-memory ancestry retention - store lifecycle and teardown Sink implementations live in `packages/core/lib/v3/flowlogger/EventSink.ts`, and prettifying/sanitization lives in `packages/core/lib/v3/flowlogger/prettify.ts`. EventStore is the consumer/router layer. It is intentionally per-instance and single-session. ### V3 wiring `packages/core/lib/v3/v3.ts` In the `V3` constructor: - `this.eventStore = new EventStore(this.sessionId, opts)` - `this.flowLoggerContext = FlowLogger.init(this.sessionId, this.bus)` - `this.bus.on("*", this.eventStore.emit)` So FlowLogger always has a per-instance context, and the bus is the handoff point between FlowLogger and EventStore. ## lifecycle ### FlowLogger lifecycle 1. `V3` creates the per-instance `FlowLoggerContext` 2. instrumented methods enter through `wrapWithLogging(...)` or `runWithLogging(...)` 3. a started event is emitted 4. that event is pushed onto `parentEvents` 5. nested work inherits that parent stack through ALS 6. completed/error events are emitted on unwind 7. `V3.close()` clears the in-memory flow context ### EventStore lifecycle 1. `V3` creates one `EventStore(sessionId, opts)` 2. the default shallow in-memory query sink is attached immediately 3. optional sinks are attached based on runtime config 4. `V3` forwards wildcard bus events into the store 5. `destroy()` tears down the configured sinks ## config / runtime options ### always-on pieces Every `V3` instance always gets: - FlowLogger context - shared event bus - one EventStore - one default shallow in-memory query sink So instrumentation exists even if no human-visible sink is active. 
### stderr sink Enabled automatically when: - `verbose: 2` - `BROWSERBASE_FLOW_LOGS=1` Behavior: - writes prettified logs to stderr - suppresses CDP events to keep interactive output high-signal - does not retain history - is best-effort only ### file sinks Enabled automatically when: - `BROWSERBASE_CONFIG_DIR` is set Behavior: - `JsonlFileEventSink` writes full events to `session_events.jsonl` - `PrettyLogFileEventSink` writes prettified lines to `session_events.log` - `session.json` is written with sanitized options - `sessions/latest` is maintained best-effort - file sinks are write-only; `query()` returns `[]` ### queryable in-memory history Default behavior: - EventStore attaches a `ShallowInMemoryEventSink` - retention limit is `500` events per session - retained events keep ancestry metadata only - retained `data` is intentionally stripped in the default sink That shallow history is what pretty formatting uses to recover recent ancestry without retaining screenshots/base64 payloads forever. ## memory impact The main memory constraint is the default query sink. What it retains: - up to 500 recent shallow events per `V3` instance - ids, timestamps, event types, parent ids, session id What it does not retain by default: - full screenshots - large base64 blobs - full historical payloads Implications: - memory scales roughly with `number_of_sessions * 500 shallow events` - stderr sink does not retain history - file sinks do not materially retain history in process memory beyond normal stream buffering - attaching `InMemoryEventSink` explicitly is an opt-in to full-payload retention ## usage patterns ### use `wrapWithLogging(...)` For class methods that should emit their own started/completed/error envelope: - `V3.act` - `V3.extract` - `V3.observe` - page/understudy methods - agent execute entrypoints ### use `runWithLogging(...)` For closures or non-decorator code paths that still need a lifecycle envelope. 
### use `withContext(...)` / `resolveContext(...)` For callback-driven or later async work: - websocket callbacks - CDP response/message dispatch - detached async continuations `currentContext` is strict and may throw when ALS is missing. `resolveContext(...)` is the non-throwing lookup for “active ALS if present, otherwise instance-owned fallback”. ### LLM logging is best-effort If no flow context is active, LLM logging should no-op. Missing context must not break model execution. ## reviewer expectations / edge cases Please review this PR with these expectations in mind: - ALS can legitimately be missing in later callbacks or detached async work; that should not break execution unless a caller explicitly uses strict `currentContext` - there should be no global or multi-session fallback store; logging belongs to a real `V3` instance/session - parent-chain correctness matters more than sink behavior; events must land under the right parent chain - sink failures should not break the main library; sinks are intentionally best-effort - completion/error pretty lines should resolve back to the original started-event ancestry, not synthetic completion ids - `sessionId` often matches `browserbaseSessionId` today, but code should not rely on that as a permanent invariant - this PR is about FlowLogger/EventStore architecture and correctness, not a model-default migration --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
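As a rough illustration of the bounded, shallow in-memory history described in this PR (500 events, payload stripped), a sink could look like the sketch below. `FlowEventLike` and `BoundedShallowSink` are hypothetical names and shapes, not the actual `ShallowInMemoryEventSink` or `FlowEvent` model.

```typescript
// Hypothetical event shape; the real FlowEvent model lives in FlowLogger.ts.
interface FlowEventLike {
  eventId: string;
  eventType: string;
  parentIds: string[];
  sessionId: string;
  createdAt: string;
  data?: unknown;
}

type ShallowEvent = Omit<FlowEventLike, "data">;

class BoundedShallowSink {
  private readonly limit: number;
  private events: ShallowEvent[] = [];

  constructor(limit: number = 500) {
    this.limit = limit;
  }

  emit(event: FlowEventLike): void {
    // Keep ancestry metadata only; the data payload is deliberately dropped.
    this.events.push({
      eventId: event.eventId,
      eventType: event.eventType,
      parentIds: event.parentIds,
      sessionId: event.sessionId,
      createdAt: event.createdAt,
    });
    // Evict the oldest entry once the retention limit is exceeded.
    if (this.events.length > this.limit) this.events.shift();
  }

  query(): ShallowEvent[] {
    return this.events.slice();
  }
}

const sink = new BoundedShallowSink();
for (let i = 0; i < 600; i++) {
  sink.emit({
    eventId: `evt-${i}`,
    eventType: "Example",
    parentIds: [],
    sessionId: "session-1",
    createdAt: new Date(0).toISOString(),
    data: "large payload that should not be retained",
  });
}
console.log(sink.query().length); // 500
```

Stripping `data` at insert time is what keeps memory proportional to the event count rather than to payload size.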
# why

Nick's PR contained both server-v3 and server-v4 changes. I split it into two PRs: just the v3 changes here, and just the v4 changes [here](#1840) (WIP). After rebasing, this PR is really just one small change to the Node SEA binary handling.

# test plan

Verified the split with exact file manifests before creating the branch and ran `pnpm install --lockfile-only --ignore-scripts`.

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Renamed the Stagehand server to `packages/server-v3` and `@browserbasehq/stagehand-server-v3`. Updated CI, release, tests, and SEA build logic; no API or runtime changes.

- **Refactors**
  - Moved `packages/server` to `packages/server-v3` and renamed the package to `@browserbasehq/stagehand-server-v3`.
  - Updated GitHub workflows (CI, SEA build, release) and artifacts to `stagehand-server-v3-*`.
  - Switched OpenAPI/Stainless references to `packages/server-v3/openapi.v3.yaml`.
  - Updated test discovery/commands and Turbo tasks to target `@browserbasehq/stagehand-server-v3`; adjusted ESLint, workspace, and scripts accordingly.
  - Hardened SEA build: verify Node binary includes the required fuse, fall back to the official Node distro when needed, enforce fuse presence, use `stagehand-server-v3-sea` temp paths, centralize the fuse value, and add a clear cache recovery hint when the cached Node binary lacks the fuse.
- **Migration**
  - Use `@browserbasehq/stagehand-server-v3` in `pnpm`/Turbo filters and scripts.
  - Run local tasks from `packages/server-v3`.
  - For SEA builds/tests, use binaries named `stagehand-server-v3-<platform>-<arch>` and set `SEA_BINARY_NAME` if needed.

<sup>Written for commit 645a2e6. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1839">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…1846)

# why

Monarch Money reported a bug in the Python SDK (stagehand-python v3.6.0). In `src/stagehand/_streaming.py`, both the sync `Stream.__stream__()` (lines 62-63) and the async `AsyncStream.__stream__()` (lines 147-148) contain:

```python
if sse.data.startswith('{"data":{"status":"finished"'):
    break  # ← drops the event without yielding it
```

When the server sends the finished SSE event (which contains success, message, actions, usage, and messages), the SDK immediately breaks out of the loop without yielding the event to the caller. The final result payload is therefore silently dropped.

# what changed

The streaming `on_event` config had `handle: done` for the `{"data":{"status":"finished"...}}` event. Stainless generates this as a bare `break`, which exits the SSE loop without yielding the event, silently dropping the final result payload (success status, message, actions, usage, and messages) from every streaming response.

Fix: changed to `handle: yield` so the finished event is delivered to callers before the stream terminates. This is safe because the server explicitly calls `reply.raw.end()` immediately after sending the finished event (packages/server-v3/src/lib/stream.ts), so the SSE connection closes right after the yield and the loop exits naturally on EOF, with no hang risk.

# test plan

Regenerate SDKs and confirm fix
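The control-flow difference between `handle: done` and `handle: yield` can be illustrated with two small generators. This is only a sketch in TypeScript: the affected SDK is generated Python, and the names `SseEvent`, `streamDropFinal`, and `streamYieldFinal` are hypothetical, introduced here for illustration.

```typescript
// Hypothetical minimal SSE event shape.
interface SseEvent {
  data: string;
}

const FINISHED_PREFIX = '{"data":{"status":"finished"';

// Mirrors the `handle: done` behaviour: the loop breaks before yielding
// the finished event, so the caller never sees the final payload.
function* streamDropFinal(events: Iterable<SseEvent>): Generator<SseEvent> {
  for (const sse of events) {
    if (sse.data.startsWith(FINISHED_PREFIX)) break;
    yield sse;
  }
}

// Mirrors the `handle: yield` behaviour: every event is delivered, and the
// loop ends when the underlying stream is exhausted (server closes the connection).
function* streamYieldFinal(events: Iterable<SseEvent>): Generator<SseEvent> {
  for (const sse of events) {
    yield sse;
  }
}
```

The second variant depends on the server ending the response after the finished event, which is the guarantee the PR description cites.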
# why

The intent is that we just want the v4 stubs and the OpenAPI spec (and a few useless tests for now). This isn't all of the v4 stubs yet (`/logs` and others are missing), but we'll add the rest of the stubs in parallel.

# what changed

- Removes all "copied over" v3 *functionality*, the /sessions routes, etc.
- Adds v4 stubs like `/browsersession` which return simple JSON objects

# test plan

Lint + (pretty useless) tests pass; checked that /documentation is accessible locally and included a screenshot.

<img width="1663" height="971" alt="Screenshot 2026-03-18 at 10 36 59 AM" src="https://github.com/user-attachments/assets/ede65965-f0b6-452e-a841-59ad25c2d456" />

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Adds `@browserbasehq/stagehand-server-v4`, a schema-first, stubbed v4 API for BrowserSession and Page, with a regenerated OpenAPI v4 spec. Replaces the legacy v4 sessions runtime with simple stubs and a minimal server boot, leaving `packages/server` unchanged.

- **New Features**
  - Full v4 routes: `/v4/browsersession` (create/get/end + actions) and `/v4/page` (navigation, frames, input, CDP, screenshots, etc.), plus `/healthz` and `/readyz`.
  - Zod v4 schemas for BrowserSession/Page expose OpenAPI components; generator builds from route registries (`browserSessionRoutes`, `pageRoutes`); `openapi.v4.yaml` updated.
  - Integration tests for v4 BrowserSession and Page; test utils updated for v4 shapes and a default `x-model-api-key`.
- **Refactors**
  - Removed legacy v4 runtime and infra (old `sessions/*` routes, session store, SSE, auth/env/logging/response, CORS/metrics, SEA fuse) and legacy tests; server registers only v4 routes/components.
  - Simplified health/readiness handlers and error handling with a lightweight `AppError` in `src/types/error.ts`.
  - Fixed query parameter handling in stub GET handlers and OpenAPI so GET endpoints use the correct querystring shapes.

<sup>Written for commit 8dede23. Summary will update on new commits.
<a href="https://cubic.dev/pr/browserbase/stagehand/pull/1840">Review in cubic</a></sup> <!-- End of auto-generated description by cubic. -->
# why Browserbase can solve captchas asynchronously, but agents were still trying to interact with the page while the solver was active. That led to CUA and DOM/hybrid flows clicking solved captcha widgets again, pausing on confirmation questions, or resuming with stale assumptions instead of continuing the original task cleanly. This PR pauses agent execution while Browserbase's captcha solver is active and hardens the post-solve resume path so the agent keeps working on the original task after Browserbase finishes. # what changed - added a shared `CaptchaSolver` utility that listens for Browserbase `browserbase-solving-started/finished/errored` console events, supports concurrent waiters, and disposes listeners cleanly - paused DOM/hybrid `prepareStep` execution and CUA `prepareStep` / action execution while Browserbase is solving a captcha - enabled Browserbase captcha solving by default unless `browserSettings.solveCaptchas: false` - updated agent prompts and follow-up messages to tell the model that captchas are handled automatically and should not be clicked again after they are solved - added OpenAI CUA recovery behavior so it can: - carry one-shot context notes into the next model turn - auto-continue when the model asks for confirmation instead of acting - guard post-solve clicks that target the solved captcha widget and restate the original instruction so the model re-anchors on the task - added focused unit coverage for solver state, Browserbase session accessors, CUA/regular agent hooks, and OpenAI CUA confirmation handling # test plan - `pnpm --filter @browserbasehq/stagehand lint` - `pnpm --filter @browserbasehq/stagehand build:esm` - `cd packages/core && pnpm exec vitest run --config vitest.esm.config.mjs dist/esm/tests/unit/openai-cua-client.test.js dist/esm/tests/unit/captcha-solver.test.js dist/esm/tests/unit/agent-captcha-hooks.test.js dist/esm/tests/unit/browserbase-session-accessors.test.js` Browserbase smoke: - OpenAI CUA + reCAPTCHA demo: 
strict pass on `Verification Success` - Anthropic CUA + reCAPTCHA demo: strict pass on `Verification Success` - Hybrid Gemini + reCAPTCHA demo: strict pass on `Verification Success` - OpenAI CUA + `solveCaptchas: false`: solver stays disabled, no wait/resume path is triggered, and the agent stops at the captcha instead of bypassing it --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Shrey Pandya <shrey@browserbase.com>
…2249) ## Summary Linear: https://linear.app/browserbase/issue/STG-2278/add-did-you-mean-suggestions-and-telemetry-for-unknown-browse-commands Adds a `command_not_found` oclif hook to the browse CLI that prints a did-you-mean suggestion for unknown commands and emits a new `cli.command_not_found` telemetry event, while preserving oclif's standard "command not found" error and exit code 2. ## Impact if merged Unknown commands (`browse sessions` / `search` / `contexts` / `auth status` — the old Commander-era syntax that agents were trained on, plus plain typos) currently exit 2 with no suggestion and emit NO telemetry event, so this failure class is invisible by construction. Old-binary telemetry shows the pattern is real (1,310 commander-error events from `sessions.list` alone in 30d; 115 from `search`). It's a failed-first-command class, and a failed first command cuts 7-day retention 12.4x (0.42% vs 5.21%). Did-you-mean turns an agent guess-loop into a one-turn recovery, and the new `cli.command_not_found` event finally lets us size and rank the dead ends. No new dependency — deliberately avoids `@oclif/plugin-not-found`, which prompts interactively (agent-hostile). ## Implementation notes - **New hook** `src/hooks/command-not-found.ts`, registered in the `oclif.hooks` config. Suggestion order: explicit alias table first (old-CLI syntax → current tree, e.g. `sessions` → `cloud sessions list`, `auth status` → `doctor`, `search` → `cloud search`), then nearest match by Levenshtein over `config.commandIDs` with a distance threshold (the did-you-mean clause is omitted when nothing decent matches). Alias targets are validated against the live command tree at runtime and against `oclif.manifest.json` in tests, so they can't silently drift. - **Privacy: id + suggestion only, never argv.** oclif's spaced-topic parsing glues unknown leading argv tokens into the attempted id (e.g. 
`browse opne https://example.com` arrives as `opne:https://example.com`), so the hook sanitizes down to leading command-shaped tokens and reports only the matched prefix (or the first token when nothing matches). The telemetry payload carries exactly `attempted_command` and `suggested_command` — URLs, selectors, queries, and secrets never leave the machine. Covered by a dedicated test asserting argv values are absent from captured payloads. - **Exit semantics preserved.** A `command_not_found` hook that returns normally makes oclif treat the invocation as handled (exit 0), silently swallowing the failure. The hook therefore re-throws oclif's standard `CLIError("command <id> not found")` after printing the suggestion, keeping stderr output and exit code 2 byte-identical to current behavior. - **Telemetry can't hang or get lost.** The event reuses the existing PostHog transport (400ms abort timeout, best-effort catch) and is awaited inside the hook before the error is thrown, so it is delivered before process exit but cannot delay it beyond the transport timeout. The `finally`-hook completion path early-returns for unknown commands (prerun never fires), so there is no double counting. - No new runtime dependency; ~140 LOC of source plus tests. ## E2E Test Matrix | Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | `node bin/run.js sessions` (local build) | stderr: `"browse sessions" is not a browse command. Did you mean "browse cloud sessions list"? Run browse --help for all commands.` then `Error: command sessions not found`; `echo $?` → `2` | Proves alias suggestion + preserved exit code on the highest-volume old-syntax pattern | | `node bin/run.js auth status` | stderr: `"browse auth status" is not a browse command. Did you mean "browse doctor"? ...`; exit `2` | Proves multi-token alias matching (`auth:status` → `doctor`) | | `node bin/run.js search "test"` | stderr: `"browse search" is not a browse command. 
Did you mean "browse cloud search"? ...`; exit `2`; the query token is not shown as part of the attempted command | Proves alias prefix matching strips trailing user args from messaging | | `node bin/run.js opne https://example.com` | stderr: `"browse opne" is not a browse command. Did you mean "browse open"? ...`; exit `2` | Proves Levenshtein typo fallback; URL excluded from the attempted command | | `node bin/run.js open https://example.com --local` | JSON result `{"mode": "managed-local", ..., "title": "Example Domain", "url": "https://example.com/"}`; exit `0`; no suggestion output | Proves valid commands are completely unaffected (hook never fires) | | Live telemetry capture: `BROWSERBASE_TELEMETRY_HOST=<local capture server>` + `node bin/run.js auth status` | Capture server logged `POST /i/v0/e/` with `"event": "cli.command_not_found"`, `"attempted_command": "auth.status"`, `"suggested_command": "doctor"` plus standard env/version props; payload received before CLI exit `2`; no argv content in payload | Proves the event actually sends, flushes before process exit, and carries only id + suggestion | | `pnpm test` (builds then vitest) | `Test Files 16 passed (16), Tests 229 passed (229)` — includes 13 new unit/integration tests (alias table validity vs manifest, Levenshtein, thresholds, token sanitization, built-CLI suggestion/exit-code/telemetry/privacy) | Full regression sweep; existing telemetry suite still green | | `pnpm lint` | prettier + eslint + `tsc --noEmit` all pass | Supporting evidence only | 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Adds did-you-mean suggestions and privacy-safe telemetry for unknown `browse` CLI commands, while keeping the standard error output and exit code 2. Addresses Linear STG-2278 by helping users recover from old syntax and typos; typo matching now uses `fastest-levenshtein`. 
- **New Features** - Added a `command_not_found` hook that prints a suggestion using an alias table for old syntax, with segment-aligned Levenshtein fallback for typos; omitted when no good match. - Sends `cli.command_not_found` telemetry with strict privacy: only the sanitized attempted command id and the suggested command, never raw argv. - Preserves default behavior by rethrowing the not-found error (stderr unchanged, exit code 2) and avoids `@oclif/plugin-not-found`. - Removed misleading `auth`/`login` → `doctor` suggestions. <sup>Written for commit bcee6ed. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2249?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
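The suggestion order described for the `command_not_found` hook (alias table first, then nearest match under a distance threshold) can be sketched as below. This is an illustration under assumptions: the alias entries are the two examples quoted in the PR text, the threshold of 2 is a guess, and the hand-written `levenshtein` stands in for whatever the hook actually uses (the summary mentions `fastest-levenshtein`). The function names are hypothetical.

```typescript
// Hypothetical alias table: old-CLI syntax mapped to the current command tree.
const ALIASES = new Map<string, string>([
  ["sessions", "cloud sessions list"],
  ["search", "cloud search"],
]);

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns a suggestion, or undefined when nothing is close enough
// (in which case the did-you-mean clause would be omitted).
function suggestCommand(
  attempted: string,
  commandIds: readonly string[],
  maxDistance = 2, // assumed threshold, not taken from the PR
): string | undefined {
  const alias = ALIASES.get(attempted);
  if (alias !== undefined) return alias;

  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const id of commandIds) {
    const distance = levenshtein(attempted, id);
    if (distance < bestDistance) {
      best = id;
      bestDistance = distance;
    }
  }
  return best;
}
```

Checking the alias table before the distance search means a known old-syntax command such as `sessions` resolves to its current replacement rather than to whichever command name happens to be closest.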
…-failure backoff (#2248) ## Summary Makes browse driver (browser session) failures actionable, classified, and self-correcting. Today an invalid `BROWSERBASE_API_KEY` surfaces a bare `Error: 401 Unauthorized` with no remediation, the 5s init-failure cache makes every retry instant and identical, and most driver failures reach telemetry as `unexpected`. Linear: [STG-2277](https://linear.app/browserbase/issue/STG-2277/make-browse-driver-errors-actionable-with-result-codes-and-init) ## Impact if merged This targets the browse CLI's largest failure mode by volume and by user pain. 71 installs stuck in get/screenshot retry loops generate 92.3% of ALL CLI telemetry (~5.5M events/30d); ~375k of those events come from tagged claude-code and codex agents on current versions — exactly the target ICP (coding agents driving browsers). Root cause (smoke-tested): any `BROWSERBASE_API_KEY` forces remote mode; an invalid key surfaces a bare `Error: 401 Unauthorized` with no remediation, and the 5s init-failure cache makes every retry instant, so agents can't self-correct and loop forever. Separately, 2,337 distinct users hit missing_api_key/auth_401 in 30d, and `open` — whose failures are 94% unclassifiable today (result_code `unexpected`) — gates activation: only 28.5% of real users reach an activated session, and a failed first command cuts 7-day retention 12.4x. This PR makes auth/driver failures actionable (agents recover in one turn) and classified (we can finally measure why open fails). ## Implementation notes - **Remote init classification** (`remote.ts`): new `classifyRemoteInitError()` duck-types the SDK error's `status` — 401 → `remote_auth_401` (invalid-key message with settings link, `--local`, `browse doctor`), 403 → `remote_auth_403` (permissions/plan wording, same escape hatches), other → `remote_session_create_failed` (original message preserved + `browse doctor` pointer). Wired through the `RemoteCapability` interface so local-only builds compile. 
- **Chrome-not-found** (`session-manager.ts`): chrome-launcher's `ERR_LAUNCHER_NOT_INSTALLED` / `ERR_LAUNCHER_PATH_NOT_SET` failures in managed-local mode get install/`--cdp`/remote guidance instead of leaking launcher internals. - **Init-failure backoff**: cached init failures now back off exponentially — `min(5s * 2^(n-1), 5min)` — reset on success and `close()`. After ≥3 consecutive failures the cached message gains a `(failing repeatedly — fix BROWSERBASE_API_KEY, use --local, or run browse doctor)` suffix (deduped on rethrow). - **Result codes over the daemon protocol**: `ErrorResponseSchema` gains optional `code`/`httpStatus` (backward compatible — old daemons omit them); the daemon's `formatError` surfaces them from typed `DriverError`s; the client rethrows as `CommandFailure` with `resultCode`/`httpStatus` so the existing #2210 telemetry plumbing records them. Client-side fail sites tagged: `daemon_lock_timeout`, `daemon_unresponsive`, `daemon_socket_timeout`, `daemon_spawn_failed`. Already-authored driver errors tagged: `stale_ref` (unknown ref), `no_active_page`. - **Local-only build contract preserved**: remediation strings that mention `BROWSERBASE_API_KEY` live behind the remote capability (`driverInitHints()`), so the `build:local-only` artifact stays key-free (guarded by the existing `local-only-build.test.ts`, which caught the first draft). ## E2E Test Matrix | Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | `BROWSERBASE_API_KEY=bb_invalid_test <local build> get url` | `Browserbase rejected your BROWSERBASE_API_KEY (401 Unauthorized). A set key makes browse default to remote mode. Check the key at https://browserbase.com/settings, run without one using --local (browse open <url> --local), or diagnose with browse doctor.` exit=1 | Proves the new 401 classification flows daemon → protocol → client → stderr end-to-end against the real Browserbase API. 
| | Same command 4x rapidly (cached failure window) | Identical actionable message each time, ~400ms per run (no remote round-trip) | Proves cached failures keep the actionable message and stay instant; does not by itself prove backoff growth. | | Same command after 6s, then after 11s more (real failures #2, #3) | Message gains ` (failing repeatedly — fix BROWSERBASE_API_KEY, use --local, or run browse doctor)` suffix, exactly once, exit=1 | Proves the ≥3-consecutive-failures hint and suffix dedupe on the live failure path. | | Valid key: `open https://example.com` → `get title` → `stop` | `"mode": "remote" ... "title": "Example Domain"`, then `{"title": "Example Domain"}`, then `{"stopped": true}` | Proves the remote happy path is unchanged (no regression in outputs or exit codes). | | `env -u BROWSERBASE_API_KEY <local build> open https://example.com --local` → `get url` | `"mode": "managed-local" ... "url": "https://example.com/"`, then `{"url": "https://example.com/"}` exit=0 | Proves keyless managed-local mode is unaffected. | | `get text @9-99` on the local session | `Unknown ref "9-99" - run browse snapshot first to populate refs (have 0 refs).` exit=1 | Proves the stale-ref message is unchanged while now carrying `stale_ref` through the protocol (round-trip unit-tested). | | `browse doctor` with and without key | `Status: ok` in both; `target remote` with key, `target managed-local` without | Proves doctor behavior unchanged. | | `pnpm build` + `pnpm lint` (prettier, eslint, tsc) | All pass | Supporting only. | | `pnpm test:cli` | 16 files / 228 tests pass, incl. new `driver-errors.test.ts` (classification, backoff schedule, chrome-not-found detection, protocol round-trip, key-free local-only hints) and the `local-only-build` artifact guard | Supporting; covers mappings and the local-only security contract not exercised by live smokes. | 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated description by cubic. 
--> --- ## Summary by cubic Makes browse driver failures actionable and self-correcting with classified result codes and exponential init backoff. Addresses Linear STG-2277 by giving clear fixes for bad `BROWSERBASE_API_KEY`, missing Chrome/Chromium, and daemon issues, while improving telemetry. - **New Features** - Classify remote init errors into actionable messages with codes: `remote_auth_401`, `remote_auth_403`, `remote_session_create_failed` (with links to settings, `--local`, and `browse doctor`). - Add error result codes to the daemon protocol (`code`, `httpStatus`) and propagate to the client for telemetry. - Exponential backoff for cached init failures (5s doubling, capped at 1 minute) with a “failing repeatedly” hint after 3 failures. - Tag common failures with stable codes: `daemon_lock_timeout`, `daemon_unresponsive`, `daemon_socket_timeout`, `daemon_spawn_failed`, `stale_ref`, `no_active_page`, `no_chrome_found`. - Use `http-status-codes` for status mapping and extract chrome-launcher error codes to a constant (no behavior change). - **Bug Fixes** - Chrome-not-found now gives Chromium-first guidance: Linux `apt install chromium`; macOS `brew install --cask google-chrome` or set `CHROME_PATH` for Chromium, plus `--cdp` or remote as options. - Keep the local-only build key-free by moving `BROWSERBASE_API_KEY` remediation strings behind the remote capability. <sup>Written for commit b7a3f7e. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2248?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. 
--> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
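The init-failure backoff schedule from the implementation notes, `min(5s * 2^(n-1), 5min)` with a reset on success, can be written out as a short sketch. The class and function names are hypothetical. The cap follows the implementation notes (5 minutes); the cubic summary mentions a 1 minute cap, so treat `CAP_MS` as an adjustable assumption.

```typescript
const BASE_MS = 5_000; // 5s base delay, from the implementation notes
const CAP_MS = 5 * 60_000; // 5min cap, from the implementation notes

// Delay before the n-th consecutive cached failure may be retried.
function initFailureBackoffMs(consecutiveFailures: number): number {
  if (consecutiveFailures < 1) return 0;
  return Math.min(BASE_MS * 2 ** (consecutiveFailures - 1), CAP_MS);
}

// Hypothetical tracker: counts consecutive failures and resets on success or close().
class InitFailureTracker {
  private failures = 0;

  recordFailure(): number {
    this.failures += 1;
    return initFailureBackoffMs(this.failures);
  }

  reset(): void {
    this.failures = 0;
  }

  // The PR adds a "failing repeatedly" hint after 3 or more consecutive failures.
  get shouldHintRepeatedFailure(): boolean {
    return this.failures >= 3;
  }
}
```

With these constants the delays run 5s, 10s, 20s, 40s, 80s, 160s and then stay at the cap.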
…s on every command (#2258) ## Summary In **headed managed-local** mode, the `browse` CLI stole macOS keyboard focus on **every** subcommand, making it nearly unusable next to a coding agent and impossible to parallelize. **Root cause:** the browse daemon resolves the active page on every subcommand via `ensurePage()`, which called `context.setActivePage()` unconditionally. In core, `setActivePage` ends in a CDP `Target.activateTarget` (`packages/core/lib/v3/understudy/context.ts`), and on macOS `Target.activateTarget` raises the whole Chrome app to the OS foreground — yanking focus away from your editor/terminal on each `browse navigate / snapshot / get / …`. **Fix:** route the three `ensurePage()` activation sites through a new `activateIfNeeded()` helper that only re-activates when the target page isn't already the active one. Redundant re-activation (the common single-tab case) is skipped, so focus stays put. Explicit tab switches (`tabs.ts`) still call `setActivePage` directly, so intentional foregrounding (`tab new` / `tab select`) is preserved. Scoped entirely to `packages/cli`; no core changes. ## E2E Test Matrix Run against **real headed Chrome** (managed-local) on macOS. Instrumentation (env-gated, not committed) counted `setActivePage` → `Target.activateTarget` sends across a 5-command sequence: `open` → `open` (navigate) → `snapshot` → `get url` → `screenshot`. | Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | 5-command headed session, **old** unconditional behavior (env toggle) | `setActivePage (activateTarget → FOCUS STEAL)` × **5**, skips: 0 | Reproduces the bug: every command re-activates → macOS focus steal. | | 5-command headed session, **with fix** | `SKIP (no focus steal)` × **5**, activations: **0** | Proves the fix eliminates per-command focus theft for the normal single-tab flow. 
| | `browse open https://example.com --local --headed` then navigate to `example.org`, then `screenshot` (clean build, no instrumentation) | `"url": "https://example.org/"`, screenshot saved (`16125` bytes) | Real headed Chrome still navigates/snapshots/screenshots correctly after the change — no functional regression. | | `pnpm check` (tsc), `pnpm eslint`, `pnpm format:check` | All pass on `session-manager.ts` | Type-safe, lint-clean, formatted. | **Not changed:** explicit tab switching still activates the target tab (verified the change only touches `ensurePage()` resolution; `tabs.ts` calls `setActivePage` directly). The one-time activation from `chrome-launcher` at browser launch is unchanged (expected — opening the browser once is fine). Closes STG-2333 <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Stop headed managed-local sessions from stealing macOS keyboard focus on every `browse` subcommand. We now activate a page only when it isn’t already active, addressing STG-2333. - **Bug Fixes** - Added `activateIfNeeded()` and routed three `ensurePage()` calls through it to skip redundant `Target.activateTarget`. - Kept intentional tab foregrounding (`tab new`, `tab select`) by leaving direct `setActivePage` calls. - Scoped to `packages/cli`; released as a `browse` patch via changeset; verified on real headed Chrome: 5-command flow went from 5 focus steals to 0 with no regressions. <sup>Written for commit 70b0ffb. 
Summary will update on new commits.</sup> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
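The idea behind `activateIfNeeded()` in the focus-stealing fix above is to skip re-activation when the target page is already the active one. A minimal sketch, assuming hypothetical `PageLike` and `ContextLike` interfaces rather than the real core types:

```typescript
// Hypothetical minimal shapes; the real context and page types live in core.
interface PageLike {
  id: string;
}

interface ContextLike {
  activePage(): PageLike | undefined;
  setActivePage(page: PageLike): void;
}

// Activates the page only when it is not already active.
// Returns true when an activation was actually sent.
function activateIfNeeded(context: ContextLike, page: PageLike): boolean {
  if (context.activePage()?.id === page.id) return false;
  context.setActivePage(page);
  return true;
}

// Test double that counts activations, standing in for Target.activateTarget sends.
class CountingContext implements ContextLike {
  activations = 0;
  private current: PageLike | undefined;

  activePage(): PageLike | undefined {
    return this.current;
  }

  setActivePage(page: PageLike): void {
    this.current = page;
    this.activations += 1;
  }
}
```

In the single-tab case, repeated commands against the same page would trigger at most one activation, which matches the 5-to-0 reduction reported in the test matrix once the page is already active.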
Prepare the next browse release by versioning the package on `main`. What this PR does: - bumps `packages/cli/package.json` to `0.8.5` - updates the browse changelog - consumes the pending browse changesets After this PR merges, the `Release` workflow on `main` will publish `browse@0.8.5` from that exact commit using `pnpm pack` + `npm publish --provenance`. <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Release `browse@0.8.5` by bumping the package version and updating the changelog. This patch fixes focus stealing in headed local sessions, adds suggestions and telemetry for unknown commands, improves driver errors with retry backoff, adds Chrome launch-arg flags for managed-local sessions, and emits a `skill_id` on command-completed telemetry. <sup>Written for commit b405ea9. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2260?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @browserbasehq/stagehand@3.6.0 ### Minor Changes - [#2178](#2178) [`c49a3fc`](c49a3fc) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add support for WebMCP ### Patch Changes - [#2217](#2217) [`147e310`](147e310) Thanks [@monadoid](https://github.com/monadoid)! - Add Azure OpenAI Microsoft Entra ID model auth support. - [#2231](#2231) [`cf3603d`](cf3603d) Thanks [@miguelg719](https://github.com/miguelg719)! - Add claude-fable-5 support: native structured outputs via the @ai-sdk/anthropic bump, adaptive thinking (including the new "xhigh" effort) on the agent path, the API's built-in server-side refusal fallback to claude-opus-4-8, and auto tool choice for the final done call on models that reject forced tool use. - [#2233](#2233) [`8d7d414`](8d7d414) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - Normalize URLs in `ActCache` key derivation by sorting query parameters before hashing. Semantically equivalent URLs that differ only in parameter order (e.g. `?utm_source=email&id=42` vs `?id=42&utm_source=email`) now hit the cache instead of silently missing. Fragments and duplicate keys are preserved. - [#2229](#2229) [`fd42e65`](fd42e65) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - launch local browser with --enable-features=WebMCPTesting,DevToolsWebMCPSupport by default - [#2220](#2220) [`a64c6b7`](a64c6b7) Thanks [@monadoid](https://github.com/monadoid)! - Fix Stagehand-generated shadow-root XPath resolution so deterministic actions can target elements inside web components. - [#2132](#2132) [`ed3e566`](ed3e566) Thanks [@miguelg719](https://github.com/miguelg719)! 
- Add canonical verifier evidence normalization for screenshots and text signals without requiring image dependencies in core installs. - [#2133](#2133) [`840aac8`](840aac8) Thanks [@miguelg719](https://github.com/miguelg719)! - Add the rubric-based verifier engine with normalized public rubric output and bounded failure-step parsing. ## @browserbasehq/stagehand-evals@2.0.3 ### Patch Changes - Updated dependencies \[[`147e310`](147e310), [`cf3603d`](cf3603d), [`8d7d414`](8d7d414), [`fd42e65`](fd42e65), [`a64c6b7`](a64c6b7), [`c49a3fc`](c49a3fc), [`ed3e566`](ed3e566), [`840aac8`](840aac8)]: - @browserbasehq/stagehand@3.6.0 ## @browserbasehq/stagehand-server-v3@3.7.1 ### Patch Changes - [#2217](#2217) [`147e310`](147e310) Thanks [@monadoid](https://github.com/monadoid)! - Add Azure OpenAI Microsoft Entra ID model auth support. - Updated dependencies \[[`147e310`](147e310), [`cf3603d`](cf3603d), [`8d7d414`](8d7d414), [`fd42e65`](fd42e65), [`a64c6b7`](a64c6b7), [`c49a3fc`](c49a3fc), [`ed3e566`](ed3e566), [`840aac8`](840aac8)]: - @browserbasehq/stagehand@3.6.0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
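The `ActCache` URL normalization listed in the 3.6.0 patch notes above (sort query parameters before hashing, keep fragments and duplicate keys) can be sketched with the standard `URL` API. The function name is hypothetical, and the real key derivation hashes more than the URL; this only shows the normalization step.

```typescript
// Normalizes a URL so that parameter order no longer affects the cache key.
function normalizeUrlForCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  // URLSearchParams.sort() is a stable sort by parameter name, so duplicate
  // keys keep their relative order and no values are dropped.
  url.searchParams.sort();
  // The fragment is untouched because only the query string is reordered.
  return url.toString();
}
```

Two URLs that differ only in parameter order, such as `?utm_source=email&id=42` and `?id=42&utm_source=email`, then produce the same string before hashing.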
# why

- to document the new WebMCP-related functions and behaviour

<img width="1330" height="782" alt="Screenshot 2026-06-18 at 10 57 58 AM" src="https://github.com/user-attachments/assets/2952f136-4d01-445c-95e4-c4a81e4ec289" />

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Adds WebMCP docs to v3. Introduces a new Basics page and expands the `Page` API reference so users can list and invoke page-registered tools in Chrome.

- **New Features**
  - Added a Basics page: overview; Chrome/Chromium 149+ and flags `--enable-features=WebMCPTesting,DevToolsWebMCPSupport`; how to list/invoke tools; frameId targeting; results, cancel, and timeout defaults; examples with `@browserbasehq/stagehand`.
  - Updated the `Page` reference with `listWebMCPTools()` and `invokeWebMCPTool()` signatures, options (`timeoutMs`, `frameId`), return shape (`result`, `cancel()`), examples, types (`WebMCPTool`, `WebMCPToolInvocationStatus`, `WebMCPToolResult`, `WebMCPToolInvocation`), and error cases (unsupported browser, ambiguous tool names, result timeouts, disposed invocations).
  - Included WebMCP in the docs sidebar navigation.

<sup>Written for commit 43188ab. Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
… legacy stdout) (#2246) ## Summary Bare `browse screenshot` now writes a file by default — `screenshot-<yyyymmdd-hhmmss>.<type>` in the current directory, never overwriting (atomic collision counter) — and prints the small `{ "saved": "<path>" }` JSON. A new `--base64` flag preserves the legacy stdout contract (`{ "base64": "..." }`); it is mutually exclusive with `--path`. Explicit `--path` behavior is unchanged. Linear: [STG-2275](https://linear.app/browserbase/issue/STG-2275/default-browse-screenshot-to-file-output-with-base64-legacy-flag) ## Impact if merged screenshot is one of the two commands in the runaway agent retry loops (the loop population generates 92.3% of all CLI telemetry) and is used by 1,337 users/30d (89.9% success last-7d). Every bare invocation today prints ~22KB of base64 JSON directly into the calling agent's context window — a per-call token tax on exactly the ICP population (claude-code/codex agents driving the CLI; agent-tagged usage is 90%-successful and growing). Defaulting to a file write makes the common case agent-safe at zero cost to `--path` users; `--base64` preserves the old contract for scripts. Stdout contract change for bare invocations is called out in the changeset with a `--base64` migration note (minor bump — the bare-invocation stdout contract changes; `browse` is `<1.0.0` so this is advisory). ## Implementation notes - **Breaking change (stdout contract):** bare `browse screenshot` now prints `{ "saved": "<path>" }` instead of `{ "base64": "..." }`. **Migration:** pass `--base64` to restore the old output. - Command-layer only: the driver handler (`runtime.ts`) already supported both branches (`{ saved }` when a path is given, `{ base64 }` otherwise), so the change is confined to `src/commands/screenshot.ts`. - Default filename respects `--type` (`.jpeg` vs `.png`) and resolves against the invoking shell's cwd (absolute path passed to the driver so a daemon with a different cwd can't misplace the file). 
- The default filename is **atomically reserved** via exclusive create (`openSync(path, "wx")`), advancing a `-2`, `-3`, ... counter on `EEXIST` — concurrent same-second invocations can never claim the same file (addresses cubic's race-condition review). If the command fails afterward, the empty placeholder is removed best-effort. - `--base64` is mutually exclusive with `--path` via oclif's `exclusive` option. - Changeset: `browse` **minor** (per review) — the bare-invocation stdout contract change is called out in the changeset body with the `--base64` migration note. - Bundled skill doc: `skills/browse/SKILL.md` screenshot snippet updated in this PR (moved from #2245 per review) to document the new default: bare saves a file, `--path` chooses it, `--base64` is the legacy stdout form. ## E2E Test Matrix All rows ran against the local build (`pnpm build` in `packages/cli`, invoked as `node bin/run.js`) from a `<scratch dir>`. | Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | `node bin/run.js open https://example.com` | `{ "title": "Example Domain", "url": "https://example.com/", ... }`, exit 0 | Session setup for the runs below; proves the local build is functional end-to-end. | | `node bin/run.js screenshot` (bare) | `{ "saved": "<scratch dir>/screenshot-20260612-123732.png" }`, exit 0; `ls -la` shows the file at 15,988 bytes; stdout is the small JSON only | Proves the new default: file written to cwd with timestamped name, no base64 on stdout. | | `node bin/run.js screenshot --base64 \| head -c 200` | `{ "base64": "iVBORw0KGgoAAAANSUhEUgAABQgAAALH..."` (PNG magic in base64), exit 0 | Proves the legacy stdout contract is fully preserved behind `--base64`. | | `node bin/run.js screenshot --path /tmp/custom.png` | `{ "saved": "/tmp/custom.png" }`, exit 0; file exists at 15,238 bytes | Proves explicit `--path` behavior is unchanged. 
| | Two **concurrent** bare runs (same second, shell `&` + `wait`) | `{ "saved": ".../screenshot-20260612-123741.png" }` and `{ "saved": ".../screenshot-20260612-123741-2.png" }`; both files present and non-empty (4,202 bytes each) | Proves the atomic no-overwrite reservation under true same-second concurrency — the exact race cubic flagged. | | `node bin/run.js screenshot --cdp http://127.0.0.1:1` (forced failure) | `TypeError: fetch failed`, nonzero exit; directory left empty — no placeholder file | Proves failed runs do not leave empty `screenshot-*.png` placeholders behind. | | `node bin/run.js screenshot --base64 --path /tmp/x.png` | `Error: --path=/tmp/x.png cannot also be provided when using --base64`, exit 2 | Proves the oclif mutual exclusion works. | | `node bin/run.js stop` | `{ "stopped": true, "session": "default" }`, exit 0 | Clean session teardown. | | `pnpm test` (packages/cli) | `Test Files 15 passed (15), Tests 213 passed (213)` — re-run after the race fix | Supporting: no regressions in the existing CLI suite (includes the screenshot `--help` surface test). | | `pnpm lint` (packages/cli) | tsc/eslint/prettier clean | Supporting only. | 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
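The atomic filename reservation described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/commands/screenshot.ts`: `reserveScreenshotPath` and `timestamp` are hypothetical names, and the retry cap is an assumption. It relies only on Node's exclusive-create flag (`"wx"`), which fails with `EEXIST` when the file already exists.

```typescript
import { closeSync, mkdtempSync, openSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Format a Date as yyyymmdd-hhmmss in local time.
function timestamp(d: Date): string {
  const p = (n: number) => String(n).padStart(2, "0");
  return (
    `${d.getFullYear()}${p(d.getMonth() + 1)}${p(d.getDate())}` +
    `-${p(d.getHours())}${p(d.getMinutes())}${p(d.getSeconds())}`
  );
}

// Hypothetical helper: claim a unique screenshot path by creating an empty
// placeholder with the exclusive "wx" flag, adding -2, -3, ... on EEXIST.
function reserveScreenshotPath(
  dir: string,
  type: "png" | "jpeg",
  now: Date = new Date(),
): string {
  const base = `screenshot-${timestamp(now)}`;
  // The upper bound is an arbitrary safety cap (assumption).
  for (let n = 1; n < 10_000; n++) {
    const suffix = n === 1 ? "" : `-${n}`;
    const candidate = join(dir, `${base}${suffix}.${type}`);
    try {
      closeSync(openSync(candidate, "wx")); // fails if the file exists
      return candidate;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
  }
  throw new Error("could not reserve a screenshot filename");
}

// Two reservations within the same second get distinct names.
const dir = mkdtempSync(join(tmpdir(), "shots-"));
const sameSecond = new Date(2026, 5, 12, 12, 37, 41);
const first = reserveScreenshotPath(dir, "png", sameSecond);
const second = reserveScreenshotPath(dir, "png", sameSecond);
console.log(first, second);
```

Because the placeholder is created before the screenshot is taken, a caller following this pattern would also need to delete it if the command fails afterward, as the PR notes.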
…tages (#2250) ## Summary Linear: https://linear.app/browserbase/issue/STG-2279/fix-windows-skills-add-npx-quoting-and-bound-installer-timeouts Fixes `browse skills add` on Windows (cmd.exe spawn quoting) and bounds the two unbounded skills-installer stages (the `npx skills add` child and the catalog/file fetches). ## Impact if merged skills.add succeeds for 3.2% of Windows users (54 in the last 30d) — effectively broken for every default `C:\Program Files\nodejs` Node install, because the npx child is spawned through cmd.exe with an unquoted path. Windows devs are a smaller slice of the CLI base, but skill installers are the single highest-value cohort in our telemetry: 7x the engagement (median 12.5 vs 2 commands) and 19x the multi-day retention (28.4% vs 1.5%) of non-installers, and skills.find→add is an agent-facing funnel (51% of finders attempt an install within an hour). This also bounds the two unbounded installer stages (npx child: 180s; catalog fetches: 10s) — today they can hang forever, feeding the slow-failure retry loops that dominate telemetry volume. ## Implementation notes **Root cause.** `findExecutable` resolves `npx` via PATH+PATHEXT to `npx.cmd` on Windows, and `spawnPassthrough` spawns it with `shell: true` (required for `.cmd`/`.bat` shims). Node's `shell: true` joins command+args **unquoted** into `cmd.exe /d /s /c "..."`, so `C:\Program Files\nodejs\npx.cmd` splits at the space and cmd executes `C:\Program` → `'C:\Program' is not recognized` → exit 1. Install-path args under `C:\Users\<First Last>\...` break the same way. **Why not `shell: false`.** Spawning a `.cmd` directly with `shell: false` throws `EINVAL` on all current Node versions — the CVE-2024-27980 hardening (Node 18.20.x / 20.12.x / 21.7.x+) forbids spawning batch files without a shell because cmd.exe argument splitting cannot be made injection-safe generically. So the shell path is mandatory for `.cmd` shims, and the args must be quoted for cmd. 
**Quoting semantics.** `quoteForCmdShell` wraps tokens containing whitespace, quotes, or cmd metacharacters (`^ & | < >`) in double quotes, doubling embedded quotes. Node wraps the joined string in outer quotes after `/d /s /c`; with `/s`, cmd strips only those outer quotes and executes the inner, correctly-quoted line:

```
before: cmd.exe /d /s /c "C:\Program Files\nodejs\npx.cmd --yes skills add C:\Users\First Last\..."
after:  cmd.exe /d /s /c ""C:\Program Files\nodejs\npx.cmd" --yes skills add "C:\Users\First Last\...""
```

**Alternative considered.** Resolving `npx-cli.js` next to `npx.cmd` and spawning `process.execPath` with `shell: false` would avoid cmd quoting entirely, but the `npx.cmd` → `npx-cli.js` relative layout differs across npm versions and Node distribution channels (nvm-windows, Volta, Scoop shims, fnm), so it trades a well-understood quoting rule for fragile path archaeology. The quoting approach is smaller and matches what cross-platform tools (e.g. `cross-spawn`) do.

**Bounding the installer stages.**

- `spawnPassthrough` now enforces a 180s deadline: SIGTERM, then SIGKILL after 5s if the child ignores it. A timed-out install fails with a clear message and a distinct `skill_install_timeout` result code through the existing `fail`/`resultCode` plumbing from #2210.
- The catalog file-list fetch, the direct-Blob HEAD probe, and skill-file downloads now use `AbortSignal.timeout(10s)`. An aborted catalog fetch is classified exactly like a network failure (`unavailable`), preserving the existing fallback semantics.
- Both deadlines are env-overridable (`BROWSE_SKILLS_INSTALL_TIMEOUT_MS`, `BROWSE_SKILLS_FETCH_TIMEOUT_MS`), following the module's existing `BROWSE_SKILLS_*` override pattern; this is also what makes the deadlines provable end-to-end in tests.

## E2E Test Matrix

All commands ran against the locally built CLI (`<local build>/bin/run.js`) on macOS (darwin/arm64).
| Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | **Windows execution** (one-time `windows-latest` before/after run: [actions/runs/27448084696](https://github.com/browserbase/stagehand/actions/runs/27448084696)) | System Node at `C:\Program Files\nodejs` (no setup-node), `where npx` → `C:\Program Files\nodejs\npx.cmd`. **main:** `browse skills install` → exit 1, verbatim `'C:\Program' is not recognized as an internal or external command`. **PR head d2e9098:** exit 0, `Installed 1 skill`, `~\.agents\skills\browse\SKILL.md` present; win32 vitest gate (quoting + shim + spawnPassthrough timeout) 10/10 passed. | Closes the Windows gap with a real before/after on the same runner layout (win25-vs2026): the exact predicted failure reproduces on main and the PR build installs end-to-end. Full evidence in the [validation comment](#2250 (comment)). | | `browse skills find flights` (real catalog) | exit 0; returned `google.com/search-flights-ts4g1f` with full metadata | Proves catalog discovery is unaffected. | | `browse skills add google.com/search-flights-ts4g1f` (real catalog, real `npx`) | exit 0; `Downloaded 2 skill files to <config dir>`; `npx skills add` installed the skill ("Installed 1 skill ... Done!") | Proves the darwin install path (quoting branch not taken) still works end-to-end with the new deadline code in place — no regression. | | Quoting before/after for `C:\Program Files\nodejs\npx.cmd` (unit tests + helper output) | before: `C:\Program Files\nodejs\npx.cmd --yes skills add C:\Users\First Last\...` (unquoted → cmd runs `C:\Program`); after: `"C:\Program Files\nodejs\npx.cmd" --yes skills add "C:\Users\First Last\..."` | Reproduces the bug shape and asserts the exact corrected command line, incl. embedded-quote doubling, `& \| ^ < >` metachars, and the empty token. Static proof only — see Windows row. 
| | Hung `npx` stub (`exec /bin/sleep 600`) + `BROWSE_SKILLS_INSTALL_TIMEOUT_MS=2000` → `browse skills install` | exit 1 after 2s elapsed (timed): `Skill install timed out after 2s waiting for \`npx skills add\`...` | Proves the deadline kills a hung child and surfaces the timeout failure (`skill_install_timeout` flows through the same `fail` plumbing verified in #2210). Also covered by `spawnPassthrough` unit tests (timeout + non-timeout control). | | Hung catalog server (accepts, never responds) at **default timeouts** → `browse skills add google.com/search-flights-ts4g1f` with stubbed `npx` | exit 0 after 21s (10s API fetch abort + 10s Blob HEAD abort); npx stub invoked with `--yes skills add browserbase/browse.sh --skill google.com/search-flights-ts4g1f` | Proves a hung catalog aborts at the 10s default and the catalog-unavailable fallback semantics are preserved. Previously this hung forever. Also covered by a fast CLI-level test with `BROWSE_SKILLS_FETCH_TIMEOUT_MS=500`. | | `npx vitest run` (packages/cli) | 15 files, 224 tests passed (incl. 13 in skills-install.test.ts) | Full CLI suite green; supporting evidence only. | | `pnpm lint` (packages/cli) | exit 0 (prettier + eslint + tsc) | Supporting evidence only. | **Windows gap closed:** a one-time `windows-latest` before/after run ([actions/runs/27448084696](https://github.com/browserbase/stagehand/actions/runs/27448084696), details in the [validation comment](#2250 (comment))) reproduced the exact `'C:\Program' is not recognized` failure on main and verified `browse skills install` succeeds end-to-end on this PR's build with the default `C:\Program Files\nodejs` system Node. Full Windows vitest: 205/224 passed; all 17 failures are pre-existing POSIX test-harness assumptions (`#!/bin/sh` npx stubs etc.), identical by construction on main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated description by cubic. 
--> --- ## Summary by cubic Fixes Windows failures in `browse skills add` by quoting the `npx` command when spawned via cmd.exe and bounding installer/fetch stages to prevent hangs. Also runs the full CLI test suite on Linux and Windows via a matrix job. Addresses Linear: STG-2279. - **Bug Fixes** - Quote command and args when spawning `.cmd`/`.bat` through the shell, so `C:\Program Files\nodejs\npx.cmd` and paths with spaces work. - Add a 180s deadline to `npx skills add` (SIGTERM, then SIGKILL) and a 10s abort for catalog/file fetches; both overridable via `BROWSE_SKILLS_INSTALL_TIMEOUT_MS` and `BROWSE_SKILLS_FETCH_TIMEOUT_MS`; install timeouts surface `skill_install_timeout`. - Run the full CLI suite on `ubuntu-latest` and `windows-latest` via a matrix; POSIX-only tests are guarded via `itPosix`/`describePosix` so Windows gets full coverage without brittle CI filters. <sup>Written for commit 9fe60b7. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2250?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
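The quoting rule from the implementation notes (wrap tokens that contain whitespace, quotes, or `^ & | < >` in double quotes, doubling embedded quotes) could look roughly like the sketch below. The function name matches the one mentioned in the PR, but the body is an assumption written from that description rather than the merged code, and the install path is a hypothetical stand-in for the elided one.

```typescript
// Sketch of the described rule; not the actual helper from packages/cli.
function quoteForCmdShell(token: string): string {
  // An empty token must still occupy an argument slot.
  if (token === "") return '""';
  // Tokens without whitespace, quotes, or cmd metacharacters pass through.
  if (!/[\s"^&|<>]/.test(token)) return token;
  // Double embedded quotes, then wrap the whole token.
  return `"${token.replace(/"/g, '""')}"`;
}

// Hypothetical argument list: the last path is illustrative only.
const line = [
  "C:\\Program Files\\nodejs\\npx.cmd",
  "--yes",
  "skills",
  "add",
  "C:\\Users\\First Last\\skill",
]
  .map(quoteForCmdShell)
  .join(" ");

console.log(line);
```

Node then wraps the joined line in one more pair of outer quotes when it builds the `cmd.exe /d /s /c "..."` invocation, which is why the helper only quotes individual tokens.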
Prepare the next browse release by versioning the package on `main`. What this PR does: - bumps `packages/cli/package.json` to `0.9.0` - updates the browse changelog - consumes the pending browse changesets After this PR merges, the `Release` workflow on `main` will publish `browse@0.9.0` from that exact commit using `pnpm pack` + `npm publish --provenance`. <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Release `browse@0.9.0` by bumping the package and changelog; includes a new default for `browse screenshot` (saves to a file) and a Windows reliability fix for `browse skills add` with bounded timeouts. If your scripts parsed base64 from stdout, pass `--base64` to keep the old behavior. <sup>Written for commit 8178729. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2276?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…cular-structure JSON (#2278)

## What & why

`stagehand.agent({ integrations })` is unusable with a local/stdio MCP server. `agent()` logs its config as auxiliary metadata via `JSON.stringify(options.integrations)`, but an MCP `Client` instance (what `connectToMCPServer({ command, args })` returns) is a **circular object**, so the call throws **`TypeError: Converting circular structure to JSON`** *before the agent ever runs*. This blocks the documented stdio-MCP path entirely (URL-string integrations are unaffected, since a string serializes fine).

## The fix

`packages/core/lib/v3/v3.ts`: in the agent-creation log, serialize a **safe descriptor** instead of the raw array: keep URL strings, summarize `Client` instances as `"[mcp client]"`. One site; it runs for all agent modes (dom/hybrid/cua). No public API change. Patch changeset added.

```ts
// before
value: JSON.stringify(options.integrations),

// after
value: JSON.stringify(
  options.integrations.map((i) => (typeof i === "string" ? i : "[mcp client]")),
),
```

## E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| **Before fix** (published `3.6.0`): `agent({ integrations: [connectToMCPServer({command:"npx", args:[…filesystem MCP…]})] })` run on live Browserbase | `TypeError: Converting circular structure to JSON … at V3.agent`; crashes **before** the agent starts. Reproduced on a **second** MCP server (`@modelcontextprotocol/server-everything`) → not server-specific. | Proves the bug and that it's general to any `Client` integration. |
| **After fix** (local build copied into a scratch project), same script on live Browserbase | exit 0, **no circular error**; agent connected to the MCP server (`Secure MCP Filesystem Server running on stdio`) and **invoked its tools** (got a real tool response). | Proves the fix unblocks `agent({ integrations: [client] })` end-to-end. |
| `pnpm build:esm` (core) + grep built `v3.js` | build exit 0; built output contains the safe descriptor, no `JSON.stringify(options.integrations)`. | Fix compiles and is present in the artifact under test. |

> Note: server-side ingestion is irrelevant here. This is a pure client-side serialization crash; the matrix exercises the exact throwing call path before and after.

Closes STG-2405.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Fixes a circular-JSON crash when creating an agent with MCP `Client` integrations (e.g., from `connectToMCPServer`). We now log a safe descriptor for `integrations`, so `agent({ integrations: [client] })` works with local/stdio MCP servers across all modes. Addresses STG-2405.

<sup>Written for commit 01323a5. Summary will update on new commits.</sup>

<a href="https://cubic.dev/pr/browserbase/stagehand/pull/2278?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
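The failure mode and the fix can be reproduced without an MCP server. The sketch below uses a plain self-referencing object as a stand-in for the MCP `Client` (an assumption made for illustration; the real client is circular for its own internal reasons), and `describeIntegrations` is a hypothetical wrapper around the mapping shown in the PR.

```typescript
// Stand-in for `string | Client`; the real Client type is not imported here.
type Integration = string | object;

// Same mapping as in the fix: keep URL strings, summarize everything else.
function describeIntegrations(integrations: Integration[]): string {
  return JSON.stringify(
    integrations.map((i) => (typeof i === "string" ? i : "[mcp client]")),
  );
}

// A self-referencing object, which JSON.stringify cannot serialize.
const fakeClient: { self?: unknown } = {};
fakeClient.self = fakeClient;

const integrations: Integration[] = ["https://example.com/mcp", fakeClient];

// Serializing the raw array throws a TypeError on the circular reference.
let rawThrew = false;
try {
  JSON.stringify(integrations);
} catch {
  rawThrew = true;
}

const described = describeIntegrations(integrations);
console.log(rawThrew, described);
```

The descriptor trades detail for safety: the log no longer says which client was attached, only that one was.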
# why New model dropped # what changed Added support for the Gemini 3.5 Flash Computer Use updated toolset in `GoogleCUAClient.ts`, with all new tool formats correctly mapped. # test plan <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Adds support for the `google/gemini-3.5-flash` computer-use agent. Normalizes Gemini 3.x tool names/args to 2.5 handlers, preserves click semantics, validates coordinates (rejects missing/NaN/Infinity), always returns a fresh screenshot, and surfaces reasoning/cached tokens. - **New Features** - Enable `google/gemini-3.5-flash` in agent/LLM provider maps and public types; update tests. - Map 3.x functions to 2.5 handlers and accept new arg shapes: coordinate-less `type`, `keys` array or single `key`, `magnitude_in_pixels` for `scroll`, drag start/end pairs; recognize `screenshot`/`take_screenshot`; coordinate-less `scroll` falls back to PageUp/PageDown; alias `wait` to `wait_5_seconds`. - Always return a screenshot function response even when no executable actions are produced. - **Bug Fixes** - Track `reasoning_tokens` and `cached_input_tokens` in Google CUA usage (per step and aggregated). - Preserve 3.x click-family semantics (`double_click`, `triple_click`, `right_click`, `middle_click`, `move`) and drop calls with missing or non‑finite coordinates; add explicit `click_at` guard and a shared finite-number check; add unit tests for conversion/guards. - Guard required args and log custom‑tool collisions: reject `navigate` without `url` and `type`/`type_text_at` without `text` (empty allowed); log when a custom tool name conflicts with a predefined function (predefined wins). <sup>Written for commit eb1e3a7. 
Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2273?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
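The shared finite-number check mentioned in the summary might be sketched as below. This is an assumption-heavy illustration: `isFiniteNumber` and `readPoint` are hypothetical names, and the `x`/`y` argument keys are assumed rather than taken from `GoogleCUAClient.ts`.

```typescript
// Accept only real, finite numbers: rejects undefined, strings, NaN, Infinity.
function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

// Hypothetical guard: return a usable point, or null so the caller can drop
// the function call instead of clicking at a bogus location.
function readPoint(
  args: Record<string, unknown>,
): { x: number; y: number } | null {
  const { x, y } = args;
  return isFiniteNumber(x) && isFiniteNumber(y) ? { x, y } : null;
}

console.log(readPoint({ x: 120, y: 340 }));
console.log(readPoint({ x: Number.NaN, y: 340 }));
console.log(readPoint({ y: 340 }));
```

Returning `null` rather than throwing keeps the decision about what to do with a malformed call (skip it, log it, still send a screenshot back) with the caller.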
… sessions and cloud API headers (#2277) ## What & why CLI-driven Browserbase usage isn't fully attributable today: - Remote **browser sessions** the CLI creates are tagged `userMetadata.browse_cli:"true"`, but carry no install/version, so we can't tie usage to an install or correlate with the CLI's anonymous PostHog telemetry. - `browse cloud search` (`/v1/search`) and `browse cloud fetch` create **no session at all**, so they're invisible in session metadata. This PR stamps a stable anonymous **`install_id`** + **`cli_version`** onto both paths. ## Changes (`packages/cli` only) - **New `src/lib/identity.ts`** — single source for install identity: `resolveInstallId` (async, memoized, **atomic** write via exclusive-create + EEXIST re-read), `peekInstallId` (sync, never blocks), `getCliVersion`, and `toMetadataValue` (sanitizes session-metadata values). install-id logic moved verbatim out of `telemetry.ts`; its tests pass unchanged. - **Sessions** — `driver/remote.ts` `remoteStagehandOptions()` adds sanitized `install_id` + `cli_version` to `userMetadata` (made async; resolver awaited with a safe fallback so telemetry never throws). Interface, local-only stub (now properly async so `.catch` works), and the call site updated. - **Cloud headers** — `lib/cloud/api.ts` sends `x-bb-client: browse-cli/<version>` (+ `x-bb-install-id` when resolved) on **both** transports: the raw `requestBrowserbaseJson` helper (covers `search`, sessions, contexts, projects, extensions) and `createBrowserbaseClient()` `defaultHeaders` (covers `fetch`, functions). Never emits empty-value headers. - Patch **changeset**. ## Why sanitize values Browserbase session-create runs `validateMetadataObject` — values must match `[\w\-_,;:.()&$%#@!?~]` and total ≤512 chars. A `+build` semver would otherwise **400 every remote session**, so `cli_version`/`install_id` are passed through `toMetadataValue()` before reaching `userMetadata`. 
(HTTP headers are unconstrained, so the full version stays in `x-bb-client`.) ## E2E Test Matrix | Command / flow | Observed output | Confidence / sufficiency | | --- | --- | --- | | `<local build> open https://example.com --remote` → `cloud sessions list --status RUNNING` | session `userMetadata` = `{stagehand:"true", browse_cli:"true", install_id:"<uuid>", cli_version:"0.9.0"}` | All 3 attribution keys land on a driver-created remote session; `install_id` equals the on-disk marker. (Server-side ingestion out of scope.) | | `cloud search "..."` and `cloud fetch https://example.com` against a local capture server (`--base-url`) | `/v1/search` and `/v1/fetch` each received `x-bb-client: browse-cli/0.9.0` + `x-bb-install-id: <uuid>` | Exact outgoing headers confirmed on **both** the raw-helper (search) and SDK (fetch) paths. | | `cloud search "..."` and `cloud fetch https://example.com` (live API) | real results JSON; fetch `200` + markdown | New headers don't break live calls. | | Migration smoke against built `dist/lib/identity.js`: seed legacy `~/…/cli/telemetry-id` = `1111…5555`, then `resolveInstallId` with `XDG_CONFIG_HOME` pointed at a temp dir | returned `1111…5555`; new `<tmp>/browserbase/install-id` contains `1111…5555` (fresh-dir case mints a new uuid instead) | Legacy id is carried forward to the new canonical path, not reset; first-run still mints. Also confirmed on real disk: existing `~/Library/.../telemetry-id` id copied to `~/.config/browserbase/install-id`, legacy file intact. 
| | Read-only-FS / Lambda resilience against built `dist`: (a) `HOME=/tmp/...` writable, (b) read-only `0555` dir, (c) `ENOTDIR` under `/dev/null`, (d) unwritable `HOME=/var/empty`, plus real `browse --version` and `browse cloud sessions list` under unwritable `HOME` | (a) persists to `/tmp/.../.config/browserbase/install-id`; (b)(c)(d) return a valid in-memory UUID, **nothing written, no throw**; real CLI exits cleanly (version prints; cloud cmd returns a clean `401`, not an FS crash); no illegal writes under `/var/empty` | install-id resolution + migration are best-effort: every read/mkdir/write is guarded and all 4 callers wrap in `.catch`. On Lambda (`HOME=/tmp`) it persists; if the dir/file is unwritable it degrades to a per-invocation in-memory id without failing the command. | | `turbo run build --filter=browse` · `pnpm lint` · `pnpm test:cli` | build + lint clean · **299/299** tests pass (+7 path-resolution / migration tests over the prior 292) | No regressions; path change + migration covered. | ## Review follow-ups (addressed) All 5 Cubic threads resolved: async local-only stub (so `.catch` works), atomic install-id write (race-safe on concurrent first runs — pre-existing behavior, hardened), and 12 focused unit tests for `toMetadataValue` (allowed-char filtering, `+build` stripping, truncation, UUID round-trip), the attribution headers, and the `remoteStagehandOptions` success + fallback paths. ## Dependency / follow-up (not in this PR) Session `userMetadata` keys are queryable in Snowflake today (`STG_SESSIONS.SESSION_METADATA`). The **search/fetch headers** only become useful once Platform logs `x-bb-client` / `x-bb-install-id` on those endpoints (`/v1/fetch` has an unpopulated `fetch_tasks.headers` column not yet in the Estuary mirror; `/v1/search` writes no DB row) — tracked as a server-side follow-up. 
## Update — also tags `cloud sessions create` Extended attribution to the `browse cloud sessions create` path too (previously only the driver `open --remote` path carried it): its `userMetadata` now includes `browse_cli` + `install_id` + `cli_version` (sanitized via `toMetadataValue`), merged with any user-supplied `--body` metadata while keeping the attribution keys authoritative — a user can't spoof `browse_cli` to `"false"`. **So every CLI-created session is attributable — driver *and* `cloud sessions create`.** Verified live: `cloud sessions create` → readback `userMetadata: { browse_cli:"true", install_id:"…", cli_version:"0.9.0" }` → released. +2 tests (292 total cli tests). ## Update — standardized install-id path (review follow-up) Per review (thanks @pirate), moved the anonymous install-id marker off the bespoke per-OS path (`~/Library/Application Support/Browserbase/cli/telemetry-id`, `%APPDATA%/Browserbase/cli/telemetry-id`, `<xdg>/browserbase/cli/telemetry-id`) to the **standardized `~/.config/browserbase/install-id`** — consistent with core (`BROWSERBASE_CONFIG_DIR`) and the CLI's own `~/.config/browserbase/skills`. Honors `BROWSERBASE_CONFIG_DIR`, falls back to `XDG_CONFIG_HOME`/`~/.config` on every platform; the `BROWSERBASE_TELEMETRY_INSTALL_ID_FILE` override still short-circuits everything (incl. migration). **Backwards-compatible:** if the canonical file is absent but a legacy marker exists, its UUID is copied forward so existing installs keep their stable id — no attribution reset; the legacy file is left intact. Renamed `telemetry-id` → `install-id` since it's no longer telemetry-only (and `install-id`, not `device-id`, because it's a per-install id, not a hardware fingerprint). Considered and declined `node-machine-id`: a cross-app hardware id doesn't fix ephemeral-fleet counting (unpredictable across customer image strategies) and conflicts with the anonymous, install-scoped intent. Closes STG-2404. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
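A value sanitizer along the lines of `toMetadataValue()` could look like the following sketch. Only the allowed character class comes from the PR text; the way the `+build` suffix is stripped and the per-value length cap of 64 are assumptions, since the source only states a 512-character total for the metadata object.

```typescript
// Characters outside the class quoted in the PR: [\w\-_,;:.()&$%#@!?~]
const DISALLOWED = /[^\w\-,;:.()&$%#@!?~]/g;

// Sketch only: the real helper lives in src/lib/identity.ts and may differ.
function toMetadataValue(raw: string, maxLength = 64): string {
  // Assumption: drop semver build metadata ("+build...") entirely.
  const withoutBuild = raw.replace(/\+.*$/, "");
  // Remove any remaining disallowed characters, then cap the length.
  return withoutBuild.replace(DISALLOWED, "").slice(0, maxLength);
}

console.log(toMetadataValue("0.9.0+build.5"));
console.log(toMetadataValue("123e4567-e89b-12d3-a456-426614174000"));
console.log(toMetadataValue("a b/c"));
```

A UUID passes through unchanged because hex digits and hyphens are all inside the allowed class, which is what lets `install_id` round-trip.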
# why
users need a way to block known domains during browser sessions
# what changed
- users can now call `stagehand.context.setDomainPolicy({ blockedDomains
})` to block requests to specific domains across the whole context
- stagehand turns those domains into cdp `Fetch` request patterns, so
only matching blocklist requests are paused instead of intercepting all
traffic
- blocked requests are failed with `BlockedByClient`, which surfaces in
chrome as a client-side network block
- the policy is applied to already-open pages & automatically applied to
new pages, popups, & attached frame targets
- users can clear the policy with `setDomainPolicy(null)` or `{
blockedDomains: [] }`; clearing disables policy interception & removes
stagehand's request listener
- invalid domain inputs like full urls, paths, ports, queries, or
malformed wildcards throw a `StagehandInvalidArgumentError`
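The input validation in the last bullet could be approximated as below. This is a sketch under stated assumptions: `normalizeBlockedDomain` and `InvalidDomainError` are hypothetical stand-ins (the real code throws `StagehandInvalidArgumentError`), and the label rule is a simplified hostname check rather than Stagehand's actual one.

```typescript
// Local stand-in for StagehandInvalidArgumentError (not imported here).
class InvalidDomainError extends Error {}

// One DNS-style label: letters, digits, inner hyphens (simplified).
const LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

// Hypothetical validator: accepts "example.com" and "*.example.com",
// rejects URLs, paths, ports, queries, and malformed wildcards.
function normalizeBlockedDomain(input: string): string {
  const value = input.trim().toLowerCase().replace(/\.$/, "");
  // Only a leading "*." wildcard is allowed; strip it before checking labels.
  const host = value.startsWith("*.") ? value.slice(2) : value;
  const labels = host.split(".");
  if (host === "" || labels.some((label) => !LABEL.test(label))) {
    throw new InvalidDomainError(`invalid domain: ${input}`);
  }
  return value;
}

console.log(normalizeBlockedDomain("Ads.Example.com."));
console.log(normalizeBlockedDomain("*.example.com"));
```

Characters such as `:`, `/`, `?`, and a stray `*` never match the label pattern, so full URLs, ports, paths, queries, and wildcards like `*example.com` are all rejected by the same check.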
### fast follow:
- will follow up with a PR to add a domain allowlist, eg
`stagehand.context.setDomainPolicy({ allowedDomains })`
### behavioural notes:
- `setDomainPolicy({ blockedDomains: [...] })` applies to active context
sessions & future pages/targets
- exact domains like `ads.example.com` block only that hostname
- wildcard domains like `*.example.com` block subdomains, but not the
apex domain
- when no policy is set, stagehand does not enable `Fetch` interception
for this feature
- clearing with `setDomainPolicy(null)` or `{ blockedDomains: [] }`
disables the policy & removes stagehand's `Fetch.requestPaused` listener
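The exact and wildcard matching rules above can be illustrated with a small matcher. This is a minimal sketch, not Stagehand's matcher: `isBlocked` and `normalizeHost` are hypothetical names, and it assumes the blocklist entries have already been validated.

```typescript
// Lowercase and drop trailing dots so "ADS.Example.com." compares equal.
function normalizeHost(host: string): string {
  return host.trim().toLowerCase().replace(/\.+$/, "");
}

// Hypothetical matcher for the described semantics:
// - "ads.example.com" blocks only that hostname
// - "*.example.com" blocks subdomains but not the apex "example.com"
function isBlocked(hostname: string, blockedDomains: string[]): boolean {
  const host = normalizeHost(hostname);
  return blockedDomains.some((entry) => {
    const rule = normalizeHost(entry);
    if (rule.startsWith("*.")) {
      const suffix = rule.slice(1); // ".example.com"
      // Require at least one extra label in front of the suffix.
      return host.length > suffix.length && host.endsWith(suffix);
    }
    return host === rule;
  });
}

console.log(isBlocked("ads.example.com", ["ads.example.com"]));
console.log(isBlocked("example.com", ["*.example.com"]));
console.log(isBlocked("cdn.example.com", ["*.example.com"]));
```

Matching on the `.example.com` suffix, rather than on `example.com`, is what keeps look-alike hosts such as `notexample.com` and the apex domain itself outside a wildcard rule.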
# test plan
- `packages/core/tests/unit/domain-policy.test.ts` validates domain
normalization, exact/wildcard matching, invalid domain rejection &
generated `Fetch.RequestPattern` values
- `packages/core/tests/unit/context-domain-policy.test.ts` validates
`context.setDomainPolicy()` enables/disables `Fetch`, removes only its
own `Fetch.requestPaused` listener on clear, fails blocked requests &
continues unexpected non-blocked paused requests
- `packages/core/tests/integration/context-domain-policy.spec.ts`
validates blocked requests fail on an existing page & on a page created
after the policy is set
- `packages/core/tests/unit/public-api/public-error-types.test.ts`
validates `StagehandSetDomainPolicyError` is exported as a public error
type
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a context-wide domain blocklist that intercepts and blocks outgoing
HTTP(S) requests by domain using CDP `Fetch`. Includes a small API,
strict validation, and clearer error messages.
- **New Features**
- API: `context.setDomainPolicy(policy | null)` and
`context.getDomainPolicy()`.
- Patterns: exact hosts and leading wildcards only (`example.com`,
`*.example.com`); HTTP/HTTPS on any port; case-insensitive; trailing
dots handled.
- Scope: applies to existing and new pages/targets; clearing with `null`
or `[]` disables and removes our handler on success.
- Validation: domain-only strings; invalid inputs throw
`StagehandInvalidArgumentError`.
- Behavior: non-matches continue; matches fail with `BlockedByClient`.
- Errors: if `Fetch.enable` fails, we uninstall the handler for that
session, close new targets, and `newPage()` fails fast with
`StagehandSetDomainPolicyError` that includes per-session details and
CDP error text; if `Fetch.disable` fails, the handler stays installed
and the same error is thrown.
<sup>Written for commit 54c22df.
Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
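
The "non-matches continue; matches fail with `BlockedByClient`" behavior summarized above could look roughly like the sketch below. `Fetch.failRequest` and `Fetch.continueRequest` are CDP methods; the `CdpSender` interface, the event shape, and `handleRequestPaused` are simplified assumptions for illustration rather than stagehand's actual types.

```typescript
// Minimal stand-in for a CDP session; only the `send` shape is assumed here.
interface CdpSender {
  send(method: string, params: Record<string, unknown>): Promise<unknown>;
}

// Subset of the Fetch.requestPaused event payload used by this sketch.
interface RequestPausedEvent {
  requestId: string;
  request: { url: string };
}

async function handleRequestPaused(
  session: CdpSender,
  event: RequestPausedEvent,
  isBlockedHost: (hostname: string) => boolean,
): Promise<"blocked" | "continued"> {
  let hostname = "";
  try {
    hostname = new URL(event.request.url).hostname;
  } catch {
    // Unparseable URL: leave hostname empty so the request is continued.
  }
  if (hostname !== "" && isBlockedHost(hostname)) {
    // Matches fail with the BlockedByClient error reason.
    await session.send("Fetch.failRequest", {
      requestId: event.requestId,
      errorReason: "BlockedByClient",
    });
    return "blocked";
  }
  // Every paused request must be resolved, so non-matches are continued.
  await session.send("Fetch.continueRequest", { requestId: event.requestId });
  return "continued";
}
```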
Prepare the next browse release by versioning the package on `main`.

What this PR does:
- bumps `packages/cli/package.json` to `0.9.1`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish `browse@0.9.1` from that exact commit using `pnpm pack` + `npm publish --provenance`.
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Prepare `browse@0.9.1` by bumping the CLI version and updating the changelog; merging will trigger the Release workflow to publish.
- **New Features**
- Attribute CLI-driven Browserbase usage to an anonymous install. Sessions stamp `install_id` and `cli_version` in `userMetadata`. Cloud Search/Fetch send `x-bb-client` and `x-bb-install-id`. Best-effort and non-blocking.

<sup>Written for commit f2eca53. Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…ons (#2282)
## Summary
Adds `--verified` and `--proxies` to remote driver sessions so a Verified and/or proxied Browserbase session opens in **one command**:
```bash
browse open <url> --remote --verified --proxies
```
Before this, the `browse` **driver** (`open`) had no way to request Verified/proxies; only `browse cloud sessions create --verified --proxies` (session **management**) could make such a session. So *driving* one took two steps: create it, then attach the driver via raw `--cdp`. That raw attach loses session identity for the whole lifetime (`browse status` reports `mode: cdp` with no Browserbase session ID; `doctor` can't reason about it) and, bypassing the normal remote path, never gets the `browse_cli` attribution tag, so Verified/proxied power users were invisible to browse-CLI telemetry. (Plain `browse open --remote` already worked in one step; it just couldn't ask for Verified/proxies.)

Closes [STG-2265](https://linear.app/browserbase/issue/STG-2265/add-verified-proxies-to-browse-open-remote-and-attach-by-session-id) (Tier 1).
## What changed (CLI-only)
- **New flags** `--verified` / `--proxies` on the shared driver flag set (so `open`, `doctor`, etc. accept them). Valid **only with `--remote`** and never implied, because implying it would silently switch the user to billed cloud sessions. Without `--remote` they hard-error with a hint.
- **Threaded into session creation**: the remote `ConnectionTarget` carries the settings, and `remoteStagehandOptions` maps them onto `browserbaseSessionCreateParams` (`proxies: true`, `browserSettings.verified: true`) while keeping `userMetadata.browse_cli`.
- **Sticky per session** like `--headed`/`--headless`: the settings join the mode-equality check, so a re-open requesting different settings fails with the usual stop-and-reopen error.
- **`status` / `doctor` surface identity**: Browserbase session ID, dashboard URL, live-view (debug) URL, and verified/proxies state, read from the existing Stagehand getters with no extra API call.
- Bundled browse skill updated to document the one-command form.
- `--verified` requires a Browserbase Scale plan.
## Tier 2 (follow-up, intentionally not in this PR)
`browse open <url> --remote --session-id <id>` (attach-by-ID for the create-then-attach long tail: regions, keep-alive, contexts, `--stdin` body). Kept separate to keep this PR atomic; the skill already points there as the bridge.
## E2E Test Matrix
Run against live Browserbase with a local build (`node bin/run.js …`). Session IDs are ephemeral; the live-view URL is signed and redacted.
| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `open <url> --verified` (no `--remote`) | `--verified require --remote. Try: browse open <url> --remote --verified` | Guard works; flag is never silently implied. |
| `open <url> --proxies` (no `--remote`) | `--proxies require --remote. Try: browse open <url> --remote --proxies` | Same, for `--proxies`. Also verified it errors even with `BROWSERBASE_API_KEY` set (no auto-implied remote). |
| `open "https://api.ipify.org?format=json" --remote --proxies` | `{ mode: "remote", browserbaseSessionId: "c64ca54b…", browserbaseSessionUrl: "https://www.browserbase.com/sessions/c64ca54b…", hasDebugUrl: true }` | Session identity (id + dashboard + live-view URL) is surfaced right on `open`, not lost like `--cdp`. |
| `eval` egress IP: proxied vs non-proxied | proxied `8.28.99.210` vs non-proxied `44.248.86.34` → **different routes** | `--proxies` actually routes egress through Browserbase proxies (not just accepted). |
| `status -s <proxied>` | `{ mode: "remote", browserbaseSessionId: "c64ca54b…", target: { kind: "remote", proxies: true } }` | `status` surfaces the session ID and proxies/verified state. |
| `doctor --json -s <proxied>` | browserbase check: `session c64ca54b… — https://www.browserbase.com/sessions/c64ca54b…`; target: `reusing remote (proxies)` | `doctor` reasons about the live session + settings. |
| Sticky: plain remote running, re-open `--remote --proxies` (and `--remote --verified`) | `Session "s" is already running in remote mode. Run browse stop --session s before changing modes.` | Settings are sticky; conflicting re-open fails like `--headed`. |
| `open <url> --remote --verified` (live) | `{ mode: "remote", browserbaseSessionId: "e2f0ce9f…", target: { kind: "remote", verified: true } }` | Verified session is actually created (this key is on Scale). |
| `pnpm test` | `Test Files 19 passed (19) · Tests 278 passed (278)` | Unit coverage incl. new resolution/guard/sticky tests + `remote-options` create-params threading. |
| `pnpm lint` | prettier + eslint + `tsc --noEmit` clean | Format/lint/types green. |

🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add `--verified` and `--proxies` to remote driver sessions so you can open a Verified and/or proxied Browserbase session in one command: `browse open <url> --remote --verified --proxies`. This preserves session identity and makes `browse status` and `browse doctor` show the Browserbase session ID and links. Closes STG-2265.
- **New Features**
- `--verified` and `--proxies` are valid only with `--remote`; they are never implied. `--verified` requires a Browserbase Scale plan.
- Settings are sticky for the session; changing them requires stopping and reopening the session.
- `status` and `doctor` now show the Browserbase session ID, dashboard URL, live-view URL, and whether verified/proxies are enabled. `doctor` also suggests the correct `open` command with `--verified/--proxies` when relevant.
- Flags are threaded into Browserbase session create params while keeping the `browse_cli` attribution tag.
- **Bug Fixes**
- The `--remote` guard message uses correct singular/plural grammar (“requires”/“require”).

<sup>Written for commit d1806c9. Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->

---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
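
A rough sketch of the guard and the flag-to-create-params mapping described in this PR, under stated assumptions: the function names, the exact error wording, and the `browse_cli` value are placeholders for illustration. Only the field names `proxies`, `browserSettings.verified`, and `userMetadata.browse_cli` come from the description.

```typescript
interface RemoteFlags {
  remote: boolean;
  verified: boolean;
  proxies: boolean;
}

interface SessionCreateParams {
  proxies?: boolean;
  browserSettings?: { verified?: boolean };
  userMetadata: Record<string, string>;
}

// Hypothetical guard: the flags are valid only with --remote and never imply it.
function assertRemoteForSessionFlags(flags: RemoteFlags): void {
  const requested: string[] = [];
  if (flags.verified) requested.push("--verified");
  if (flags.proxies) requested.push("--proxies");
  if (!flags.remote && requested.length > 0) {
    // Singular/plural agreement, as in the guard-message fix noted above.
    const verb = requested.length === 1 ? "requires" : "require";
    throw new Error(
      `${requested.join(" and ")} ${verb} --remote. ` +
        `Try: browse open <url> --remote ${requested.join(" ")}`,
    );
  }
}

// Hypothetical mapping onto session create params; the attribution tag is always kept.
function remoteCreateParams(flags: RemoteFlags): SessionCreateParams {
  assertRemoteForSessionFlags(flags);
  const params: SessionCreateParams = {
    userMetadata: { browse_cli: "true" }, // placeholder value, an assumption
  };
  if (flags.proxies) params.proxies = true;
  if (flags.verified) params.browserSettings = { verified: true };
  return params;
}
```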
# why
domain policy enforcement currently relies on `Fetch.requestPaused`, but popups opened with `window.open()` can reach their destination before Fetch interception is installed on the new target. in that race, the blocked popup may open successfully & therefore break the domain policy
# what changed
this PR adds a fallback close path for popup targets whose URL violates the active domain policy:
- stagehand now listens for `Target.targetCreated`, `Target.targetInfoChanged`, and attach-time target metadata for popup targets
- if a popup reaches a blocked/disallowed domain before request interception catches it, the popup gets closed via `Target.closeTarget`
# test plan
- added unit coverage for closing popup targets that already reached a blocked domain, including popups whose opener target is not locally tracked
- added unit coverage for duplicate and late target events so a successfully closed popup is not closed/logged repeatedly
- added unit coverage for the attach race where a `targetCreated` close is still in flight when `attached` handling runs; attach now continues if the close fails
- added unit coverage for close failures, including treating `No target with given id found` as a successful already-closed outcome for this domain-policy fallback only
- added integration coverage using the existing external popup fixture to verify a `window.open()` popup that reaches `news.ycombinator.com` is closed and not retained in `context.pages()`
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes a race where `window.open()` popups could reach blocked domains before interception by closing them immediately when their URL violates the active domain policy. Prevents blocked popups from appearing or lingering in `context.pages()`.
- **Bug Fixes**
- Listen to `Target.targetCreated`, `Target.targetInfoChanged`, and at attach-time; close popup via `Target.closeTarget` if its URL is disallowed.
- Deduplicate close attempts across events and let attach wait for an in-flight close; continue attach if the close fails.
- Treat “No target with given id found” as a successful already-closed outcome for this fallback; improve logging with rule reason and source.
- Skip non-popup targets and persist successful close dedupe across late events.
- **Dependencies**
- Add changeset to publish a patch for `@browserbasehq/stagehand`.

<sup>Written for commit a0dae3b. Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->
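
The dedupe and already-closed handling described above might be structured like this sketch. `PopupCloser` and its fields are hypothetical names; the injected `closeTarget` callback stands in for whatever sends `Target.closeTarget`, and only the "No target with given id found" rule is taken from the PR text.

```typescript
type CloseTarget = (targetId: string) => Promise<void>;

// Hypothetical helper: one close attempt per popup, shared across duplicate events.
class PopupCloser {
  private readonly closeTarget: CloseTarget;
  private readonly inFlight = new Map<string, Promise<boolean>>();
  private readonly closed = new Set<string>();

  constructor(closeTarget: CloseTarget) {
    this.closeTarget = closeTarget;
  }

  close(targetId: string): Promise<boolean> {
    // Late events for an already-closed popup are no-ops.
    if (this.closed.has(targetId)) return Promise.resolve(true);
    // Callers such as attach handling can await the in-flight attempt.
    const pending = this.inFlight.get(targetId);
    if (pending !== undefined) return pending;

    const attempt = this.closeTarget(targetId)
      .then(() => true)
      .catch((err: unknown) => {
        const message = err instanceof Error ? err.message : String(err);
        // The target being gone already counts as success for this fallback only.
        return message.includes("No target with given id found");
      })
      .then((ok) => {
        this.inFlight.delete(targetId);
        if (ok) this.closed.add(targetId);
        return ok; // false lets the caller continue attaching
      });
    this.inFlight.set(targetId, attempt);
    return attempt;
  }
}
```

Returning a boolean instead of throwing keeps the attach path simple: a failed close is reported, and attach proceeds.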
…unning daemon (#2280)
## TL;DR
`BROWSERBASE_API_KEY=xxx browse open <url> --remote` was **silently ignored** when the driver daemon was already running, so the CLI kept printing `Missing BROWSERBASE_API_KEY`. The daemon froze a copy of `process.env` at spawn time and never saw the late key.

Fix: the client now **forwards the key with every command**, and the daemon **threads it straight into the Stagehand session at init**: no restart, no `browse stop`, warm sessions untouched.

Closes STG-2407. Reported via the partner AX update (partner-2027dev, Jun 22); confirmed still live on `main`.

---
## Symptom
```console
$ browse open https://example.com --remote   # no key set → starts the daemon, fails
Missing BROWSERBASE_API_KEY
$ BROWSERBASE_API_KEY=bb_live_… browse open https://example.com --remote   # key set, SAME daemon
Missing BROWSERBASE_API_KEY   # ❌ ignored; only `browse stop` + retry worked
```
This burns 3–4 retries per session and blocks the documented recover-after-interruption flow.
## Root cause
The CLI is a thin client that talks to a long-lived background **daemon** (which holds the warm browser session). The key never reached that daemon:
1. **Frozen env.** The daemon is spawned `detached` with `env: process.env` captured **once** at spawn time (`daemon/client.ts`). A key exported/inlined in a *later* shell never propagates to it.
2. **Key read daemon-side.** The remote session is created inside the daemon, which read `process.env.BROWSERBASE_API_KEY` *there* (`remote.ts` → `session-manager.ts`).
3. **Protocol carried no credentials.** Requests had no way to deliver a key to an already-running daemon (`daemon/protocol.ts`). Since `--remote` is explicit, the client never blocked on the key, so it happily started a doomed key-less daemon.
4. **Stale backoff.** A cached init-failure backoff (5s→60s) replayed the *old* "missing key" error even on an immediate retry.
## The fix
**Make the client the source of truth for the key; never trust the daemon's frozen env.**
```
client (fresh env each call)                  daemon (long-lived)
────────────────────────────                  ───────────────────
collectForwardedEnv()      ──{API_KEY}──▶     stash on session manager
  reads caller's env,      over the owner-only socket      │
                                                            ▼ at init only:
                                              remoteStagehandOptions(forwardedEnv)
                                                └▶ new Stagehand({ apiKey })
                                              (key's only home = live session;
                                               never written back to process.env)
```
- **`daemon/forwarded-env.ts`** (new): `collectForwardedEnv()` reads the caller's env; `forwardedEnvSignature()` is a secret-free `sha256` fingerprint used only to detect key changes.
- **`protocol.ts` / `client.ts`**: every `open`/`command` request carries the caller's `forwardedEnv`.
- **`server.ts` / `session-manager.ts`**: the daemon threads the forwarded key **into the Stagehand constructor at init**. It is **never written into the daemon's `process.env`**. When the fingerprint changes on a *cold* session, the stale init backoff is cleared so the retry runs immediately.
- **Warm sessions are untouched**: an already-initialized session returns early and keeps its browser; the key only matters at init.

> **Only the API key is forwarded.** The Browserbase backend infers the project from the key, so `BROWSERBASE_PROJECT_ID` isn't needed for session creation (verified end-to-end with no project id set anywhere). (A multi-project key pinning a non-default project via `BROWSERBASE_PROJECT_ID` is a rare edge; that still resolves from the daemon's own env, exactly as before.)

### Naming + drift guard (per review)
The mechanism is named generically as **forwarded env vars**, not "credentials" (`ForwardedEnv`, `collectForwardedEnv` / `applyForwardedEnv` / `forwardedEnvKeys`, request field `forwardedEnv`, module `daemon/forwarded-env.ts`). It stays a **curated allowlist** (today just `BROWSERBASE_API_KEY`) rather than the whole `process.env`, because: (1) `Object.assign`-ing the full env into an already-running daemon silently no-ops for anything its modules read at import; (2) the caller's env also holds the daemon's own operational vars (`PATH`/`HOME`/`BROWSE_DAEMON_DIR`), so forwarding it wholesale risks clobbering the daemon and can't represent an unset; (3) the driver path has no AI ops, so no model keys are ever read, and the API key is the only env-delivered session input. A new test, `tests/daemon-forwarded-env-drift.test.ts`, fails if any new daemon-path `process.env` read is left uncategorized (forward vs daemon-local), so the allowlist can't silently drift.

<details>
<summary>Why thread into the constructor instead of writing <code>process.env</code>?</summary>

The key is read exactly once per session (at init); afterward the live `Stagehand` instance holds it. Writing it into the daemon's global env would leave a stray secret with no reader. Threading keeps the credential scoped to the session, and the change-detector is hashed so the raw key isn't kept in a second field. There's no perf cost: forwarding ~100 bytes is free next to cloud session creation; the only thing cached for speed is the warm session, which is independent of the key value.

**Rejected alternatives:** writing the key into the daemon's env (stray secret, one-way env accumulation); auto-restarting the daemon (kills warm sessions, racy); a mere actionable error (the ask is for it to *work*, not just guide).
</details>

## Security contract (local-only build)
`BROWSERBASE_API_KEY` must not appear in the CDP-only artifact. The forwardable-key *list* is capability-gated: `forwardedEnvKeys()` returns the key in the full build (`remote.ts`) and `[]` in `remote.disabled.ts`. `collectForwardedEnv` / `forwardedEnvSignature` iterate the received object's own keys, so they stay key-name-free. Net: the literal lives only in `dist/lib/driver/remote.js`, and `tests/local-only-build.test.ts` still passes.
## Testing
Local full build (`pnpm build`) of the code under review, real Browserbase key, exercising the exact repro against an already-running key-less daemon.
| Step | Command / flow | Result | Proves |
| --- | --- | --- | --- |
| 1 | Key-less `open … --remote` (spawns daemon) | `Missing BROWSERBASE_API_KEY` (daemon stays up) | Reproduces the stranded key-less daemon |
| 2 | Inline `BROWSERBASE_API_KEY=… open … --remote`, **same** daemon, **no project id anywhere** | ✅ `SUCCESS`: `"title": "Example Domain"`, `"url": "https://example.com/"` | The inline key reaches the running daemon (the fix); project inferred from the key |
| 3 | Warm reuse: `open https://www.iana.org --remote` | ✅ `SUCCESS`: `"Internet Assigned Numbers Authority"`, same `targetId` | Warm-session fast path preserved |
| 4 | `vitest` driver-foundation + remote-disabled + local-only + drift | ✅ 45 pass (4 files) | Unit coverage; asserts the key is **not** written to `process.env` |
| 5 | `tsc -p tsconfig.local-only.json` + local-only-build test | ✅ typecheck clean | Security contract held (no key name in CDP-only build) |
| 6 | `pnpm lint` | ✅ format + eslint + tsc clean | No regressions |
| 7 | Drift guard: inject an uncategorized `process.env.X` on the daemon path | ✅ test goes red with "forward vs daemon-local" guidance, green after revert | The guard is non-vacuous: a new daemon-path env read can't slip through |
## Follow-ups (not in this PR)
- **`browse open` ECONNREFUSED** (also AX-flagged): `sendDriverRequest` (`client.ts`) has no connect-retry, so a transient `ECONNREFUSED`/`ENOENT` (stale socket / daemon mid-shutdown) propagates raw. Couldn't reproduce under load, so it is tracked separately rather than shipped unverified.
- **Fail-fast `--remote` guard** (defense in depth): make explicit `--remote` resolve the key client-side like `autoSelectRemoteTarget` already does, so a key-less first call fails fast instead of spawning a doomed daemon. Forwarding alone fixes the reported bug; the guard would just improve the first-call error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
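
To make the forwarding idea concrete, here is a minimal sketch of an allowlisted collector and a secret-free fingerprint. The function names match the PR text, but the parameter lists and hashing layout are assumptions for illustration; the real module may differ.

```typescript
import { createHash } from "node:crypto";

type ForwardedEnv = Record<string, string>;

// Assumed signature: copy only allowlisted, non-empty variables from the caller's env.
function collectForwardedEnv(
  env: Record<string, string | undefined>,
  allowlist: readonly string[],
): ForwardedEnv {
  const forwarded: ForwardedEnv = {};
  for (const key of allowlist) {
    const value = env[key];
    if (value !== undefined && value !== "") forwarded[key] = value;
  }
  return forwarded;
}

// Assumed layout: hash the sorted key/value pairs so the daemon can detect a
// changed key without storing the raw secret in a second field.
function forwardedEnvSignature(forwarded: ForwardedEnv): string {
  const hash = createHash("sha256");
  const entries = Object.entries(forwarded).sort(([a], [b]) => a.localeCompare(b));
  for (const [key, value] of entries) {
    hash.update(key).update("\0").update(value).update("\0");
  }
  return hash.digest("hex");
}
```

Both functions iterate only the keys they are handed, so the sketch itself contains no key names, in line with the local-only build contract described above.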
## Summary
Removes the `browse refs` command. It only re-printed the
`xpathMap`/`urlMap` cached from the last `browse snapshot` — which
`browse snapshot` already returns — so it was redundant. It was also a
footgun: it returned **stale** maps if the page had changed since that
snapshot.
`refs` was introduced in the CLI's oclif rewrite as one of the driver
commands and never pruned; nothing relies on it that `browse snapshot`
doesn't already cover.
## What's removed
- The `browse refs` command (`src/commands/refs.ts`)
- Its driver handler + the `"refs"` entry in `DriverCommandName`
- The now-unused `getRefMaps()` accessor on the session manager
- `browse refs` references in `README.md` and the browse `SKILL.md`
Ref-based commands (`click`, `fill`, `select`, …) are **unaffected** —
they resolve from the cached maps via `resolveSelector`, which is
untouched. `browse snapshot` continues to return the ref maps by
default.
## E2E Test Matrix
| Command / flow | Observed output | Confidence |
| --- | --- | --- |
| `browse refs` | `"browse refs" is not a browse command … Error:
command refs not found` | Command removed |
| `browse --help` | no `refs` entry | Removed from the surface |
| `browse snapshot` | unchanged (still returns `{ tree, urlMap, xpathMap
}`) | Snapshot behavior untouched |
| `driver-commands` unit tests | 14/14 pass | No test regressions |
| `pnpm --dir packages/cli build` (`tsc`) | success | Typechecks (incl.
narrowed `DriverCommandName`) |
Linear: [STG-2453](https://linear.app/browserbase/issue/STG-2453)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2298)
thanks @yawbtng for the contribution here!!
## why
CUA `keypress` actions describe a single key **chord** (modifiers held down while the main key is pressed), but `V3CuaAgentHandler.executeAction` pressed each key in the array **separately**. `page.keyPress(modifier)` presses and *releases* the modifier, so by the time the main key was pressed the modifier was already up.

The concrete failure: a `["Control", "A"]` keypress sends `Control` on its own (a no-op) and then `A` through the plain typing path, so instead of select-all, the agent **types a literal `a` into the focused field**. Any select-all / copy / paste / cut / shortcut pattern silently fails *and* corrupts input. Because the agent-replay cache recorded the broken per-key sequence, replays reproduced the bug too.

This is provider-dependent, based on the shape each client emits:
| Provider | emits for a combo | old behavior | status |
| --- | --- | --- | --- |
| OpenAI | `keys: ["CTRL", "A"]` | `Ctrl` then literal `a` | ❌ broken |
| Google (`key_combination`) | `.split("+")` → `["Control", "A"]` | `Ctrl` then literal `a` | ❌ broken |
| Microsoft (`fara-7b`) | `keys: string[]` (per-key) | `Ctrl` then literal `a` | ❌ broken |
| Anthropic | `keys: ["ctrl+s"]` (single `+`-joined string) | chorded correctly | ✅ unaffected |

Anthropic only worked by accident: it pre-joins with `+`, which `page.keyPress` already chords internally.
## what changed
`packages/core/lib/v3/handlers/v3CuaAgentHandler.ts`: in the `keypress` case, map each key and **join into one `+`-delimited combination**, then call `page.keyPress` once. `page.keyPress` already holds modifiers down for the final key and already special-cases the literal `+` key, so single keys, already-combined strings, and `Ctrl++`-style inputs all stay correct. `mapKeyToPlaywright` is idempotent (`CTRL`/`Control` → `Control`), so Google's pre-mapped arrays and Anthropic's combined string are unchanged. The recorded replay step is now a single `press Control+A` instead of the broken `press Control, press A`.
## test plan
New `packages/core/tests/unit/cua-keypress-chord.test.ts` (5 cases, all passing):
- `["Control", "A"]` → single `keyPress("Control+A")`
- alias normalization: `["CTRL", "A"]` → `keyPress("Control+A")`
- single key `["Enter"]` → `keyPress("Enter")` (unchanged)
- already-combined `["ctrl+s"]` → `keyPress("ctrl+s")` (Anthropic shape, unchanged)
- empty `[]` → no `keyPress` call

Existing CUA suites (`anthropic-cua-triple-click`, `openai-cua-client`, `microsoft-cua-client`, `anthropic-cua-adaptive-thinking`): 25 tests still green.

---
Related: this is exactly the class of provider-specific CUA regression that #2188 proposes catching with a deterministic bench task.
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes CUA keypress combos by pressing them as one chord. Shortcuts like Ctrl+A now work across OpenAI, Google `key_combination`, and Microsoft clients instead of typing letters.
- **Bug Fixes**
- Map keys, join with "+", and call `page.keyPress` once; supports arrays, already-joined strings, and the literal "+" key.
- Normalize aliases (`CTRL` → `Control`) and record a single `press Control+A` step for replays.
- Added unit tests for combos, alias normalization, single key, already-combined, and empty input.

<sup>Written for commit c966475. Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->

---------
Co-authored-by: yawbtng <154343001+yawbtng@users.noreply.github.com>
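
The chord fix can be sketched in a few lines. This is an illustration only: `mapKey` is a hypothetical, much smaller stand-in for `mapKeyToPlaywright`, and `pressChord` takes the key-press function as a parameter instead of a real page object.

```typescript
// Hypothetical alias table; the real mapping covers many more keys.
function mapKey(key: string): string {
  const aliases: Record<string, string> = { ctrl: "Control", control: "Control" };
  return aliases[key.toLowerCase()] ?? key;
}

// Map each key, join into one "+"-delimited combination, and press once.
function pressChord(keys: string[], keyPress: (combo: string) => void): void {
  if (keys.length === 0) return; // empty input: no keyPress call
  keyPress(keys.map(mapKey).join("+"));
}
```

Because an already-combined string such as `"ctrl+s"` is not an alias on its own, it passes through unchanged, which matches the Anthropic case in the test plan above.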
…rors (#2269)
# why
Edit: pulling in the description from @filip-michalsky on #2270

Root-cause fix for STG-2335: a non-CUA agent.execute() reports a successful run as { success: false }, with a red Invalid prompt: messages must be a ModelMessage[] error logged after all the work has already completed. This replaces the symptom patch (wrap the finalization in try/catch and force state.completed = true) with a fix for the actual defect.

Root cause

After the main agent loop finishes, ensureDone() runs a forced "done" finalization (handleDoneToolCall) that re-submits the accumulated run history into a fresh generateText call to produce the structured su[…]-validates accumulated tool results, but this re-submission does. When a custom tool returns an object with an optional field left undefined (e.g. PermitFlow's captureField returning { matchedExpected: undefined } when no expectedText is passed), that undefined lands inside […] The AI SDK's ModelMessage validation (standardizePrompt) rejects it, because its JSON-value schema disallows undefined (only null/string/number/boolean/object/array). The finalization throws, flipping the result to { success: false } even though every action succeeded.

Note: the original "reasoning traces" hypothesis was rule[…] parts come back with a valid text: "" and pass validation. The undefined tool-result field is the trigger.
# what changed
sanitizeMessagesForResubmission() deep-strips undefined from the run history before the forced "done" call, keeping all real content. It only traverses plain objects/arrays, so class instances (URL, ty[…]ata, Date) pass through untouched.
# test plan
- 4 unit tests in agent-finalization-resilience.test.ts aga[…]ces InvalidPromptError with an undefined tool-result field → fixed by sanitize → real content (reasoning/tool-call/text) preserved → class instances untouched. All pass.
- End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P[…] on main (success=false, red error), succeeds on this branch (success=true, completed=true, no error).
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Prevents non-CUA `agent.execute()` from reporting a completed run as failed by sanitizing the run history before the forced "done" call and making finalization best-effort. Fixes STG-2335 for reasoning models like `openai/gpt-5.x` by stripping nested `undefined` values that break SDK prompt validation.
- **Bug Fixes**
- Deep-strip `undefined` from re-submitted messages via `sanitizeMessagesForResubmission`; traverse only plain objects/arrays to preserve real content and class instances; apply in `handleDoneToolCall` and null-guard `result.toolCalls`.
- If the forced "done" call throws, log a warning and synthesize completion from the run instead of failing it.
- Add unit tests for the InvalidPromptError repro (including `providerOptions` in `gpt-5.x`), sanitizer behavior, class-instance pass-through, and the finalization-failure fallback.

<sup>Written for commit 4d1c904. Summary will update on new commits.</sup>
<!-- End of auto-generated description by cubic. -->

---------
Co-authored-by: Filip Michalsky <31483888+filip-michalsky@users.noreply.github.com>
Co-authored-by: Filip Michalsky <filip-michalsky@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
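
A deep-strip of `undefined` that only walks plain objects and arrays could be sketched as below. `stripUndefinedDeep` and `isPlainObject` are hypothetical names, not the body of `sanitizeMessagesForResubmission`, and turning `undefined` array elements into `null` (as `JSON.stringify` does) is an assumption the PR text does not confirm.

```typescript
// True only for object literals and null-prototype objects, not class instances.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto: unknown = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function stripUndefinedDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Assumption: keep array length stable by mapping undefined elements to null.
    return value.map((item: unknown) => (item === undefined ? null : stripUndefinedDeep(item)));
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      // Drop keys whose value is undefined; recurse into everything else.
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out;
  }
  // Primitives and class instances (Date, URL, ...) pass through untouched.
  return value;
}
```

For example, a tool result of `{ matchedExpected: undefined, text: "ok" }` would come out as `{ text: "ok" }`.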
## What & why

Browserbase contexts are identified only by an opaque ID and the platform has **no server-side list endpoint**, so reusing a context today means remembering (or copy-pasting) a UUID. This is also the single most-used feature of the popular community Browserbase skill on ClawHub ([jamesfincher/browserbase](https://clawhub.ai/jamesfincher/skills/browserbase)), which keeps a local map of named contexts.

This PR ports that ergonomic into the CLI as a thin, **client-side** name→ID map, with no API change.

- `browse cloud contexts create --name github` → creates the context and saves a local alias
- `browse cloud contexts list` → shows your saved names (new command; the API has no list, so this reflects names saved on this device)
- `contexts get|update|delete` and `sessions create --context-id` now accept a **saved name or a raw ID** (a resolver passes unknown refs through unchanged, so raw IDs still work everywhere)
- `contexts delete` prunes the local alias for the deleted context

The map lives at `(XDG_CONFIG_HOME||~/.config)/browserbase/contexts.json` (honoring `BROWSERBASE_CONFIG_DIR`), next to the CLI's existing state via the shared `resolveConfigDir()` helper. The file is written `0600`. It stores only the same IDs the API already returns, and a missing or corrupt file degrades to "no saved contexts" rather than erroring.

Linear: [STG-2422](https://linear.app/browserbase/issue/STG-2422/named-contexts-for-the-cli-local-nameid-map)

## E2E Test Matrix

Run against **live Browserbase** with the local build (`node bin/run.js`), `BROWSERBASE_CONFIG_DIR` pointed at a throwaway dir. Signed URLs redacted.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `contexts create --name e2e-smoke` | `{"id":"45ed525f-…","uploadUrl":"<redacted>",…,"name":"e2e-smoke"}` | Proves a real context is created and the name is echoed back. |
| `cat contexts.json` | `{"version":1,"contexts":{"e2e-smoke":{"id":"45ed525f-…","createdAt":"2026-06-27T00:40:51Z"}}}` | Proves the local name→ID map is persisted with the API-returned ID. |
| `contexts get e2e-smoke` (by name) | `{"id":"45ed525f-…","projectId":"2d228d57-…",…}` | Proves name→ID resolution on `get` hits the real `/v1/contexts/<id>`. |
| `contexts list --format table` | `Name ID Created`<br>`prod-login d98a30da 2026-06-27 00:41Z` | Proves the new list command renders the saved map. |
| `sessions create --context-id prod-login --persist` (by name) | session `27f9087a-…` returned with `contextId == d98a30da-…` → **`MATCHES name→id: True`** | Key flow: a real session attaches to the right context purely from the **name**. |
| `contexts delete prod-login` (by name) | `{"ok":true,"id":"d98a30da-…","removedAliases":["prod-login"]}` then `contexts list` → `No saved contexts.` | Proves API delete + local alias prune. |
| `contexts delete <raw-uuid>` | `{"ok":true,"id":"45ed525f-…"}` (no `removedAliases`) | Proves raw IDs still work and orphan cleanup; no alias pruned when none matched. |
| `pnpm test:cli` | `Test Files 21 passed (21) · Tests 310 passed (310)` | Full suite green, incl. new `contexts-store` unit tests + `contexts-named` CLI-level e2e (fake server) + updated surface test. |
| `pnpm lint` | format + eslint + `tsc --noEmit` all clean | Types, style, lint. |

## Notes

- Bump: `browse: patch` (matches how the CLI bumps; `browse` is in the changeset `ignore` list but still gets release-impacting patches by convention).
- No new dependencies. Pure client-side; composes with the existing config-dir convention.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Add named contexts to the CLI so you can reuse a Browserbase context by name instead of copying IDs. Implements Linear STG-2422 with a local name→ID map, typo hints, raw-ID compatibility, and a new `contexts add` command; no API changes.

- **New Features**
  - `browse cloud contexts create --name <name>` saves a local alias and returns the name; `browse cloud contexts add <name> <id>` names an existing context (trims the ID, rejects empty; use `--force` to overwrite).
  - `browse cloud contexts list` shows saved names on this device (`--json` returns `{ "contexts": [...] }`).
  - `browse cloud contexts get|update|delete` and `browse cloud sessions create --context-id` accept a saved name or a raw ID. Unknown names close to a saved one fail early with a “did you mean?” hint; unknown refs otherwise pass through so non-UUID IDs still work. `contexts delete` prunes aliases for the deleted ID and includes `removedAliases` (best effort).
- **Notes**
  - Aliases live at `(XDG_CONFIG_HOME||~/.config)/browserbase/contexts.json` (honors `BROWSERBASE_CONFIG_DIR`), written 0600 via atomic write; missing/corrupt files behave as empty. The map is prototype-safe, rejects UUID-shaped names, and sanitizes malformed entries on read; names must start alphanumeric, allow letters/digits/._-, max 64, and duplicates are blocked unless `--force`.

<sup>Written for commit 00fdd03. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2284?utm_source=github">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
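The resolver behaviour described in the named-contexts change above (saved name wins, near-miss names fail with a hint, everything else passes through as a raw ID) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `parseStore`, `isValidName`, `resolveContextRef`, the edit-distance threshold of 2, and the exact regexes are all assumptions made for the example. Only the file shape (`{"version":1,"contexts":{...}}`) and the stated name rules come from the description.

```typescript
// Hypothetical sketch of a client-side name→ID map; all names here are assumptions.
interface ContextEntry {
  id: string;
  createdAt: string;
}
type ContextMap = Map<string, ContextEntry>; // a Map avoids prototype-key surprises

// Stated rules: start alphanumeric, then letters/digits/._-, max 64 characters.
const NAME_RE = /^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$/;
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidName(name: string): boolean {
  // UUID-shaped names are rejected so a name can never shadow a raw ID.
  return NAME_RE.test(name) && !UUID_RE.test(name);
}

// Missing or corrupt file content degrades to an empty map instead of throwing.
function parseStore(text: string | undefined): ContextMap {
  const out: ContextMap = new Map();
  if (!text) return out;
  try {
    const contexts = JSON.parse(text)?.contexts;
    if (typeof contexts !== "object" || contexts === null || Array.isArray(contexts)) {
      return out;
    }
    for (const [name, raw] of Object.entries(contexts)) {
      const entry = raw as Partial<ContextEntry> | null;
      // Malformed entries are dropped on read rather than surfaced as errors.
      if (isValidName(name) && entry && typeof entry.id === "string" && entry.id.trim() !== "") {
        const createdAt = typeof entry.createdAt === "string" ? entry.createdAt : "";
        out.set(name, { id: entry.id.trim(), createdAt });
      }
    }
  } catch {
    // Invalid JSON: treat as "no saved contexts".
  }
  return out;
}

// Single-row Levenshtein distance, used only for the "did you mean?" hint.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + (a[i - 1] === b[j - 1] ? 0 : 1));
      diag = above;
    }
  }
  return row[b.length];
}

// Saved name → its ID; near-miss of a saved name → error with hint; otherwise pass through.
function resolveContextRef(ref: string, store: ContextMap): string {
  const saved = store.get(ref);
  if (saved) return saved.id;
  if (!UUID_RE.test(ref)) {
    for (const name of store.keys()) {
      if (editDistance(ref, name) <= 2) {
        throw new Error(`Unknown context "${ref}". Did you mean "${name}"?`);
      }
    }
  }
  return ref; // unknown refs pass through unchanged, so raw IDs keep working
}
```

Keeping the parser pure (text in, map out) makes the "corrupt file behaves as empty" rule easy to unit test without touching the filesystem; the `0600` atomic write mentioned above would sit in a separate save step that this sketch leaves out.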
Prepare the next browse release by versioning the package on `main`.

What this PR does:

- bumps `packages/cli/package.json` to `0.9.2`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish `browse@0.9.2` from that exact commit using `pnpm pack` + `npm publish --provenance`.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
# why

- Node 22.17.0 introduced a compatibility issue with `node-fetch`, which broke the Browserbase SDK when running on 22.17.0
- addresses #2291

# what changed

- bumped the Browserbase SDK version to ^2.14.0, which fixes the issue

# test plan

- tested locally on Node 22.17.0: the issue reproduces before the bump and is resolved after bumping the Browserbase SDK version

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Upgrade `@browserbasehq/sdk` to ^2.14.0 to restore compatibility with Node 22.17.0 by resolving the `node-fetch` issue. Fixes #2291.

- **Bug Fixes**
  - Bumped `@browserbasehq/sdk` to ^2.14.0 in `packages/cli`, `packages/core`, and `packages/server-v3`; updated `pnpm-lock.yaml`.
  - Prevents runtime failures when running on Node 22.17.0.

<sup>Written for commit 78ffe34. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2307?utm_source=github">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…ge (#2310)

## What

Changeset-only patch bump (`browse` 0.9.2 → 0.9.3) to trigger the **first browse release that publishes the Docker image** added in #2295.

`browse@0.9.2` shipped *before* the GHCR publish step existed, so no image is on the registry yet. The Docker step only runs when browse's version changes (`should_publish`), so a release is required to produce it.

This PR contains **only** a changeset, with no code changes.

## Release path (after this merges)

1. Merge this PR → the `browse` changeset lands on `main`.
2. Run the **Prepare CLI Release** workflow → it opens a `Release browse@0.9.3` PR.
3. Merge that PR → the **Release** workflow publishes `browse@0.9.3` to npm **and** builds/pushes `ghcr.io/browserbase/browse` (multi-arch, pinned).

## One-time follow-up

GHCR packages default to private. After the first push, set `ghcr.io/browserbase/browse` visibility to **Public** so sandboxes can pull it anonymously.

---

Linear: [STG-2468](https://linear.app/browserbase/issue/STG-2468/release-browse-093-to-publish-the-docker-image-ghcriobrowserbasebrowse)

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Patch bump for `browse` 0.9.3 to trigger the first Docker image publish to `ghcr.io/browserbase/browse` (multi-arch, pinned per release). No code changes; adds a changeset so the release builds and pushes the image. Addresses Linear STG-2468.

- **Migration**
  - After the first publish, set `ghcr.io/browserbase/browse` visibility to Public so sandboxes can pull without auth.

<sup>Written for commit 09b8f9d. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2310?utm_source=github">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare the next browse release by versioning the package on `main`.

What this PR does:

- bumps `packages/cli/package.json` to `0.9.3`
- updates the browse changelog
- consumes the pending browse changesets

After this PR merges, the `Release` workflow on `main` will publish `browse@0.9.3` from that exact commit using `pnpm pack` + `npm publish --provenance`.

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Prepare the `browse@0.9.3` release by bumping the package version, updating the changelog, and consuming the pending changeset. On merge, the Release workflow on `main` will publish via `pnpm pack` + `npm publish --provenance`; this release also notes the new Docker image at `ghcr.io/browserbase/browse`.

<sup>Written for commit 32f3a51. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2311?utm_source=github">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>