bumblebee

Bumblebee is a read-only inventory collector for package, extension, and developer-tool metadata on macOS and Linux developer endpoints.

It answers a narrow supply-chain response question: when an advisory names a package, extension, or version, which developer machines show a match in their on-disk metadata right now?

SBOMs help answer what shipped, and EDR helps answer what ran or touched the network, but supply-chain response often needs a different view: messy local state across lockfiles, package-manager metadata, extension manifests, and supported developer-tool configs.

Bumblebee turns that scattered on-disk state into structured NDJSON component records. When given an exposure catalog, it also flags exact matches, giving responders a fast, read-only exposure check once they know what they are looking for.

Scope

  • Single static binary, Go 1.25+, zero non-stdlib dependencies.
  • Three scan profiles (baseline, project, deep) for different populations and cadences.
  • Reads only the lockfiles, package-manager install metadata, extension manifests, and supported MCP JSON configs listed in docs/inventory-sources.md. No package manager execution (npm ls, pip show, go list, ...) and no source-file reads. MCP host configs can carry environment values and credentials in their env blocks; Bumblebee parses these configs for the server inventory it needs but does not emit those values in its records.
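
One way to guarantee that env values never reach an output record is to never decode them in the first place. The sketch below assumes the common "mcpServers" layout used by several MCP hosts; the type and function names are hypothetical and this is not Bumblebee's actual parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// mcpServer keeps only what an inventory needs. There is deliberately no
// field for "env", so encoding/json discards that block while decoding.
type mcpServer struct {
	Command string `json:"command"`
}

// mcpConfig models the "mcpServers" layout (an assumption; hosts vary).
type mcpConfig struct {
	Servers map[string]mcpServer `json:"mcpServers"`
}

// serverInventory returns sorted "name: command" lines for one config file.
func serverInventory(raw []byte) ([]string, error) {
	var cfg mcpConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(cfg.Servers))
	for name, s := range cfg.Servers {
		out = append(out, name+": "+s.Command)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	raw := []byte(`{"mcpServers":{"files":{"command":"npx","env":{"API_TOKEN":"secret"}}}}`)
	inv, err := serverInventory(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(inv) // the API_TOKEN value was never stored anywhere
}
```

Dropping the field at the type level means no later code path can emit the value by accident.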

Coverage

| Family | Emitted ecosystem | Sources |
|---|---|---|
| npm | npm | package-lock.json, npm-shrinkwrap.json, node_modules/.package-lock.json, node_modules/<pkg>/package.json |
| pnpm | npm | pnpm-lock.yaml, .pnpm/.../package.json |
| Yarn | npm | yarn.lock (Classic + Berry) |
| Bun | npm | bun.lock; bun.lockb presence as diagnostic |
| PyPI | pypi | *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO |
| Go modules | go | go.sum, go.mod |
| RubyGems | rubygems | Gemfile.lock, installed *.gemspec |
| Composer | packagist | composer.lock, vendor/composer/installed.json |
| MCP | mcp | JSON host configs: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, plus ~/.gemini/settings.json (Gemini CLI / Code Assist). Non-JSON configs (Codex config.toml, Continue YAML) are not parsed in v0.1. |
| Editor extensions | editor-extension | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | browser-extension | Chromium-family (manifest.json) and Firefox (extensions.json) per profile |

Per-ecosystem detail: docs/inventory-sources.md.

Install

Requires Go 1.25+. Zero non-stdlib dependencies.

# Install the latest tagged release into $GOBIN.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

# Or pin a specific tag.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

To build from a checkout:

go build -o bumblebee ./cmd/bumblebee
go test ./...

Stamp an explicit version at build time:

go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee

bumblebee version prints the version plus the VCS revision, build time, and Go runtime, so a record emitted in production can be traced back to a specific build. Version precedence: the -ldflags override first, then the module version recorded by go install, then the in-tree default tracked in VERSION.
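
The precedence order can be sketched with the standard runtime/debug package, which exposes the module version and VCS settings the Go toolchain embeds at build time. This is a minimal illustration, not Bumblebee's source: resolveVersion and fallbackVersion are hypothetical names, and treating "(devel)" as unset is an assumption.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Version is meant to be set with: go build -ldflags "-X main.Version=v0.1.1"
var Version string

// fallbackVersion stands in for the in-tree default (hypothetical value).
const fallbackVersion = "v0.0.0-dev"

// resolveVersion applies the precedence: ldflags, module version, fallback.
func resolveVersion(ldflags, module, fallback string) string {
	if ldflags != "" {
		return ldflags
	}
	// "(devel)" is what the toolchain can report for a local, untagged build.
	if module != "" && module != "(devel)" {
		return module
	}
	return fallback
}

func main() {
	module, revision := "", ""
	if info, ok := debug.ReadBuildInfo(); ok {
		module = info.Main.Version
		for _, s := range info.Settings {
			if s.Key == "vcs.revision" {
				revision = s.Value
			}
		}
	}
	fmt.Println("version:", resolveVersion(Version, module, fallbackVersion))
	fmt.Println("revision:", revision)
}
```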

Self-test

After installing, run a built-in end-to-end check against embedded fixtures:

bumblebee selftest
# selftest OK (2 findings in 1ms)

The fixtures live inside the binary, use deliberately fake package names (bumblebee-selftest-evil@0.0.0), and make no network calls. A non-zero exit means the local install can no longer detect what it should, which makes selftest a fast pre-deployment smoke test for fleet rollouts.

Profiles

Bumblebee is a one-shot scanner: each invocation performs a single scan and exits. Cadence is the runner's responsibility (cron, launchd, systemd, MDM, etc.). Each record carries profile and a per-root root_kind so receivers can keep populations separate.

| Profile | Scans | Use for |
|---|---|---|
| baseline | Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. | Recurring lightweight inventory via an external runner. |
| project | Configured development directories, such as ~/code, ~/src, or ~/work. | Recurring inventory for known project workspaces. |
| deep | Explicit --root paths, including broad roots like $HOME. | On-demand incident or campaign checks, usually with --ecosystem, --exposure-catalog, and --findings-only. |

baseline and project refuse bare-home roots; only deep walks them.
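
The bare-home rule amounts to a path comparison gated on the profile. The following sketch assumes a plain cleaned-path comparison; allowRoot is a hypothetical helper, and the real check may also resolve symlinks or handle other edge cases.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// allowRoot reports whether a profile may scan root, given the user's home.
// Only "deep" may use the home directory itself as a root.
func allowRoot(profile, root, home string) bool {
	if profile == "deep" {
		return true
	}
	// Clean removes trailing slashes and "." segments before comparing.
	return filepath.Clean(root) != filepath.Clean(home)
}

func main() {
	home := "/home/alex"
	fmt.Println(allowRoot("project", "/home/alex/", home))     // false
	fmt.Println(allowRoot("project", "/home/alex/code", home)) // true
	fmt.Println(allowRoot("deep", "/home/alex", home))         // true
}
```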

Quick start

# Baseline global inventory.
bumblebee scan --profile baseline > inventory.ndjson

# Daily project sweep with explicit roots.
bumblebee scan --profile project \
  --root "$HOME/code" \
  --root "$HOME/Developer"

# Limit a run to selected emitted ecosystems.
bumblebee scan --profile baseline \
  --ecosystem npm,pypi \
  --ecosystem go

# On-demand exposure scan against a published advisory.
bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./catalog.json \
  --max-duration 10m

Preview the resolved roots without scanning:

bumblebee roots --profile baseline
# prints "<root_kind>\t<path>" lines
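
A runner that wants to log or validate the resolved roots before scanning could parse that tab-separated output as follows. This is a minimal sketch; the root type and parseRoots are hypothetical names, and the sample input is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// root is one "<root_kind>\t<path>" line from `bumblebee roots`.
type root struct {
	Kind string
	Path string
}

// parseRoots splits each line at the first tab and skips malformed lines.
func parseRoots(out string) []root {
	var roots []root
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		kind, path, ok := strings.Cut(sc.Text(), "\t")
		if !ok {
			continue
		}
		roots = append(roots, root{Kind: kind, Path: path})
	}
	return roots
}

func main() {
	// Sample input only; real output depends on the endpoint and profile.
	sample := "project_root\t/Users/alex/code\nproject_root\t/Users/alex/work\n"
	for _, r := range parseRoots(sample) {
		fmt.Printf("%s -> %s\n", r.Kind, r.Path)
	}
}
```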

  • --root is a filesystem path to scan. It is repeatable, required for deep, and optional for the other profiles.
  • --ecosystem is repeatable and accepts comma-separated values.
  • --exposure-catalog accepts a JSON file or a directory of *.json catalogs. Directory catalogs are merged non-recursively, and all files must share the same schema_version.
  • --findings-only requires --exposure-catalog and suppresses package records while keeping findings.

bumblebee scan --help lists every flag.
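
A flag that is both repeatable and comma-separated, as --ecosystem is, maps naturally onto Go's flag.Value interface. The sketch below shows the general technique with the standard flag package; listFlag and parseEcosystems are hypothetical names, and Bumblebee's own flag handling may differ.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// listFlag collects values from a flag that may be repeated and may also
// carry several comma-separated values in one occurrence.
type listFlag []string

func (l *listFlag) String() string { return strings.Join(*l, ",") }

// Set is called once per occurrence of the flag on the command line.
func (l *listFlag) Set(v string) error {
	for _, part := range strings.Split(v, ",") {
		if part = strings.TrimSpace(part); part != "" {
			*l = append(*l, part)
		}
	}
	return nil
}

// parseEcosystems parses args and returns every ecosystem value seen.
func parseEcosystems(args []string) ([]string, error) {
	var eco listFlag
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.Var(&eco, "ecosystem", "ecosystem filter (repeatable, comma-separated)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return eco, nil
}

func main() {
	eco, err := parseEcosystems([]string{"--ecosystem", "npm,pypi", "--ecosystem", "go"})
	if err != nil {
		panic(err)
	}
	fmt.Println(eco) // [npm pypi go]
}
```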

Output

Records are NDJSON, one per line. Diagnostics go to stderr as NDJSON. Each run ends with a scan_summary record; receivers use it to decide whether to promote a run to current state. See docs/transport.md for HTTPS/file output and docs/state-model.md for the receiver-side current-state model.

Package record:

{
  "record_type": "package",
  "record_id": "package:...",
  "schema_version": "0.1.0",
  "scanner_name": "bumblebee",
  "scanner_version": "v0.1.1",
  "run_id": "9b1f0c2e4d5a6b7c8d9e0f1a2b3c4d5e",
  "scan_time": "2026-05-15T18:22:01.482Z",
  "endpoint": {
    "hostname": "alex-mbp",
    "os": "darwin",
    "arch": "arm64",
    "username": "alex",
    "uid": "501",
    "device_id": "MDM-7F4A2B"
  },
  "profile": "project",
  "ecosystem": "npm",
  "package_name": "@tanstack/query-core",
  "normalized_name": "@tanstack/query-core",
  "version": "5.59.20",
  "project_path": "/Users/alex/code/web-app",
  "root_kind": "project_root",
  "package_manager": "pnpm",
  "source_type": "pnpm-lockfile",
  "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
  "has_lifecycle_scripts": false,
  "confidence": "high"
}

confidence:

  • high — exact identity and version came from canonical metadata.
  • medium — identity is reliable, but version or source is partial.
  • low — config/path/spec reference only; not proof of an installed exact version.
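
Receivers that want to act only on stronger evidence can order the three labels and filter on a minimum. A minimal sketch, assuming the ordering low < medium < high implied by the definitions above; confidenceRank and atLeast are hypothetical names.

```go
package main

import "fmt"

// confidenceRank orders the three documented confidence labels.
var confidenceRank = map[string]int{"low": 1, "medium": 2, "high": 3}

// atLeast reports whether got meets the floor; unknown labels never pass.
func atLeast(got, floor string) bool {
	g, okGot := confidenceRank[got]
	f, okFloor := confidenceRank[floor]
	return okGot && okFloor && g >= f
}

func main() {
	fmt.Println(atLeast("high", "medium")) // true
	fmt.Println(atLeast("low", "medium"))  // false
}
```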

Finding record (exposure-catalog match):

{
  "record_type": "finding",
  "record_id": "finding:...",
  "schema_version": "0.1.0",
  "scanner_name": "bumblebee",
  "scanner_version": "v0.1.1",
  "run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
  "scan_time": "2026-05-15T18:22:01.482Z",
  "endpoint": {
    "hostname": "alex-mbp",
    "os": "darwin",
    "arch": "arm64",
    "username": "alex",
    "uid": "501",
    "device_id": "MDM-7F4A2B"
  },
  "profile": "deep",
  "finding_type": "package_exposure",
  "severity": "critical",
  "catalog_id": "advisory-2026-0042",
  "catalog_name": "example-pkg 1.2.3 (compromised release)",
  "ecosystem": "npm",
  "package_name": "example-pkg",
  "normalized_name": "example-pkg",
  "version": "1.2.3",
  "root_kind": "deep_home_root",
  "project_path": "/Users/alex/code/web-app",
  "source_type": "pnpm-lockfile",
  "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
  "confidence": "high",
  "evidence": "exact name+version match (version=1.2.3)"
}

record_id is a content-addressed hash of a canonical identity tuple per record type, stable across runs. Per-record-type field lists and dedupe guidance: docs/state-model.md.
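
To illustrate what a content-addressed ID looks like in practice, the sketch below hashes a record type together with an identity tuple. The hash function (SHA-256), the truncation, the separator, and the choice of tuple fields are all assumptions made for the example; the real canonical tuples are defined in docs/state-model.md.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// recordID hashes a record type plus an identity tuple. Fields are joined
// with a NUL separator so ("ab", "c") and ("a", "bc") hash differently.
func recordID(recordType string, identity ...string) string {
	sum := sha256.Sum256([]byte(recordType + "\x00" + strings.Join(identity, "\x00")))
	// Truncated to 16 bytes for readability (an arbitrary choice here).
	return recordType + ":" + hex.EncodeToString(sum[:16])
}

func main() {
	// Hypothetical tuple: ecosystem, normalized name, version, source file.
	a := recordID("package", "npm", "example-pkg", "1.2.3", "/Users/alex/code/web-app/pnpm-lock.yaml")
	b := recordID("package", "npm", "example-pkg", "1.2.3", "/Users/alex/code/web-app/pnpm-lock.yaml")
	fmt.Println(a == b) // true: same identity, same ID on every run
}
```

Because the ID depends only on identity fields and not on run_id or scan_time, a receiver can upsert by record_id instead of appending duplicates.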

Exposure Catalog Format

Minimal JSON, exact (ecosystem, name, version) matching only:

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "advisory-2026-0042",
      "name": "example-pkg 1.2.3 (compromised release)",
      "ecosystem": "npm",
      "package": "example-pkg",
      "versions": ["1.2.3"],
      "severity": "critical"
    }
  ]
}

The catalog must be a JSON object with schema_version and entries keys. Bare top-level arrays are rejected. Unsupported future schema_version values are rejected. Multiple catalog files can be loaded together by pointing --exposure-catalog at a directory; see the flag description above.
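
The validation and matching rules above can be sketched in a few lines. This is an illustration under stated assumptions, not Bumblebee's loader: the type and function names are hypothetical, "0.1.0" is assumed to be the only accepted schema_version, and names are compared verbatim rather than normalized.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type entry struct {
	ID        string   `json:"id"`
	Name      string   `json:"name"`
	Ecosystem string   `json:"ecosystem"`
	Package   string   `json:"package"`
	Versions  []string `json:"versions"`
	Severity  string   `json:"severity"`
}

type catalog struct {
	SchemaVersion string  `json:"schema_version"`
	Entries       []entry `json:"entries"`
}

// loadCatalog rejects bare arrays, unknown schema versions, and catalogs
// without an entries key.
func loadCatalog(raw []byte) (*catalog, error) {
	var c catalog
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err // a bare top-level array fails to decode here
	}
	if c.SchemaVersion != "0.1.0" {
		return nil, errors.New("unsupported schema_version: " + c.SchemaVersion)
	}
	if c.Entries == nil {
		return nil, errors.New("missing entries")
	}
	return &c, nil
}

// match returns the first entry with an exact (ecosystem, name, version) hit.
func (c *catalog) match(ecosystem, name, version string) *entry {
	for i := range c.Entries {
		e := &c.Entries[i]
		if e.Ecosystem != ecosystem || e.Package != name {
			continue
		}
		for _, v := range e.Versions {
			if v == version {
				return e
			}
		}
	}
	return nil
}

func main() {
	raw := []byte(`{"schema_version":"0.1.0","entries":[{"id":"advisory-2026-0042","ecosystem":"npm","package":"example-pkg","versions":["1.2.3"],"severity":"critical"}]}`)
	c, err := loadCatalog(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.match("npm", "example-pkg", "1.2.3") != nil) // true
	fmt.Println(c.match("npm", "example-pkg", "1.2.4") != nil) // false
}
```

Exact matching means a catalog must list every affected version explicitly; there are no ranges in this format.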

Sample exposure catalogs

The threat_intel/ directory holds maintained exposure catalogs built from public threat-intelligence reporting on recent supply-chain campaigns, assembled with Perplexity Computer and updated via PRs as new campaigns are reported. See threat_intel/README.md for the current catalog list and review guidance.

License

Apache License 2.0. See LICENSE.
