Bumblebee is a read-only inventory collector for package, extension, and developer-tool metadata on macOS and Linux developer endpoints.
It answers a narrow supply-chain response question: when an advisory names a package, extension, or version, which developer machines show a match in their on-disk metadata right now?
SBOMs help answer what shipped, and EDR helps answer what ran or touched the network, but supply-chain response often needs a different view: messy local state across lockfiles, package-manager metadata, extension manifests, and supported developer-tool configs.
Bumblebee turns that scattered on-disk state into structured NDJSON component records. When given an exposure catalog, it also flags exact matches, giving responders who already know what they are looking for a fast, read-only exposure check.
- Single static binary, Go 1.25+, zero non-stdlib dependencies.
- Three scan profiles (baseline, project, deep) for different populations and cadences.
- Reads only the lockfiles, package-manager install metadata, extension manifests, and supported MCP JSON configs listed in docs/inventory-sources.md. No package manager execution (npm ls, pip show, go list, ...) and no source-file reads.

MCP host configs can carry environment values and credentials in their env blocks; Bumblebee parses these configs for the server inventory it needs but does not emit those values in its records.
| Family | Emitted ecosystem | Sources |
|---|---|---|
| npm | npm | package-lock.json, npm-shrinkwrap.json, node_modules/.package-lock.json, node_modules/<pkg>/package.json |
| pnpm | npm | pnpm-lock.yaml, .pnpm/.../package.json |
| Yarn | npm | yarn.lock (Classic + Berry) |
| Bun | npm | bun.lock; bun.lockb presence as diagnostic |
| PyPI | pypi | *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO |
| Go modules | go | go.sum, go.mod |
| RubyGems | rubygems | Gemfile.lock, installed *.gemspec |
| Composer | packagist | composer.lock, vendor/composer/installed.json |
| MCP | mcp | JSON host configs: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, plus ~/.gemini/settings.json (Gemini CLI / Code Assist). Non-JSON configs (Codex config.toml, Continue YAML) are not parsed in v0.1. |
| Editor extensions | editor-extension | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | browser-extension | Chromium-family (manifest.json) and Firefox (extensions.json) per profile |
Per-ecosystem detail: docs/inventory-sources.md.
Requires Go 1.25+. Zero non-stdlib dependencies.
# Install the latest tagged release into $GOBIN.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
# Or pin a specific tag.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

To build from a checkout:

go build -o bumblebee ./cmd/bumblebee
go test ./...

Stamp an explicit version at build time:

go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee

bumblebee version prints the version plus the VCS revision, build
time, and Go runtime, so a record emitted in production can be traced
back to a specific build. Version precedence: -ldflags override,
module version recorded by go install, then the in-tree default
tracked in VERSION.
After installing, run a built-in end-to-end check against embedded fixtures:
bumblebee selftest
# selftest OK (2 findings in 1ms)

The fixtures live inside the binary, use deliberately fake package
names (bumblebee-selftest-evil@0.0.0), and make no network calls. A
non-zero exit means the local install can no longer detect what it
should, which makes selftest a fast pre-deployment smoke test for
fleet rollouts.
Bumblebee is a one-shot scanner: each invocation performs a single scan
and exits. Cadence is the runner's responsibility (cron, launchd, systemd,
MDM, etc.). Each record carries profile and a per-root root_kind so
receivers can keep populations separate.
| Profile | Scans | Use for |
|---|---|---|
| baseline | Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. | Recurring lightweight inventory via an external runner. |
| project | Configured development directories, such as ~/code, ~/src, or ~/work. | Recurring inventory for known project workspaces. |
| deep | Explicit --root paths, including broad roots like $HOME. | On-demand incident or campaign checks, usually with --ecosystem, --exposure-catalog, and --findings-only. |
baseline and project refuse bare-home roots; only deep walks them.
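
Because every record carries profile and root_kind, a receiver can partition a run by population before storing it. The following is a minimal receiver-side sketch, not part of Bumblebee; the function name count_by_population is hypothetical, and it assumes only the profile and root_kind fields shown in the example records.

```python
import json
from collections import Counter


def count_by_population(ndjson_lines):
    """Count records per (profile, root_kind) pair from an iterable of NDJSON lines."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Records that lack either field are grouped under None for that slot.
        counts[(record.get("profile"), record.get("root_kind"))] += 1
    return counts
```

A receiver could call this on the lines of inventory.ndjson and route each population to its own table or index.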
# Baseline global inventory.
bumblebee scan --profile baseline > inventory.ndjson
# Daily project sweep with explicit roots.
bumblebee scan --profile project \
--root "$HOME/code" \
--root "$HOME/Developer"
# Limit a run to selected emitted ecosystems.
bumblebee scan --profile baseline \
--ecosystem npm,pypi \
--ecosystem go
# On-demand exposure scan against a published advisory.
bumblebee scan --profile deep \
--root "$HOME" \
--exposure-catalog ./catalog.json \
--max-duration 10m

Preview the resolved roots without scanning:
bumblebee roots --profile baseline
# prints "<root_kind>\t<path>" lines

--root is a filesystem path to scan; repeatable, required for deep,
optional for the other profiles. --ecosystem is repeatable and
comma-separated. --exposure-catalog accepts a JSON file or a directory
of *.json catalogs (merged non-recursively; all files must share the
same schema_version). --findings-only requires --exposure-catalog and
suppresses package records while keeping findings. bumblebee scan --help
lists every flag.
Records are NDJSON, one per line. Diagnostics go to stderr as NDJSON. Each
run ends with a scan_summary record; receivers use it to decide whether
to promote a run to current state. See docs/transport.md
for HTTPS/file output and docs/state-model.md for the
receiver-side current-state model.
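
The promotion rule above can be sketched on the receiver side as follows. This is an illustration under stated assumptions, not the documented receiver: it assumes the summary record carries "record_type": "scan_summary" (inferred from the record name) and treats a run as complete only when that record is the last line. The function name read_run is hypothetical; docs/state-model.md remains the authoritative description.

```python
import json


def read_run(ndjson_text):
    """Return (records, summary); summary is None when the run looks incomplete."""
    records = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A truncated line suggests the run was cut short; do not promote it.
            return records, None
    last = records[-1] if records else None
    # Assumption: the final record of a complete run has record_type "scan_summary".
    if isinstance(last, dict) and last.get("record_type") == "scan_summary":
        return records[:-1], last
    return records, None
```

A receiver would replace its current state for the endpoint only when summary is not None, and keep the previous state otherwise.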
Package record:
{
"record_type": "package",
"record_id": "package:...",
"schema_version": "0.1.0",
"scanner_name": "bumblebee",
"scanner_version": "v0.1.1",
"run_id": "9b1f0c2e4d5a6b7c8d9e0f1a2b3c4d5e",
"scan_time": "2026-05-15T18:22:01.482Z",
"endpoint": {
"hostname": "alex-mbp",
"os": "darwin",
"arch": "arm64",
"username": "alex",
"uid": "501",
"device_id": "MDM-7F4A2B"
},
"profile": "project",
"ecosystem": "npm",
"package_name": "@tanstack/query-core",
"normalized_name": "@tanstack/query-core",
"version": "5.59.20",
"project_path": "/Users/alex/code/web-app",
"root_kind": "project_root",
"package_manager": "pnpm",
"source_type": "pnpm-lockfile",
"source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
"has_lifecycle_scripts": false,
"confidence": "high"
}

confidence:

- high: exact identity and version came from canonical metadata.
- medium: identity is reliable, but version or source is partial.
- low: config/path/spec reference only; not proof of an installed exact version.
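
The three confidence levels form an ordering that a receiver can filter on. The sketch below is a hypothetical helper (at_least and CONFIDENCE_RANK are not Bumblebee names) that keeps records at or above a chosen level and drops records whose confidence is missing or unrecognized.

```python
# Ordering taken from the definitions above: low < medium < high.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def at_least(records, minimum):
    """Keep records whose confidence is at or above `minimum`."""
    floor = CONFIDENCE_RANK[minimum]  # raises KeyError on an unknown level name
    return [
        record
        for record in records
        if CONFIDENCE_RANK.get(record.get("confidence"), -1) >= floor
    ]
```

For example, at_least(records, "medium") would exclude low-confidence config or path references when building a list of machines to remediate.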
Finding record (exposure-catalog match):
{
"record_type": "finding",
"record_id": "finding:...",
"schema_version": "0.1.0",
"scanner_name": "bumblebee",
"scanner_version": "v0.1.1",
"run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
"scan_time": "2026-05-15T18:22:01.482Z",
"endpoint": {
"hostname": "alex-mbp",
"os": "darwin",
"arch": "arm64",
"username": "alex",
"uid": "501",
"device_id": "MDM-7F4A2B"
},
"profile": "deep",
"finding_type": "package_exposure",
"severity": "critical",
"catalog_id": "advisory-2026-0042",
"catalog_name": "example-pkg 1.2.3 (compromised release)",
"ecosystem": "npm",
"package_name": "example-pkg",
"normalized_name": "example-pkg",
"version": "1.2.3",
"root_kind": "deep_home_root",
"project_path": "/Users/alex/code/web-app",
"source_type": "pnpm-lockfile",
"source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
"confidence": "high",
"evidence": "exact name+version match (version=1.2.3)"
}

record_id is a content-addressed hash of a canonical identity tuple per
record type, stable across runs. Per-record-type field lists and dedupe
guidance: docs/state-model.md.
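
Since record_id is stable across runs, a receiver can use it as a dedupe key. The following is a minimal sketch that keeps the most recent record per record_id by scan_time. It is an assumption-laden illustration: the helper names are hypothetical, and whether the identity tuple already includes the endpoint is not stated here, so a multi-endpoint receiver may need to key on the endpoint as well. docs/state-model.md is the authoritative guidance.

```python
from datetime import datetime


def parse_scan_time(value):
    """Parse the UTC timestamp format shown in the example records."""
    # Rewrite the trailing "Z" so older Python versions can parse it too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_by_record_id(records):
    """Return a dict mapping record_id to the record with the newest scan_time."""
    latest = {}
    for record in records:
        key = record["record_id"]
        seen = latest.get(key)
        if seen is None or parse_scan_time(record["scan_time"]) >= parse_scan_time(seen["scan_time"]):
            latest[key] = record
    return latest
```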
Minimal JSON, exact (ecosystem, name, version) matching only:
{
"schema_version": "0.1.0",
"entries": [
{
"id": "advisory-2026-0042",
"name": "example-pkg 1.2.3 (compromised release)",
"ecosystem": "npm",
"package": "example-pkg",
"versions": ["1.2.3"],
"severity": "critical"
}
]
}

The catalog must be a JSON object with schema_version and entries
keys. Bare top-level arrays are rejected. Unsupported future
schema_version values are rejected. Multiple catalog files can be
loaded together by pointing --exposure-catalog at a directory; see
the flag description above.
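
The validation rules and exact-match semantics above can be illustrated with a short sketch. This is not Bumblebee's implementation (which is written in Go); the function names are hypothetical, the supported-version set is assumed to contain only the 0.1.0 value shown in the example, and matching against the record's normalized_name rather than package_name is an assumption.

```python
import json

# Assumption: only the schema version shown in the example catalog is supported.
SUPPORTED_SCHEMA_VERSIONS = {"0.1.0"}


def load_catalog(text):
    """Parse and validate a catalog document, raising ValueError on rejection."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("catalog must be a JSON object, not a bare array or scalar")
    if "schema_version" not in doc or "entries" not in doc:
        raise ValueError("catalog needs schema_version and entries keys")
    if doc["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError("unsupported schema_version: %r" % (doc["schema_version"],))
    return doc


def build_index(catalog):
    """Map each exact (ecosystem, name, version) tuple to its catalog entry."""
    index = {}
    for entry in catalog["entries"]:
        for version in entry["versions"]:
            index[(entry["ecosystem"], entry["package"], version)] = entry
    return index


def match(record, index):
    """Return the matching catalog entry for a package record, or None."""
    # Assumption: the record's normalized_name is the name compared to the catalog.
    key = (record.get("ecosystem"), record.get("normalized_name"), record.get("version"))
    return index.get(key)
```

Exact-tuple lookup means a record for example-pkg 1.2.4 does not match an entry that lists only 1.2.3; there is no range or prefix matching in this sketch, mirroring the exact-match rule stated above.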
The threat_intel/ directory holds maintained exposure
catalogs built from public threat-intelligence reporting on recent
supply-chain campaigns, assembled with
Perplexity Computer and updated
via PRs as new campaigns are reported. See
threat_intel/README.md for the current
catalog list and review guidance.
Apache License 2.0. See LICENSE.