Add polygraph skill: behavioral trust grades for MCP servers #477
RubenSousaDinis wants to merge 8 commits into
Polygraph grades MCP servers A–F by connecting like an agent, fingerprinting the exact tool surface, and running three behavioral probes (C-01 tool-output injection, C-02 permission/egress overreach, C-03 sensitive-data leak), then publishing a reproducible grade as an onchain EAS attestation on Base. The skill covers: checking a grade (`npx polygraphso check <server>`), running the open litmus harness locally to grade your own server, why a server got a given grade, and the verify-before-trust pattern for Bankr agents (recompute the live tool-surface fingerprint and require it to match the attestation before executing).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Behavioral grades are now live via `polygraphso check` / `list` (A–F across graded servers), so replace the "rolling out / not yet available" framing and the stale example outputs with the real current CLI output, including the shipped grades.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"Triggers on:" inside the plain-scalar description made YAML read it as a nested
mapping ("mapping values are not allowed in this context"). Reword to "Triggers
on mentions of" (no colon), matching the zerion skill convention.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review feedback:
- Scope to MCP servers (drop "AI tools": the whole harness is MCP-specific).
- Make the remote/Docker-less B-cap explicit and frame it as a property of the measurement, not a knock (a remote B is not "worse than" a local A).
- Stop hardcoding named third-party grades; keep one live first-party A as proof and treat the live set / attestation as the point-in-time source of truth.
- Present the live scale as A/B/D/F; note C/E are not assigned (C reserved).
- Elevate runtime verify-before-trust above the get-graded CTA and surface the evasion caveat at the trust decision.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified the skill against the now-published package. Remove the `challenge` command and the `check <ref[@Version]>` form; neither exists in the published CLI (commands: `litmus`/`check`/`list`; flags `--json`/`--bearer`/`--header`/`--allow-state-changing` and env `POLYGRAPH_API_URL`/`LITMUS_BEARER`/`LITMUS_STDIO_ISOLATION` all confirmed present).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add `references/ci-gate.md` (the `polygraphso/litmus@v1` Action that fails a build when an MCP server or an Agent Skill grades D/F) and a 'Gate your CI on grades' section + reference link in `SKILL.md`.
Co-Authored-By: Claude <noreply@anthropic.com>
…ory, the ci command
Bring the skill up to date with the published `@polygraphso/litmus`: four probe categories (adds C-04 adversarial-input handling), methodology version litmus-v9 in the illustrative outputs, and the `ci` command in the CLI reference.
Co-Authored-By: Claude <noreply@anthropic.com>
Adds the required catalog.json (slug=polygraph, install type bankr) so the skill appears in the Bankr Discover catalog, a square logo.svg, and a README table row. Rebased onto current main.
6738fcb to f353528
saltoriousSIG left a comment
A few things to address before merging!
- The skill says “one command before your agent installs anything,” but that command is `npx polygraphso ...`, which installs and executes a third-party npm package. The same applies to `@polygraphso/litmus`. Please pin package versions/integrity and avoid presenting `npx` as a no-install trust check.
- The GitHub Action is referenced as `polygraphso/litmus@v1`. For a security gate, this should be pinned to a commit SHA, not a mutable tag. Same concern for `npx @polygraphso/litmus ci`.
- The CI gate auto-discovers MCP configs and may run/grade servers from PR-controlled files. That is risky on CI, especially with Docker/socket/network/secrets available. Please add strong warnings: don’t run this with secrets on untrusted PRs, don’t use `pull_request_target`, pin dependencies, and prefer explicit allowlisted targets over auto-discovery for public repos.
- `bearer` / `--bearer` support can pass credentials into remote MCP checks. Please document that bearer tokens must not be provided on untrusted PRs or auto-discovered targets, and should be scoped/ephemeral.
- The runtime gate defaults to accepting A/B, but remote MCP servers cap at B because egress is unverified. That means the default accepts servers where network exfiltration was not tested. Please make this explicit in the Bankr execution path and require a higher/manual review bar before routing signed actions/payments through remote B servers.
- Attestation verification needs stricter trust rules. The docs note self-published grades are forgeable, but the examples still frame `readAttestation` + `gateDecision` as enough before “pay/execute.” Please require schema ID, chain ID, revocation status, methodology version, attester allowlist or reproducible rerun, and exact fingerprint match before any Bankr action.
- `POLYGRAPH_API_URL` can override the lookup endpoint. That’s useful for dev but dangerous in agent/runtime docs. Please warn not to use untrusted lookup endpoints for execution decisions, or require host allowlisting.
- The skill grading described in CI is static-only. Static scanning of skill text can miss bundled scripts, install commands, remote URLs, and runtime behavior. Please avoid implying this is equivalent to behavioral MCP grading, and require manual/security review for skills with install-time code execution or transaction instructions.
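To make the runtime-gate points above concrete, here is a minimal sketch of what a stricter decision could look like. It is not the published `polygraphso` or `@polygraphso/litmus` API: the `Attestation` and `Policy` shapes, every field name, and the `manual_review` outcome are assumptions made for illustration, and the "reproducible rerun" alternative to the attester allowlist is not modeled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical shapes: none of these names come from the published packages.


@dataclass(frozen=True)
class Attestation:
    schema_id: str
    chain_id: int
    attester: str
    revoked: bool
    methodology: str
    grade: str
    fingerprint: str
    remote: bool  # True when graded remotely, so egress was not verified


@dataclass(frozen=True)
class Policy:
    schema_id: str
    chain_id: int
    attesters: frozenset
    methodologies: frozenset
    lookup_hosts: frozenset


def lookup_host_allowed(url: str, policy: Policy) -> bool:
    """Accept only HTTPS lookup endpoints whose host is allowlisted."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in policy.lookup_hosts


def gate(att: Attestation, live_fingerprint: str, policy: Policy) -> tuple[str, list[str]]:
    """Return ("allow" | "manual_review" | "deny", reasons)."""
    reasons = []
    if att.schema_id != policy.schema_id:
        reasons.append("unexpected schema ID")
    if att.chain_id != policy.chain_id:
        reasons.append("unexpected chain ID")
    if att.revoked:
        reasons.append("attestation revoked")
    if att.methodology not in policy.methodologies:
        reasons.append("methodology version not accepted")
    if att.attester not in policy.attesters:
        reasons.append("attester not allowlisted")
    if att.fingerprint != live_fingerprint:
        reasons.append("live tool-surface fingerprint does not match")
    if reasons:
        return "deny", reasons

    if att.grade == "A":
        return "allow", []
    if att.grade == "B":
        if att.remote:
            # Remote servers cap at B because egress is unverified.
            return "manual_review", ["remote B: egress was not verified"]
        return "allow", []
    return "deny", [f"grade {att.grade} is below the accepted threshold"]
```

The design choice in this sketch is that any failed provenance check denies outright, and the grade is only consulted afterwards, so a remote B never reaches signed actions or payments without a human decision.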
polygraph — behavioral trust grades (A–F) for MCP servers and Agent Skills
Adds a `polygraph/` skill (polygraph.so). Polygraph connects to an MCP server the way an agent would, fingerprints its exact tool surface, and runs four behavioral probes (C-01 tool-output injection, C-02 permission/egress overreach, C-03 sensitive-data leak, C-04 adversarial-input handling), then grades it A–F and can publish a reproducible grade as an onchain EAS attestation on Base. The harness is open source, so anyone can re-run it and disprove a bad grade.
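As a rough illustration of what fingerprinting a tool surface can mean, the sketch below hashes a canonical JSON form of an MCP tool list. The scheme itself (sort by name, compact sorted-key JSON, SHA-256) is an assumption for illustration and not necessarily how litmus computes its fingerprint; only the `name`, `description`, and `inputSchema` fields come from the MCP tool definition.

```python
import hashlib
import json


def tool_surface_fingerprint(tools: list[dict]) -> str:
    """Hash a canonical JSON form of the tool list (an assumed scheme)."""
    # Sort tools by name and serialize with sorted keys so that listing
    # order and key order do not change the digest.
    canonical = json.dumps(
        sorted(tools, key=lambda tool: tool["name"]),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, any change to a tool's name, description, or input schema produces a different digest, which is the property the verify-before-trust check relies on when it compares the live surface against the attested one.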
CTA for builders
What the skill covers
- `npx polygraphso check npm/@modelcontextprotocol/server-filesystem`
- … the attestation before letting Bankr execute (the runtime gate)
- `polygraphso/litmus@v1` (or `npx @polygraphso/litmus ci`) fails a build when an MCP server or a skill it ships grades D/F; see `polygraph/references/ci-gate.md`
Conforms to the contribution guide
- `polygraph/catalog.json`: `slug` equals the folder, `install.type: bankr`, so it appears in the Bankr Discover catalog
- `polygraph/logo.svg` (square mark), `polygraph/SKILL.md` with `name` + `description` frontmatter, supporting docs under `polygraph/references/`
- … `main`
Verified against the published packages
- `polygraphso`: the lookup CLI (`check` / `list`)
- `@polygraphso/litmus`: the open harness, the `ci` gate, and the `polygraphso/litmus@v1` GitHub Action
Happy to adjust naming/scope to match your conventions.