Add polygraph skill: behavioral trust grades for MCP servers #477

Open: RubenSousaDinis wants to merge 8 commits into BankrBot:main from RubenSousaDinis:add-polygraph-skill

RubenSousaDinis commented Jun 16, 2026

polygraph — behavioral trust grades (A–F) for MCP servers and Agent Skills

Adds a polygraph/ skill (polygraph.so). Polygraph connects to an MCP
server the way an agent would, fingerprints its exact tool surface, and runs four behavioral
probes — C-01 tool-output injection, C-02 permission/egress overreach, C-03
sensitive-data leak, C-04 adversarial-input handling — then grades it A–F and can publish a
reproducible grade as an onchain EAS attestation on Base. The harness is open source, so anyone
can re-run it and disprove a bad grade.

CTA for builders

Check any MCP server before your agent uses it: npx polygraphso check <server>.
Get your own server graded at polygraph.so.

What the skill covers

  • Check a grade: npx polygraphso check npm/@modelcontextprotocol/server-filesystem
  • Verify before you trust — recompute the live tool-surface fingerprint and require it to match
    the attestation before letting Bankr execute (the runtime gate)
  • Gate CI on grades — the GitHub Action polygraphso/litmus@v1 (or npx @polygraphso/litmus ci)
    fails a build when an MCP server or a skill it ships grades D/F — see
    polygraph/references/ci-gate.md
  • Get your project graded — run the open litmus harness on your own server or skill
  • Why a server got grade X — the A–F decision logic + how to read the report
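
The verify-before-trust step above can be sketched roughly as follows. This is a minimal illustration, not Polygraph's actual algorithm: the PR does not specify how the fingerprint is computed, so the canonical-JSON-plus-SHA-256 scheme and the function names `tool_surface_fingerprint` and `fingerprint_matches` are assumptions made for the example.

```python
import hashlib
import json


def tool_surface_fingerprint(tools: list[dict]) -> str:
    """Hash a canonical JSON encoding of a tool list (assumed scheme)."""
    # Sort tools by name and serialize with sorted keys so that ordering
    # differences in the server's response do not change the digest.
    canonical = json.dumps(
        sorted(tools, key=lambda tool: tool["name"]),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint_matches(live_tools: list[dict], attested_fingerprint: str) -> bool:
    """Return True only when the live surface hashes to the attested value."""
    return tool_surface_fingerprint(live_tools) == attested_fingerprint
```

Under this scheme, any change to a tool's name, description, or input schema produces a different digest, so a gate built on it would refuse to execute until the server is graded again.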

Conforms to the contribution guide

  • polygraph/catalog.json: slug equals the folder name, install.type: bankr, so it appears in the
    Bankr Discover catalog
  • polygraph/logo.svg (square mark), polygraph/SKILL.md with name + description frontmatter,
    supporting docs under polygraph/references/
  • README row added; rebased onto current main

Verified against the published packages

  • polygraphso — the lookup CLI (check / list)
  • @polygraphso/litmus — the open harness, the ci gate, and the polygraphso/litmus@v1 GitHub
    Action

Happy to adjust naming/scope to match your conventions.

RubenSousaDinis marked this pull request as ready for review June 16, 2026 13:49
RubenSousaDinis and others added 8 commits June 26, 2026 12:14
Polygraph grades MCP servers A–F by connecting like an agent, fingerprinting
the exact tool surface, and running three behavioral probes (C-01 tool-output
injection, C-02 permission/egress overreach, C-03 sensitive-data leak), then
publishing a reproducible grade as an onchain EAS attestation on Base.

The skill covers: checking a grade (`npx polygraphso check <server>`), running
the open litmus harness locally to grade your own server, why a server got a
given grade, and the verify-before-trust pattern for Bankr agents (recompute the
live tool-surface fingerprint and require it to match the attestation before
executing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Behavioral grades are now live via `polygraphso check` / `list` (A–F across
graded servers), so replace the "rolling out / not yet available" framing and
the stale example outputs with the real current CLI output, including the
shipped grades.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

"Triggers on:" inside the plain-scalar description made YAML read it as a nested
mapping ("mapping values are not allowed in this context"). Reword to "Triggers
on mentions of" (no colon), matching the zerion skill convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address review feedback:
- Scope to MCP servers (drop "AI tools" — the whole harness is MCP-specific).
- Make the remote/Docker-less B-cap explicit and frame it as a property of the
  measurement, not a knock (a remote B is not "worse than" a local A).
- Stop hardcoding named third-party grades; keep one live first-party A as proof
  and treat the live set / attestation as the point-in-time source of truth.
- Present the live scale as A/B/D/F; note C/E are not assigned (C reserved).
- Elevate runtime verify-before-trust above the get-graded CTA and surface the
  evasion caveat at the trust decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verified the skill against the now-published package. Remove the `challenge`
command and the `check <ref[@Version]>` form — neither exists in the published
CLI (commands: litmus/check/list; flags --json/--bearer/--header/--allow-state-changing
and env POLYGRAPH_API_URL/LITMUS_BEARER/LITMUS_STDIO_ISOLATION all confirmed present).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add references/ci-gate.md (the polygraphso/litmus@v1 Action that fails a build when
an MCP server or an Agent Skill grades D/F) and a 'Gate your CI on grades' section +
reference link in SKILL.md.

Co-Authored-By: Claude <noreply@anthropic.com>

…ory, the ci command

Bring the skill up to date with the published @polygraphso/litmus: four probe
categories (adds C-04 adversarial-input handling), methodology version litmus-v9
in the illustrative outputs, and the ci command in the CLI reference.

Co-Authored-By: Claude <noreply@anthropic.com>

Adds the required catalog.json (slug=polygraph, install type bankr) so the skill
appears in the Bankr Discover catalog, a square logo.svg, and a README table row.
Rebased onto current main.

saltoriousSIG left a comment

Collaborator

A few things to address before merging!

  • The skill says “one command before your agent installs anything,” but that command is npx polygraphso ..., which installs/executes a third-party npm package. Same for @polygraphso/litmus. Please pin package versions/integrity and avoid presenting npx as a no-install trust check.

  • The GitHub Action is referenced as polygraphso/litmus@v1. For a security gate, this should be pinned to a commit SHA, not a mutable tag. Same concern for npx @polygraphso/litmus ci.

  • The CI gate auto-discovers MCP configs and may run/grade servers from PR-controlled files. That is risky on CI, especially with Docker/socket/network/secrets available. Please add strong warnings: don’t run this with secrets on untrusted PRs, don’t use pull_request_target, pin dependencies, and prefer explicit allowlisted targets over auto-discovery for public repos.

  • bearer / --bearer support can pass credentials into remote MCP checks. Please document that bearer tokens must not be provided on untrusted PRs or auto-discovered targets, and should be scoped/ephemeral.

  • The runtime gate defaults to accepting A/B, but remote MCP servers cap at B because egress is unverified. That means the default accepts servers where network exfiltration was not tested. Please make this explicit in the Bankr execution path and require a higher/manual review bar before routing signed actions/payments through remote B servers.

  • Attestation verification needs stricter trust rules. The docs note self-published grades are forgeable, but the examples still frame readAttestation + gateDecision as enough before “pay/execute.” Please require schema ID, chain ID, revocation status, methodology version, attester allowlist or reproducible rerun, and exact fingerprint match before any Bankr action.

  • POLYGRAPH_API_URL can override the lookup endpoint. That’s useful for dev but dangerous in agent/runtime docs. Please warn not to use untrusted lookup endpoints for execution decisions, or require host allowlisting.

  • The skill grading described in CI is static-only. Static scanning skill text can miss bundled scripts, install commands, remote URLs, and runtime behavior. Please avoid implying this is equivalent to behavioral MCP grading, and require manual/security review for skills with install-time code execution or transaction instructions.
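
To make the attestation and remote-B points above concrete, here is a rough sketch of a stricter gate. It is an illustration under stated assumptions, not the skill's `readAttestation` / `gateDecision` API: the `Attestation` fields, the `gate` function, and the placeholder schema ID and attester address are all hypothetical, and the chain ID assumes Base mainnet (8453). It covers the attester-allowlist option only; a reproducible rerun would be the alternative.

```python
from dataclasses import dataclass

# Hypothetical policy values; real ones would come from the published schema
# and from whichever attesters the operator chooses to trust.
EXPECTED_SCHEMA_ID = "0xSCHEMA_PLACEHOLDER"
EXPECTED_CHAIN_ID = 8453  # Base mainnet (assumed; the PR only says "Base")
EXPECTED_METHODOLOGY = "litmus-v9"
TRUSTED_ATTESTERS = frozenset({"0xATTESTER_PLACEHOLDER"})


@dataclass(frozen=True)
class Attestation:
    schema_id: str
    chain_id: int
    revoked: bool
    methodology: str
    attester: str
    fingerprint: str
    grade: str
    remote: bool  # True when graded remotely, i.e. egress was not verified


def gate(att: Attestation, live_fingerprint: str, signs_or_pays: bool) -> str:
    """Return "allow", "manual_review", or "deny" for a proposed action."""
    checks = (
        att.schema_id == EXPECTED_SCHEMA_ID,
        att.chain_id == EXPECTED_CHAIN_ID,
        not att.revoked,
        att.methodology == EXPECTED_METHODOLOGY,
        att.attester in TRUSTED_ATTESTERS,
        att.fingerprint == live_fingerprint,  # exact match, no fuzzy compare
    )
    if not all(checks):
        return "deny"
    if att.grade not in {"A", "B"}:
        return "deny"
    # A remote B means network exfiltration was not tested, so signed actions
    # and payments are escalated instead of being auto-approved.
    if att.grade == "B" and att.remote and signs_or_pays:
        return "manual_review"
    return "allow"
```

The design choice here is that every trust rule fails closed: a missing or mismatched field denies the action, and the only path that avoids a hard decision is the explicit escalation for remote B servers.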
