Harden the agent gate and CI action for value-routing and untrusted CI #70

Merged
RubenSousaDinis merged 2 commits into main from fix/skill-security-review
Jun 29, 2026

@RubenSousaDinis

Tightens the trust surface through which a behavioral grade is consumed, prompted by a security review of the polygraph skill. The default decision for read-only / low-value callers is unchanged; the new rigor is opt-in.

Agent gate (packages/agent)

  • Add opt-in GateOptions, all default-off: allowedAttesters, acceptedMethodologyVersions, requireEgressVerified.
  • Export PAYMENT_PASSING ({A}) as the documented bar for signed/value actions — a remote server caps at B because its egress was never observed, so requiring a local A excludes egress-unverified grades.
  • DEFAULT_PASSING is now {A,B} (C is reserved/unassigned, so this only drops an accept for a grade that is never emitted).
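
As a rough illustration of how the opt-in options could compose with the passing sets, here is a minimal sketch. Only the names GateOptions, allowedAttesters, acceptedMethodologyVersions, requireEgressVerified, DEFAULT_PASSING and PAYMENT_PASSING come from this PR; the `GradeRecord` shape, the `passesGate` function, and the string types are assumptions made for the example and may not match packages/agent.

```typescript
// Hypothetical record shape: the PR does not show the real types.
interface GradeRecord {
  grade: string;              // e.g. "A" or "B"
  attester: string;           // assumed to be an address-like string
  methodologyVersion: string; // assumed to be a string
  egressVerified: boolean;
}

// Option names are from the PR; every option is off unless set.
interface GateOptions {
  allowedAttesters?: readonly string[];
  acceptedMethodologyVersions?: readonly string[];
  requireEgressVerified?: boolean;
}

const DEFAULT_PASSING: ReadonlySet<string> = new Set(["A", "B"]);
const PAYMENT_PASSING: ReadonlySet<string> = new Set(["A"]);

// Hypothetical gate function: checks the grade first, then each opt-in option.
function passesGate(
  record: GradeRecord,
  passing: ReadonlySet<string> = DEFAULT_PASSING,
  options: GateOptions = {},
): boolean {
  if (!passing.has(record.grade)) return false;
  if (options.allowedAttesters && !options.allowedAttesters.includes(record.attester)) {
    return false;
  }
  if (
    options.acceptedMethodologyVersions &&
    !options.acceptedMethodologyVersions.includes(record.methodologyVersion)
  ) {
    return false;
  }
  if (options.requireEgressVerified && !record.egressVerified) return false;
  return true;
}
```

With no options and the default set, a remote B still passes, which matches the unchanged default for read-only callers. Passing PAYMENT_PASSING, optionally with requireEgressVerified, is what excludes it for value-routing.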

Onchain read (packages/onchain)

  • Surface methodologyVersion and a derived egressVerified (decoded from the on-chain C-02 verdict), so a payment gate can distinguish an egress-verified local A from a remote/no-sandbox B.
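
One way to picture the derived field is sketched below. The real on-chain encoding of the C-02 verdict is not described in this PR, so the `CheckStatus` values, the `verdicts` map, and `toOnchainRead` are all hypothetical. The sketch only shows the intent that an unobserved or missing C-02 result never counts as verified.

```typescript
// Hypothetical verdict encoding; the actual on-chain format may differ.
type CheckStatus = "pass" | "fail" | "not-observed";

// Field names methodologyVersion and egressVerified are from the PR.
interface OnchainRead {
  grade: string;
  methodologyVersion: string;
  egressVerified: boolean;
}

// Hypothetical helper: derive egressVerified from the decoded C-02 verdict.
function toOnchainRead(
  grade: string,
  methodologyVersion: string,
  verdicts: Record<string, CheckStatus>,
): OnchainRead {
  // Only an explicit pass counts; a missing or unobserved verdict yields false.
  const egressVerified = verdicts["C-02"] === "pass";
  return { grade, methodologyVersion, egressVerified };
}
```

Failing closed here means a remote or no-sandbox run, whose network behavior was never observed, cannot be mistaken for an egress-verified local A.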

CI action + command

  • action.yml: auto-discovery now defaults off so a public-repo gate doesn't run PR-controlled targets by default; pin the run version; document SHA-pinning the action, pull_request (not pull_request_target), and scoping bearer / api-url.
  • ci command: warns when discovery is on (discovered targets are repo-controllable and grading a server runs its code).
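
The off-by-default discovery and the warning could look roughly like the sketch below. The `CiInputs` shape, the `discover` flag name, and `resolveTargets` are hypothetical and are not taken from the actual ci command or action.yml; the warning text is also illustrative.

```typescript
// Hypothetical inputs: an explicit allowlist plus an opt-in discovery flag.
interface CiInputs {
  targets: string[];
  discover?: boolean; // assumed name; off unless explicitly enabled
}

// Hypothetical resolver: returns the targets to grade and warns when discovery is on.
function resolveTargets(
  inputs: CiInputs,
  discovered: () => string[],
  warn: (msg: string) => void = console.warn,
): string[] {
  // Default path: only the explicit allowlist is graded; discovery never runs.
  if (!inputs.discover) return [...inputs.targets];

  warn(
    "auto-discovery is on: discovered targets are repo-controllable, " +
      "and grading a server runs its code",
  );
  // Merge the allowlist with discovered targets, dropping duplicates.
  return Array.from(new Set([...inputs.targets, ...discovered()]));
}
```

Keeping the discovery callback lazy means a pull request cannot influence the target list unless the workflow author has opted in.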

Docs

  • README action section + the bundled polygraph skill docs reframed: npx is a lookup that runs our CLI (pin the version), SHA-pin the action, bearer is sent to the target, raise the bar to a local A for value, a self-minted grade is forgeable (check attester/methodology or re-run), and a skill grade is a static scan — not equivalent to a behavioral server grade.

Backward-compatible (new optional surface only). No methodology/schema semantics change, so methodologyVersion is not bumped. Full suite green (522 passed). Ships in 0.20.0.

RubenSousaDinis and others added 2 commits June 29, 2026 15:46
Tighten the trust surface a behavioral grade is used through, without changing
the default decision for read-only/low-value callers.

- agent gate: add opt-in GateOptions (allowedAttesters, acceptedMethodologyVersions,
  requireEgressVerified), all default-off so the base decision is unchanged; export
  PAYMENT_PASSING ({A}) for signed/value actions; DEFAULT_PASSING is now {A,B}
  (C is reserved/unassigned, so this only drops an accept for a grade never emitted).
- onchain read: surface methodologyVersion and a derived egressVerified (from the
  on-chain C-02 verdict), so a payment gate can distinguish an egress-verified local
  A from a remote/no-sandbox B whose network behavior was never observed.
- action.yml: discovery is now opt-in (default off) so a public-repo gate doesn't run
  PR-controlled targets by default; pin the run version; document SHA-pinning the
  action, pull_request (not pull_request_target), and scoping bearer/api-url.
- ci command: warn when auto-discovery is on (targets are repo-controllable and
  grading a server runs its code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k
…the trust bar

Align the polygraph skill docs and the README action section with the hardened
behavior:

- npx is a lookup that runs our CLI, not a "no-install" check — pin the version.
- SHA-pin the GitHub Action, trigger on pull_request not pull_request_target, and
  prefer an explicit allowlist over auto-discovery (now off by default).
- bearer is sent as an Authorization header to the target — trusted pinned remote
  only, scoped and short-lived.
- raise the bar for signed/value actions to a local A (PAYMENT_PASSING /
  requireEgressVerified); a remote B never had its egress observed.
- a self-minted grade is forgeable: also check attester + methodology version, or
  re-run the harness, before routing value.
- POLYGRAPH_API_URL enforces https; the residual risk is endpoint trust.
- a skill grade is a static scan, not equivalent to a behavioral server grade —
  install-time code or transaction instructions need manual review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k
@RubenSousaDinis RubenSousaDinis merged commit e0bcefa into main Jun 29, 2026
11 checks passed
RubenSousaDinis added a commit that referenced this pull request Jun 29, 2026
Ships the gate/action hardening from #70: opt-in GateOptions
(allowedAttesters, acceptedMethodologyVersions, requireEgressVerified),
PAYMENT_PASSING, on-chain methodologyVersion + egressVerified surfaced for
the read path, discovery off-by-default in the action, and the ci discovery
warning. Backward-compatible minor.


Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>