
feat(ai): wire external HTTP guardrail providers on the input path #551

Merged
rickcrawford merged 2 commits into main from rickcrawford/wor-1529-external-guardrails
Jun 27, 2026


@rickcrawford


What

Wires external HTTP guardrail providers into the AI gateway. An origin's guardrails.external list runs external guardrail services alongside the built-in checks; input-mode entries inspect the request before dispatch and block on a not-allowed verdict.

The generic adapter and LiteLLM mode mapping already existed but were never called by the runtime. This connects them and adds provider presets.

How

  • GuardrailsConfig.external: Vec<ExternalGuardrailConfig> (new, #[serde(default)], no schema change since the type is deserialize-only).
  • Provider presets (GuardrailProvider): Presidio posts {text, language} and treats a non-empty findings array as a flag; Generic/Lakera/Aporia post {input} and parse a common allowed/flagged/blocked verdict. Optional api_key on a configurable auth_header / auth_prefix (defaults to Authorization: Bearer).
  • A pure verdict_blocks() decides blocking (mode blocks AND content disallowed; logging_only never blocks); run_input_external_guardrails() evaluates the default_on input-mode entries and returns the first block.
  • Dispatch: a thin async step before the built-in pipeline extracts the request's input text (chat message text or per-surface input), runs the external input guardrails, and returns a 400 guardrail_violation on a block, matching the existing built-in guardrail block shape. Errors honor each entry's fail_open.
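
The blocking rule and the "first block wins" selection described above can be sketched as follows. This is a minimal illustration, not the code in sbproxy-ai: the `Mode` enum, the `Evaluated` struct, and `first_block` are hypothetical names, and the signature of `verdict_blocks` is assumed. Only the rule itself (mode blocks AND content disallowed, logging_only never blocks, default_on input-mode entries only) comes from this PR.

```rust
/// When an external guardrail runs. The variants mirror the modes named in
/// this PR (pre_call, during_call, post_call, logging_only); the enum itself
/// is an assumption made for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    PreCall,
    DuringCall,
    PostCall,
    LoggingOnly,
}

impl Mode {
    /// Input-mode entries inspect the request before dispatch.
    pub fn is_input(self) -> bool {
        matches!(self, Mode::PreCall | Mode::DuringCall)
    }

    /// logging_only records the verdict but is never allowed to block.
    pub fn can_block(self) -> bool {
        !matches!(self, Mode::LoggingOnly)
    }
}

/// Pure decision: block only when the mode can block AND the content was
/// disallowed. Assumed signature.
pub fn verdict_blocks(mode: Mode, allowed: bool) -> bool {
    mode.can_block() && !allowed
}

/// Hypothetical, already-evaluated entry: config flags plus the provider's
/// verdict for the current request.
pub struct Evaluated<'a> {
    pub name: &'a str,
    pub mode: Mode,
    pub default_on: bool,
    pub allowed: bool,
}

/// Returns the name of the first default_on input-mode entry that blocks.
/// Entries that are off by default, output-side, or logging_only are skipped
/// here because none of them can produce an input-path block.
pub fn first_block<'a>(entries: &[Evaluated<'a>]) -> Option<&'a str> {
    entries
        .iter()
        .filter(|e| e.default_on && e.mode.is_input())
        .find(|e| verdict_blocks(e.mode, e.allowed))
        .map(|e| e.name)
}
```

Keeping the decision free of I/O is what makes it testable without a live guardrail service; the async dispatch step only has to fetch verdicts and hand them to a function like this.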

Scope / follow-ups

  • Input path only in this PR. Output-side (post_call) external guardrails and AWS Bedrock (SigV4 ApplyGuardrail) are not yet wired; both are left as follow-ups.
  • default_on: true runs a guardrail on every request; per-request opt-in via metadata is not included.

Tests

  • Pure unit tests in sbproxy-ai: Presidio vs generic request/response shapes, verdict_blocks across modes, config parsing with provider + auth defaults (8 external-guardrail tests pass).
  • Full sbproxy-ai (962) and sbproxy-core (459) lib suites pass; clippy -D warnings and rustdoc -D warnings -D missing_docs clean; regenerated config schema is byte-identical (no schema change).

rickcrawford and others added 2 commits June 26, 2026 14:37
An AI origin's `guardrails.external` list now runs external guardrail
services (Presidio, Lakera, Aporia, or a custom endpoint) alongside the
built-in checks. Input-mode entries (pre_call / during_call) inspect the
request before dispatch and block on a not-allowed verdict; logging_only
records only, and transport or parse errors honor each entry's fail_open
flag.
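
One way to read the fail_open behavior is sketched below. The `Decision` enum, the `decide` function, and the use of `Result<bool, String>` for a guardrail call are hypothetical and chosen for illustration. The sketch also assumes that an error on a fail-closed entry blocks the request and that logging_only never blocks even on error; the commit message does not spell out either case.

```rust
/// Final decision for one external guardrail entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Block,
}

/// `result` is Ok(allowed) when the provider answered, or Err(reason) for a
/// transport or parse failure. The String error is a stand-in; the real
/// error type is not shown in this PR.
pub fn decide(result: &Result<bool, String>, logging_only: bool, fail_open: bool) -> Decision {
    // Assumption: logging_only entries record the outcome and never block,
    // including when the call itself failed.
    if logging_only {
        return Decision::Allow;
    }
    match result {
        Ok(true) => Decision::Allow,
        Ok(false) => Decision::Block,
        // On error, fail_open lets the request through; otherwise the entry
        // fails closed (assumed here to mean the request is blocked).
        Err(_) => {
            if fail_open {
                Decision::Allow
            } else {
                Decision::Block
            }
        }
    }
}
```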

Provider presets shape the request and response: Presidio posts
{text, language} and treats a non-empty findings array as a flag; the
others post {input} and parse a common allowed/flagged/blocked verdict,
with an optional API key on a configurable auth header.
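
A dependency-free sketch of the preset shaping might look like the following. `GuardrailProvider` and its variants come from the PR description; everything else (`request_fields`, `ProviderResponse`, `Verdict`, `is_allowed`, `auth_header`) is a hypothetical name. The sketch assumes that a flagged or blocked verdict counts as not allowed, and that the prefix and key are joined with a single space; JSON serialization is left out.

```rust
/// Provider presets named in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GuardrailProvider {
    Generic,
    Presidio,
    Lakera,
    Aporia,
}

/// Top-level fields of the request body each preset posts, as (name, value)
/// pairs: Presidio sends {text, language}, the others send {input}.
pub fn request_fields(
    provider: GuardrailProvider,
    text: &str,
    language: &str,
) -> Vec<(&'static str, String)> {
    match provider {
        GuardrailProvider::Presidio => vec![
            ("text", text.to_string()),
            ("language", language.to_string()),
        ],
        _ => vec![("input", text.to_string())],
    }
}

/// The common verdict parsed for Generic/Lakera/Aporia (modeled as an enum
/// here; the wire format is not shown in this PR).
pub enum Verdict {
    Allowed,
    Flagged,
    Blocked,
}

/// Simplified view of an already-parsed provider response.
pub enum ProviderResponse {
    /// Presidio: number of entries in the findings array.
    Findings(usize),
    /// Generic/Lakera/Aporia: the common verdict.
    Verdict(Verdict),
}

/// A non-empty findings array is a flag; anything other than an explicit
/// Allowed verdict is treated as not allowed (assumption).
pub fn is_allowed(resp: &ProviderResponse) -> bool {
    match resp {
        ProviderResponse::Findings(n) => *n == 0,
        ProviderResponse::Verdict(v) => matches!(v, Verdict::Allowed),
    }
}

/// Builds the optional auth header. With no overrides this yields
/// `Authorization: Bearer <key>`; with no api_key no header is sent.
pub fn auth_header(
    api_key: Option<&str>,
    header: Option<&str>,
    prefix: Option<&str>,
) -> Option<(String, String)> {
    let key = api_key?;
    let name = header.unwrap_or("Authorization");
    let prefix = prefix.unwrap_or("Bearer");
    // An empty prefix sends the bare key (assumed behavior).
    let value = if prefix.is_empty() {
        key.to_string()
    } else {
        format!("{prefix} {key}")
    };
    Some((name.to_string(), value))
}
```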

The decision logic (verdict_blocks plus the provider request/response
shaping) is pure and unit-tested; the dispatch wiring is a thin async
call before the built-in pipeline. Output-side and AWS Bedrock (SigV4)
guardrails are not yet wired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01X19S6eQzKKExZ9RUPAHuGy
…-external-guardrails

# Conflicts:
#	CHANGELOG.md
@rickcrawford rickcrawford merged commit e947e55 into main Jun 27, 2026
8 checks passed
@rickcrawford rickcrawford deleted the rickcrawford/wor-1529-external-guardrails branch June 27, 2026 03:24
