Provider-agnostic AI guardrail benchmarking tool. Tests your guardrail layer — not your model — across 11 backends against the OWASP LLM Top 10.
guardrailprobe fires 78 attack probes at your guardrail endpoints and reports which attacks each guardrail lets through. It produces:
- Pass/fail per probe across OWASP LLM01–LLM10 and content-moderation categories
- Side-by-side comparison of multiple backends in a single run
- Signed benchmark reports (PDF with RFC 3161 timestamp, JSON, Markdown)
- Flask dashboard for ad-hoc probe runs and report browsing
No framework lock-in. No cloud account required. Just point it at an endpoint and run.
| Backend | Adapter key | Notes |
|---|---|---|
| NVIDIA NeMo Guardrails | nemo | Requires pip install guardrailprobe[nemo] (includes nemoguardrails, langchain, langchain-openai, langchain-aws, langchain-community) |
| Guardrails AI | guardrails_ai | Regex fallback always available; SDK optional |
| Microsoft Presidio | presidio | Requires pip install guardrailprobe[presidio] |
| Lakera Guard | lakera | Requires LAKERA_GUARD_API_KEY |
| OpenAI Moderation | openai_moderation | Requires OPENAI_API_KEY |
| Azure Content Safety | azure_content_safety | Requires AZURE_CONTENT_SAFETY_KEY + endpoint |
| Azure Prompt Shields | azure_prompt_shields | Shares credentials with azure_content_safety (no separate key) |
| AWS Bedrock Guardrails | aws_bedrock | Requires AWS_ACCESS_KEY_ID + guardrail ID |
| Meta LlamaFirewall | llama_firewall | Requires pip install guardrailprobe[llamafirewall] |
| LLM Guard | llm_guard | Requires pip install guardrailprobe[llm_guard] |
| GA Guard | ga_guard | Requires GA_GUARD_API_URL (must be https://) |
Adapters with missing credentials return SKIPPED gracefully — partial configurations run fine.
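The SKIPPED behavior amounts to a per-adapter readiness check. The sketch below illustrates that idea only; it is not guardrailprobe's code. The REQUIRED_ENV mapping and the adapter_status helper are hypothetical, the variable names come from this README, "ready" is assumed to mean every listed variable is set and non-empty, and AWS credentials supplied through a profile or IAM role are ignored for simplicity.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from adapter key to the environment variables it needs.
# Variable names are taken from this README; the mapping itself is illustrative.
REQUIRED_ENV = {
    "guardrails_ai": (),  # regex fallback needs no credentials
    "lakera": ("LAKERA_GUARD_API_KEY",),
    "openai_moderation": ("OPENAI_API_KEY",),
    "azure_content_safety": ("AZURE_CONTENT_SAFETY_KEY", "AZURE_CONTENT_SAFETY_ENDPOINT"),
    # Prompt Shields reuses the Content Safety credentials.
    "azure_prompt_shields": ("AZURE_CONTENT_SAFETY_KEY", "AZURE_CONTENT_SAFETY_ENDPOINT"),
    "aws_bedrock": (
        "AWS_BEDROCK_GUARDRAIL_ID",
        "AWS_DEFAULT_REGION",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
    ),
    "ga_guard": ("GA_GUARD_API_URL",),
}


def adapter_status(adapter: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "READY" if every required variable is non-empty, else "SKIPPED"."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV[adapter] if not env.get(name)]
    return "SKIPPED" if missing else "READY"


demo_env = {"LAKERA_GUARD_API_KEY": "lk-example"}
print(adapter_status("lakera", demo_env))             # READY
print(adapter_status("openai_moderation", demo_env))  # SKIPPED
print(adapter_status("guardrails_ai", demo_env))      # READY
```

Returning a status instead of raising is what lets a partial configuration run: unready adapters are reported and the rest proceed.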
Install the base package:

```bash
pip install guardrailprobe
```

With optional SDK extras:

```bash
# All extras
pip install "guardrailprobe[all]"

# Pick what you need
pip install "guardrailprobe[nemo,guardrails_ai,presidio]"
```

Skip the spaCy model download (e.g. in CI):
```bash
GUARDRAILPROBE_SKIP_SPACY=1 pip install guardrailprobe
```

Quick start:

```bash
# 1. Set up credentials with the interactive wizard (or copy .env.example to .env and edit manually)
guardrailprobe init

# 2. Check which backends are ready
guardrailprobe status

# 3. Run a benchmark (current month, all configured backends)
guardrailprobe run --output-dir ./reports

# 4. Run against specific backends only
guardrailprobe run --backends lakera,openai_moderation --output-dir ./reports

# 5. Launch the dashboard
guardrailprobe dashboard
```

To run with Docker instead:

```bash
docker compose up
```

Open http://localhost:8080. The container starts even without a .env file; the Setup Guide card in the dashboard lists exactly which environment variables each unready adapter needs.
To enable credentialed backends, create a .env file first:

```bash
cp .env.example .env
# Fill in the keys for the backends you want to test, then:
docker compose up
```

The .env file is optional. Any variables already exported in your shell are passed through automatically via the environment: block in docker-compose.yml.
Here is the out-of-the-box status for each adapter and what you need to enable it:
| Adapter | Dependencies | What you need |
|---|---|---|
| guardrails_ai | None (regex fallback built-in) | Nothing; works without credentials |
| presidio | spaCy model (bundled in image) | Nothing; runs locally |
| nemo | nemoguardrails + LangChain stack (bundled) | One LLM provider, in priority order: NEMO_OPENAI_API_KEY, Ollama (OLLAMA_BASE_URL, local/offline), AWS Bedrock (AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY; recommended, no rate limits), OPENROUTER_API_KEY, OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or ANTHROPIC_API_KEY. Works without any LLM key in Colang pattern-matching mode. |
| aws_bedrock | boto3 SDK (bundled) | AWS_BEDROCK_GUARDRAIL_ID, AWS_DEFAULT_REGION, AWS credentials |
| lakera | None (direct REST via httpx) | LAKERA_GUARD_API_KEY |
| openai_moderation | None (direct REST via httpx) | OPENAI_API_KEY |
| azure_content_safety | None (direct REST via httpx) | AZURE_CONTENT_SAFETY_KEY + AZURE_CONTENT_SAFETY_ENDPOINT |
| azure_prompt_shields | None (direct REST via httpx) | Same as azure_content_safety (no separate key) |
| ga_guard | None (direct REST via httpx) | GA_GUARD_API_URL (must be https://) |
| llama_firewall | Volume-mounted (not in image) | See below |
| llm_guard | Volume-mounted (not in image) | See below |
LlamaFirewall runs a local ML model and is excluded from the Docker image to keep it lean.
Requirements: Python 3.10–3.12 on the host (PyTorch is a transitive dependency).
```bash
# 1. Install into ./site-packages on your host
#    --ignore-installed avoids false conflicts with other packages in your host environment
python3.12 -m pip install llamafirewall --target ./site-packages --ignore-installed

# 2. Restart the container; the entrypoint detects the package automatically
docker compose up
```

On startup you will see:

```text
[guardrailprobe] site-packages mounted — llama_firewall: YES llm_guard: NO
```
No environment variables are required. LlamaFirewall runs fully offline.
LLM Guard runs PromptInjection and Toxicity scanners locally and is also excluded from the Docker image.
Requirements: Python 3.9–3.12 on the host.
```bash
# 1. Install into ./site-packages on your host
python3.12 -m pip install llm-guard --target ./site-packages --ignore-installed

# 2. Restart the container
docker compose up
```

On startup you will see:

```text
[guardrailprobe] site-packages mounted — llama_firewall: NO llm_guard: YES
```
No environment variables are required. LLM Guard runs fully offline.
To install both packages at once:

```bash
python3.12 -m pip install llamafirewall llm-guard --target ./site-packages --ignore-installed
docker compose up
```

The container prints the detected status for each package at startup and skips any that are absent; no configuration is required.
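The entrypoint's detection step can be approximated as checking whether each package's import directory exists inside the mounted folder. This is a hypothetical Python sketch, not the actual entrypoint (which may well be a shell script); the detect_optional_packages and format_status helpers are illustrative names. It relies on llamafirewall and llm_guard being the top-level import packages of the llamafirewall and llm-guard distributions.

```python
import tempfile
from pathlib import Path

# Adapter key -> top-level import directory of its package.
OPTIONAL_PACKAGES = {
    "llama_firewall": "llamafirewall",  # pip install llamafirewall
    "llm_guard": "llm_guard",           # pip install llm-guard
}


def detect_optional_packages(site_dir: str) -> dict[str, bool]:
    """Report which optional adapters have their package installed under site_dir."""
    root = Path(site_dir)
    return {
        adapter: (root / module).is_dir()
        for adapter, module in OPTIONAL_PACKAGES.items()
    }


def format_status(found: dict[str, bool]) -> str:
    """Render YES/NO per adapter, in the same shape as the startup log line."""
    return " ".join(f"{name}: {'YES' if ok else 'NO'}" for name, ok in found.items())


# Simulate a ./site-packages folder that only contains LlamaFirewall.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "llamafirewall").mkdir()
    found = detect_optional_packages(tmp)
    print(format_status(found))  # llama_firewall: YES llm_guard: NO
```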
Point the ga_guard backend at your GA Guard (or any HTTPS guardrail) API:
| Variable | Required | Description |
|---|---|---|
| GA_GUARD_API_URL | Yes | Target endpoint; must start with https:// |
| GA_GUARD_API_KEY | No | Bearer token or API key |
| GA_GUARD_AUTH_HEADER | No | Header name for the key (default: Authorization) |
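A minimal sketch of how these three variables could combine into request settings. The ga_guard_request_settings helper is hypothetical, and it assumes the key is sent as "Bearer <key>" when the header is Authorization and as the raw value under any custom header; the request body format is not specified here, so the sketch stops at the URL and headers.

```python
from typing import Mapping


def ga_guard_request_settings(env: Mapping[str, str]) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a GA Guard call; raise if the URL is not HTTPS."""
    url = env.get("GA_GUARD_API_URL", "")
    if not url.startswith("https://"):
        raise ValueError("GA_GUARD_API_URL must start with https://")

    headers: dict[str, str] = {}
    key = env.get("GA_GUARD_API_KEY")
    if key:  # the key is optional
        header = env.get("GA_GUARD_AUTH_HEADER") or "Authorization"
        # Assumption: Authorization gets a Bearer prefix, custom headers get the raw key.
        headers[header] = f"Bearer {key}" if header.lower() == "authorization" else key
    return url, headers


url, headers = ga_guard_request_settings({
    "GA_GUARD_API_URL": "https://your-guardrail-api.example.com/check",
    "GA_GUARD_API_KEY": "your-key-here",
})
print(headers)  # {'Authorization': 'Bearer your-key-here'}
```

Rejecting non-HTTPS URLs up front keeps an API key from ever being sent in cleartext.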
Add to .env:

```bash
GA_GUARD_API_URL=https://your-guardrail-api.example.com/check
GA_GUARD_API_KEY=your-key-here
# GA_GUARD_AUTH_HEADER=X-Api-Key   # only needed if the API uses a non-standard header
```

To run a benchmark for a specific month inside the container:

```bash
docker compose run --rm guardrailprobe \
  guardrailprobe run --year 2026 --month 6 --output-dir /app/reports
```

Reports are written to the guardrailprobe_reports named volume and also to ./docs/benchmarks on the host (via the ./docs bind mount).
NeMo can use a local Ollama instance as its LLM. The container uses network_mode: host, so localhost:11434 inside the container reaches the host's Ollama process directly, without exposing Ollama to the LAN.

```bash
# Start Ollama on the host (ollama serve blocks, so run it in a separate terminal)
ollama serve

# In another terminal, pull the model
ollama pull llama3.2

# Enable in .env
echo "OLLAMA_BASE_URL=http://localhost:11434" >> .env
docker compose up
```

Ollama is disabled by default (OLLAMA_BASE_URL is empty). CPU inference with llama3.2 is too slow for NeMo's 3-call-per-probe workflow (~20 s/probe), so a GPU is recommended. Without OLLAMA_BASE_URL, NeMo falls through to AWS Bedrock if credentials are present.
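A rough worked estimate shows why this matters. It assumes a full run of all 78 probes against the nemo backend and uses the figures quoted in this README: three LLM calls per probe, about 20 s per probe for llama3.2 on CPU, and OpenRouter's 16 req/min free-tier limit from the configuration reference. Real timings will vary with hardware and model.

```python
PROBES = 78                  # built-in probes in a full run
CALLS_PER_PROBE = 3          # NeMo's 3-call-per-probe workflow
CPU_SECONDS_PER_PROBE = 20   # approximate llama3.2-on-CPU figure quoted above
OPENROUTER_REQ_PER_MIN = 16  # OpenRouter free-tier limit

llm_calls = PROBES * CALLS_PER_PROBE                         # 234 calls
cpu_minutes = PROBES * CPU_SECONDS_PER_PROBE / 60            # 26.0 minutes
openrouter_min_minutes = llm_calls / OPENROUTER_REQ_PER_MIN  # lower bound, ~14.6 minutes

print(f"{llm_calls} LLM calls per full NeMo run")
print(f"Ollama on CPU: about {cpu_minutes:.0f} min")
print(f"OpenRouter free tier: at least {openrouter_min_minutes:.1f} min")
```

The OpenRouter figure is only a floor set by the rate limit; response latency adds to it.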
To build the Docker image without the bundled spaCy model:

```bash
docker compose build --build-arg SKIP_SPACY=1
```

The probe library contains 78 built-in attack probes across 11 categories:
| Category | OWASP ref | Probes |
|---|---|---|
| Prompt Injection | LLM01 | 7 |
| Insecure Output Handling | LLM02 | 6 |
| Training Data Poisoning | LLM03 | 5 |
| Model Denial of Service | LLM04 | 6 |
| Supply Chain Vulnerabilities | LLM05 | 5 |
| Sensitive Info Disclosure | LLM06 | 7 |
| Insecure Plugin Design | LLM07 | 6 |
| Excessive Agency | LLM08 | 6 |
| Overreliance | LLM09 | 5 |
| Model Theft | LLM10 | 5 |
| Content Moderation | CM-001–020 | 20 |
| Total | | 78 |
See METHODOLOGY.md for probe design, scoring, and reproduction steps.
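The table's totals can be checked, and per-category results rolled up, in a few lines of Python. This is an illustrative sketch only: the (category, passed) result tuples, the sample data, and the pass_rate_by_category helper are hypothetical and do not reflect the JSON report schema. The actual scoring rules are defined in METHODOLOGY.md.

```python
from collections import defaultdict

# Probe counts per category, copied from the table above.
PROBE_COUNTS = {
    "LLM01": 7, "LLM02": 6, "LLM03": 5, "LLM04": 6, "LLM05": 5,
    "LLM06": 7, "LLM07": 6, "LLM08": 6, "LLM09": 5, "LLM10": 5,
    "CM": 20,
}
assert len(PROBE_COUNTS) == 11 and sum(PROBE_COUNTS.values()) == 78


def pass_rate_by_category(results):
    """Aggregate hypothetical (category, passed) tuples into a pass rate per category."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for category, ok in results:
        total[category] += 1
        passed[category] += int(ok)
    return {cat: passed[cat] / total[cat] for cat in total}


# Made-up sample: 5 of 7 LLM01 probes pass, all 20 content-moderation probes pass.
sample = [("LLM01", i < 5) for i in range(7)] + [("CM", True)] * 20
rates = pass_rate_by_category(sample)
print(f"LLM01: {rates['LLM01']:.1%}, CM: {rates['CM']:.1%}")  # LLM01: 71.4%, CM: 100.0%
```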
Each guardrailprobe run produces three artifacts in the output directory:

```text
reports/
  benchmark_2026_06.pdf    # Signed PDF with RFC 3161 timestamp
  benchmark_2026_06.json   # Machine-readable full results
  benchmark_2026_06.md     # Human-readable summary
```
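Given the naming pattern above, the artifact paths for a run can be predicted from the year and month (the current month by default, per the quick start). A small sketch: the expected_artifacts helper is hypothetical and simply encodes the benchmark_YYYY_MM.{pdf,json,md} pattern shown in the listing.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def expected_artifacts(output_dir: str, year: Optional[int] = None,
                       month: Optional[int] = None) -> list[Path]:
    """Return the PDF, JSON, and Markdown paths for one benchmark run."""
    today = date.today()
    year = today.year if year is None else year
    month = today.month if month is None else month
    stem = f"benchmark_{year:04d}_{month:02d}"  # zero-padded month, e.g. 06
    return [Path(output_dir) / f"{stem}.{ext}" for ext in ("pdf", "json", "md")]


for path in expected_artifacts("reports", 2026, 6):
    print(path)
```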
To sign reports with your own certificate:
```bash
guardrailprobe cert generate            # self-signed P12 for testing
guardrailprobe cert show                # inspect the active signing cert
guardrailprobe cert verify report.pdf   # verify an existing report
```

Set GUARDRAIL_SIGNING_KEY_P12 to the path of your P12 file.
Choose either approach; both produce the same .env file.

Option 1, interactive wizard:

```bash
guardrailprobe init
```

The wizard walks through each backend and prompts for keys. Press Enter to skip any adapter you don't have credentials for. Only the values you enter are written to .env.
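Writing "only what you enter" amounts to dropping blank answers before the file is rendered. A hypothetical sketch of that step (render_env is an illustrative name, not the wizard's actual code; whether the real wizard merges with an existing .env is not covered here):

```python
def render_env(answers: dict[str, str]) -> str:
    """Render KEY=value lines for non-blank answers only; skipped prompts are omitted."""
    lines = [f"{key}={value.strip()}" for key, value in answers.items() if value.strip()]
    return "\n".join(lines) + ("\n" if lines else "")


answers = {
    "LAKERA_GUARD_API_KEY": "lk-example",
    "OPENAI_API_KEY": "",  # user pressed Enter to skip
    "GA_GUARD_API_URL": "https://your-guardrail-api.example.com/check",
}
print(render_env(answers), end="")
```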
Option 2, manual:

```bash
cp .env.example .env
# Open .env and fill in the keys for the backends you want to test
```

Core credentials:

| Variable | Backend |
|---|---|
| LAKERA_GUARD_API_KEY | Lakera Guard |
| OPENAI_API_KEY | OpenAI Moderation; NeMo fallback (priority 5) |
| AZURE_CONTENT_SAFETY_ENDPOINT + AZURE_CONTENT_SAFETY_KEY | Azure Content Safety and Azure Prompt Shields |
| AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY | AWS Bedrock Guardrails; NeMo LLM via Bedrock (priority 3, recommended) |
| AWS_BEDROCK_GUARDRAIL_ID + AWS_DEFAULT_REGION | AWS Bedrock Guardrails guardrail ID and region |
| GA_GUARD_API_URL | GA Guard / any HTTPS guardrail endpoint |
| GA_GUARD_API_KEY | GA Guard (optional API key) |
| GUARDRAIL_SIGNING_KEY_P12 | PDF signing certificate path |
Optional NeMo LLM settings:

| Variable | Default | Description |
|---|---|---|
| NEMO_OPENAI_API_KEY | (none) | Dedicated OpenAI key for NeMo only (priority 1); avoids sharing a key with the OpenAI Moderation backend |
| NEMO_OPENAI_MODEL | gpt-4o-mini | Model when using OpenAI or OpenRouter as NeMo LLM |
| NEMO_BEDROCK_MODEL | amazon.nova-pro-v1:0 | Bedrock model for NeMo intent classification |
| OPENROUTER_API_KEY | (none) | OpenRouter free-tier LLM for NeMo (priority 4, 16 req/min limit) |
| OPENROUTER_MODEL | nvidia/nemotron-3-nano-30b-a3b:free | Model when using OpenRouter |
| OLLAMA_BASE_URL | (empty) | Local Ollama endpoint for NeMo (priority 2); set to http://localhost:11434 to enable. GPU recommended: CPU inference is too slow for NeMo's 3-call-per-probe flow |
| OLLAMA_MODEL | llama3.2 | Ollama model name |
| AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | (none) | Azure OpenAI as NeMo LLM (priority 6) |
| ANTHROPIC_API_KEY | (none) | Anthropic Claude as NeMo LLM via LangChain (priority 7) |
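The numbered priorities describe a first-match fallback chain. The sketch below shows that selection logic under stated assumptions: select_nemo_provider and the provider labels are hypothetical, a provider counts as configured when all of its listed variables are non-empty, and "colang" stands for the pattern-matching mode NeMo uses when no LLM key is set.

```python
from typing import Mapping

# (provider label, required variables), in the priority order listed in the tables above.
NEMO_PROVIDER_PRIORITY = [
    ("openai_dedicated", ("NEMO_OPENAI_API_KEY",)),                      # priority 1
    ("ollama", ("OLLAMA_BASE_URL",)),                                     # priority 2
    ("bedrock", ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")),          # priority 3
    ("openrouter", ("OPENROUTER_API_KEY",)),                              # priority 4
    ("openai", ("OPENAI_API_KEY",)),                                      # priority 5
    ("azure_openai", ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")),  # priority 6
    ("anthropic", ("ANTHROPIC_API_KEY",)),                                # priority 7
]


def select_nemo_provider(env: Mapping[str, str]) -> str:
    """Return the first provider whose variables are all set, else Colang-only mode."""
    for provider, names in NEMO_PROVIDER_PRIORITY:
        if all(env.get(name) for name in names):
            return provider
    return "colang"


# OLLAMA_BASE_URL is empty (disabled), so Bedrock wins over the plain OpenAI key.
env = {"OLLAMA_BASE_URL": "", "OPENAI_API_KEY": "sk-example",
       "AWS_ACCESS_KEY_ID": "AKIA-example", "AWS_SECRET_ACCESS_KEY": "secret"}
print(select_nemo_provider(env))  # bedrock
```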
After either option, verify which backends are ready:
```bash
guardrailprobe status
```

To use guardrailprobe as a library from Python:

```python
from guardrailprobe import GuardrailBackend
from guardrailprobe.runner import RedTeamRunner
from guardrailprobe.probes import ProbeLibrary, AttackCategory

runner = RedTeamRunner()
library = ProbeLibrary()

# Run all probes against one backend
report = runner.run(GuardrailBackend.LAKERA, library.all_probes())
print(f"Pass rate: {report.pass_rate:.1%}")

# Compare backends
comparison = runner.compare_backends(
    [GuardrailBackend.LAKERA, GuardrailBackend.OPENAI_MODERATION],
    library.all_probes(),
)
print(f"Best overall: {comparison.best_overall}")

# Filter probes
injection_probes = library.get_by_category(AttackCategory.PROMPT_INJECTION)
critical_probes = library.get_by_severity("critical")
cm_probes = library.get_content_moderation_probes()
```

To develop locally, install the dev extras, then run the tests and linter:

```bash
GUARDRAILPROBE_SKIP_SPACY=1 pip install -e ".[dev]"
pytest tests/ -v
ruff check guardrailprobe/ tests/
```

License: Apache-2.0; see LICENSE.