Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset

## Proposal

Add an "OWASP LLM02 (Insecure Output Handling)" scorer pack to PyRIT — 4 new `TrueFalseScorer` subclasses for output-side threat detection, plus an optional companion seed dataset.

This builds directly on the architecture introduced in #1704 (the new `RegexScorer` base class + `CredentialLeakScorer` subclass pattern). The proposal extends that pattern across the remaining OWASP LLM02 sub-categories.

## Motivation

PyRIT has strong coverage of **LLM01** (Prompt Injection — input side): `SelfAsk*Scorer`s, `PromptShieldScorer`, `GandalfScorer`, garak scenarios, ATR via #1715, etc.

**LLM02** (Insecure Output Handling — output side) is comparatively under-instrumented:
- ✅ `MarkdownInjectionScorer` exists (markdown smuggling)
- ✅ `RegexScorer` + `CredentialLeakScorer` shipping in #1704 (credential leak)
- 🔄 #513 has an in-flight static-analysis insecure-code follow-up (@precognitivem0nk's 2026-05-01 plan)
- ❌ **XSS** payload emission — no dedicated scorer
- ❌ **SQL injection** payload emission — no dedicated scorer
- ❌ **Shell command injection** payload emission — no dedicated scorer
- ❌ **Path traversal** payload emission — no dedicated scorer

These four categories appear directly in OWASP's LLM02 sub-class list. Each is fast and deterministic to detect via the same `RegexScorer` pattern.

## Proposed scope

Four new `RegexScorer` subclasses in `pyrit/score/true_false/`:

| Scorer | Purpose | Default patterns |
|---|---|---:|
| `XSSOutputScorer` | LLM emits an XSS payload (script-tag, event-handler, javascript: URI, data:text/html, iframe srcdoc, svg+script) | 6 |
| `SQLInjectionOutputScorer` | LLM emits a destructive / union / comment-bypass SQL payload | 3 |
| `ShellCommandOutputScorer` | LLM emits a pipe-to-shell, destructive, reverse-shell, or env-exfil shell payload | 4 |
| `PathTraversalOutputScorer` | LLM emits `../../etc/passwd`-style path traversal to known-sensitive targets | 1 (dual-condition) |

Each follows the exact pattern `CredentialLeakScorer` uses: subclass `RegexScorer`, set `_DEFAULT_PATTERNS`, pass `categories=["security", "owasp-llm02-<category>"]` to `super().__init__()`. ~50-60 lines per scorer including docstrings.

**Companion seed dataset** (optional): `OWASPLLM02SeedDataset` — 33 hand-curated developer-style adversarial requests (29 adversarial + 4 benign controls) calibrated to elicit each of the above payloads. Useful for red-team eval scenarios that want the inputs as well as the scorers.

## Out of scope (explicitly to avoid duplicating ongoing work)

- ❌ Credential / API-key / private-key detection → covered by #1704.
- ❌ Markdown smuggling → already covered by existing `MarkdownInjectionScorer`.
- ❌ Insecure code generation (semgrep + regex) → @precognitivem0nk's plan on #513.
- ❌ Dataset loaders for third-party threat-rule packs → covered by PR #1715.

## Disclosure

The proposed patterns are derived from a published MIT-licensed regex catalog I (@ppcvote) maintain:
- [`prompt-defense-audit`](https://github.com/ppcvote/prompt-defense-audit) (npm, TypeScript)
- [`prompt-defense-audit-py`](https://github.com/ppcvote/prompt-defense-audit-py) (PyPI, byte-parity Python port)
- [`prompt-defense-eval`](https://github.com/ppcvote/prompt-defense-eval) (Inspect AI eval task, awaiting registry merge at UKGovernmentBEIS/inspect_evals#1659)

A companion arXiv preprint is in draft (cs.CR target, ~17K words, 8-model × 1,584-generation evaluation, with the regex taxonomy as the core methodology). Happy to share full text if useful for review context.

**Surfacing the "author also contributor" dynamic up front**: if maintainers would rather write the patterns yourselves using our work only as reference, that's a totally acceptable outcome — please say the word.

## Questions before any PR

1. **4 scorers or 1?** Would you prefer 4 separate `RegexScorer` subclasses (mirroring `CredentialLeakScorer`), or a single `OWASPLLM02Scorer` parameterized by category list?
2. **Dependency on #1704?** Should we wait for the `RegexScorer` base class to merge before opening dependent PRs, or is a stacked PR acceptable?
3. **Seed dataset placement?** GitHub-hosted (per @romanlutz's note in #1702) or HuggingFace-hosted?
4. **Scenario or atomic?** Do you want a `OWASPLLM02RedTeam` scenario bundling these + existing scorers, or keep scorers atomic for scenario-author composition?

## What we're not asking

Not asking you to merge anything blindly. Not asking to skip review. Happy to iterate on scope, drop any of the four scorers, or pause entirely if this conflicts with anything I've missed.

cc @rlundeen2 @romanlutz (per recent activity on #1703 / #1702 / #513)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset #1737

Proposal

Motivation

Proposed scope

Out of scope (explicitly to avoid duplicating ongoing work)

Disclosure

Questions before any PR

What we're not asking

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Scorer	Purpose	Default patterns
`XSSOutputScorer`	LLM emits an XSS payload (script-tag, event-handler, javascript: URI, data:text/html, iframe srcdoc, svg+script)	6
`SQLInjectionOutputScorer`	LLM emits a destructive / union / comment-bypass SQL payload	3
`ShellCommandOutputScorer`	LLM emits a pipe-to-shell, destructive, reverse-shell, or env-exfil shell payload	4
`PathTraversalOutputScorer`	LLM emits `../../etc/passwd`-style path traversal to known-sensitive targets	1 (dual-condition)

Proposal: OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) + companion seed dataset #1737

Description

Proposal

Motivation

Proposed scope

Out of scope (explicitly to avoid duplicating ongoing work)

Disclosure

Questions before any PR

What we're not asking

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions