A Guardrails AI validator for detecting mental health crises and safety risks in LLM inputs and outputs using NOPE. Backed by NOPE's Edge-classifier /v1/evaluate API.
- Latency: ~200-500ms per call
- Cost: $0.003 per call ($1 free credit for new accounts)
- Coverage: 9 risk types, localized crisis resources
```bash
pip install nope-crisis-screen
```

Or via Guardrails Hub:

```bash
guardrails hub install hub://nope/crisis_screen
```

Note: This validator calls a hosted API (NOPE `/v1/evaluate`) and therefore requires a NOPE API key, the same pattern as other API-backed Guardrails validators (e.g. Valid Address → Google Maps, Bespoke MiniCheck → BespokeLabs). The classifier model runs on NOPE's infrastructure, not locally.
- Python 3.9+
- A NOPE API key (get one free)
This validator fails open on transient/server-side problems — if the NOPE API is briefly unavailable, validation passes rather than blocking users, so the safety layer never becomes a denial-of-service vector. Developer-side errors fail loud, because a silently misconfigured safety layer is worse than none.
| Scenario | Behavior | Rationale |
|---|---|---|
| Network error | Pass (fail open) | Transient |
| API timeout | Pass (fail open) | Transient |
| Rate limited (429) | Pass (fail open) | Transient |
| Server error (5xx) | Pass (fail open) | Transient, server-side |
| Auth/balance error (401/402) | Raise `ValueError` | Bad key or empty balance: fix it |
| Other client error (400/404/…) | Raise `ValueError` | Misconfiguration (e.g. wrong `NOPE_API_URL`) |
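
The policy in the table above can be summarized as a small decision function. This is a minimal sketch for illustration only: `failure_policy` and its return values are hypothetical names, not part of `nope_crisis_screen`, and the validator's real internals may be organized differently.

```python
from typing import Optional

# Hypothetical labels for the two behaviors described in the table.
FAIL_OPEN = "pass"    # validation passes, user is not blocked
FAIL_LOUD = "raise"   # a ValueError is raised for the developer to fix


def failure_policy(status_code: Optional[int]) -> str:
    """Map a failed API call to the behavior listed in the table.

    status_code is None when no HTTP response arrived at all
    (network error or timeout).
    """
    if status_code is None:
        return FAIL_OPEN  # network error / timeout: transient
    if status_code == 429 or 500 <= status_code <= 599:
        return FAIL_OPEN  # rate limit or server error: transient
    if 400 <= status_code <= 499:
        return FAIL_LOUD  # 401/402 and other client errors: developer-side
    # Anything else is not a failure covered by the table.
    raise ValueError(f"not a failure status: {status_code}")


for code in (None, 429, 503, 401, 404):
    print(code, "->", failure_policy(code))
```

Checking 429 before the general 4xx range matters: it is the one client-side status the table treats as transient.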
```python
import os

from guardrails import Guard
from nope_crisis_screen import CrisisScreen

# Set your API key
os.environ["NOPE_API_KEY"] = "nope_live_xxx"

# Create a guard. on_fail="noop" lets you inspect the outcome instead of raising;
# with the default on_fail, a failed validation raises ValidationError (see below).
guard = Guard().use(CrisisScreen(severity_threshold="moderate", on_fail="noop"))

# Screen user input
result = guard.validate("I've been feeling really hopeless lately")

if result.validation_passed:
    print("No concerning signals detected")
else:
    # Access failure details via validation_summaries
    for summary in result.validation_summaries:
        print(f"Failed: {summary.failure_reason}")
        # Metadata includes risks, resources, rationale
        print(f"Risks: {summary.metadata.get('risks')}")
```

California SB 243 (effective Jan 2026) requires AI chatbots to detect and respond to mental health crises. This validator helps you comply.
Also relevant for:
- NY Article 47 - Mental health parity in digital services
- UK Online Safety Act - Duty of care for user safety
- EU AI Act - High-risk AI system requirements
| Risk Type | Description | Framework |
|---|---|---|
| `suicide` | Self-directed lethal intent | C-SSRS |
| `self_harm` | Non-suicidal self-injury | Clinical NSSI criteria |
| `self_neglect` | Self-care failure, eating disorders, substance crisis | - |
| `violence` | Risk of harm to others | HCR-20 |
| `abuse` | Physical, emotional, sexual, financial abuse | DASH |
| `sexual_violence` | Rape, sexual assault, coercion | - |
| `neglect` | Failure to care for dependents | Safeguarding frameworks |
| `exploitation` | Trafficking, grooming, sextortion | Trafficking indicators |
| `stalking` | Persistent unwanted contact, surveillance | - |
When risks are detected, the validator returns localized crisis resources (hotlines, chat services) for the user's country (set via country or per-call metadata).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `NOPE_API_KEY` env var | Your NOPE API key |
| `severity_threshold` | `str` | `"moderate"` | Minimum severity: `mild`, `moderate`, `high`, `critical` |
| `risk_types` | `list[str]` | All 9 types | Which risk types to check |
| `country` | `str` | `"US"` | ISO country code for localized resources |
| `include_resources` | `bool` | `True` | Include crisis resources in failure metadata |
| `include_recommended_reply` | `bool` | `False` | Attach a deterministic, resource-grounded safe reply as `fix_value` (no extra latency/cost). Auto-enabled by `on_fail="fix"` |
| `on_fail` | `str \| Callable` | `None` | Guardrails `on_fail` action |
The severity scale is `mild → moderate → high → critical` (matching the NOPE API). `"low"` is accepted as a deprecated alias for `"mild"`.
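
To illustrate how a minimum-severity threshold over this ordered scale might be applied, here is a minimal sketch. The names `SEVERITY_ORDER`, `normalize`, and `meets_threshold` are hypothetical and not part of the package; it also assumes the threshold is inclusive, which is how "Minimum severity" reads but is not confirmed by the source.

```python
# Ordered scale as documented: mild -> moderate -> high -> critical.
SEVERITY_ORDER = ["mild", "moderate", "high", "critical"]

# "low" is documented as a deprecated alias for "mild".
ALIASES = {"low": "mild"}


def normalize(level: str) -> str:
    """Resolve aliases and reject unknown severity names."""
    level = ALIASES.get(level, level)
    if level not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity: {level!r}")
    return level


def meets_threshold(detected: str, threshold: str = "moderate") -> bool:
    """True when the detected severity is at or above the threshold (assumed inclusive)."""
    return SEVERITY_ORDER.index(normalize(detected)) >= SEVERITY_ORDER.index(
        normalize(threshold)
    )


print(meets_threshold("high"))         # compared against the default "moderate"
print(meets_threshold("mild"))         # below the default threshold
print(meets_threshold("mild", "low"))  # alias resolves to "mild"
```

Under this reading, lowering `severity_threshold` to `"mild"` makes the validator fail on more inputs, while `"critical"` restricts failures to the most severe signals.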
```python
from guardrails import Guard
from nope_crisis_screen import CrisisScreen

guard = Guard().use(CrisisScreen())

# This passes - no crisis signals
guard.validate("What's the weather like?")

# This fails - detects suicidal ideation
# (with the default on_fail, this raises ValidationError)
guard.validate("I've been thinking about ending it all")
```

```python
# Only check for self-directed harm
guard = Guard().use(CrisisScreen(
    risk_types=["suicide", "self_harm", "self_neglect"],
    severity_threshold="mild",
))
```

Passing recent conversation context improves accuracy:
```python
guard.validate(
    "I don't know what to do anymore",
    metadata={
        "messages": [
            {"role": "user", "content": "I've been struggling with thoughts of hurting myself"},
            {"role": "assistant", "content": "I'm concerned about what you're sharing..."},
            {"role": "user", "content": "I don't know what to do anymore"},
        ]
    }
)
```

```python
# Get UK crisis resources
guard = Guard().use(CrisisScreen(country="GB"))

# Or override per-call
guard.validate("...", metadata={"country": "AU"})
```

```python
import openai

# Recommended: validate user input before LLM call
guard = Guard().use(CrisisScreen(), on="messages")

response = guard(
    openai.chat.completions.create,
    model="gpt-4",
    messages=[{"role": "user", "content": user_message}],
)
```

Without `on="messages"`, the validator runs on the LLM output. This still works, since crisis signals can be detected in any text, but input validation is the primary use case.
All standard Guardrails on_fail actions are supported. If you don't set on_fail, the default raises ValidationError on failure (same as exception) — set on_fail="noop" if you want to inspect validation_passed/validation_summaries without raising.
| Action | Behavior | Use Case |
|---|---|---|
| `exception` | Raise `ValidationError` | Hard stop, alert system |
| `noop` | Log but continue | Monitoring, analytics |
| `refrain` | Return `None` | Silent filtering |
| `fix` | Replace with safe response | Auto-respond to crisis with resources |
| `fix_reask` | Fix then reask if needed | Fallback chain |
| Custom function | Your handler | Route to human, show resources |
When you use on_fail="fix", the validator returns a deterministic, supportive safe reply built from the matched crisis resource (e.g. the local hotline). It is intentionally not LLM-generated — a fixed, resource-grounded message has no hallucination risk, adds no latency, and costs nothing extra:
```python
guard = Guard().use(CrisisScreen(on_fail="fix"))
result = guard.validate("I've been thinking about ending it all")
# result.validated_output contains a safe, supportive response pointing to crisis resources
```

If the API returns no resources (e.g. `include_resources=False`), no `fix_value` is produced.
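
A deterministic, resource-grounded reply of the kind described above could be built along these lines. This is a sketch under assumptions, not the validator's actual implementation: `build_safe_reply` is a hypothetical name, the reply wording is invented for illustration, and the `name` key is assumed. Only the `primary` and `phone` keys appear elsewhere in this README.

```python
from typing import Optional


def build_safe_reply(resources: Optional[dict]) -> Optional[str]:
    """Return a fixed-template reply, or None when there is no resource to cite."""
    primary = (resources or {}).get("primary")
    if not primary or not primary.get("phone"):
        # Mirrors the documented behavior: no resources, no fix value.
        return None
    # "name" is an assumed key; fall back to a generic label if it is absent.
    name = primary.get("name", "a crisis line")
    return (
        "I'm really glad you told me. You don't have to go through this alone. "
        f"You can reach {name} at {primary['phone']}."
    )


# Hypothetical resource payload for illustration.
print(build_safe_reply({"primary": {"name": "a local helpline", "phone": "555-0100"}}))
print(build_safe_reply({}))
```

Because the text comes from a fixed template plus the matched resource, the same input always yields the same reply, which is the property that avoids hallucination risk and extra model calls.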
```python
def handle_crisis(value: str, fail_result):
    """Route crisis to human support."""
    # Log for review (log_crisis_event is a placeholder for your own logging function)
    log_crisis_event(fail_result.metadata)

    # Use the recommended reply if available
    if fail_result.fix_value:
        return fail_result.fix_value

    # Or build your own response
    resources = fail_result.metadata.get("resources", {})
    if resources.get("primary"):
        return f"I want to make sure you're okay. Here's someone who can help: {resources['primary']['phone']}"

    return None

guard = Guard().use(CrisisScreen(
    include_recommended_reply=True,
    on_fail=handle_crisis,
))
```

| Level | Description | Example |
|---|---|---|
| `mild` | Minor distress, no functional impairment | Vague expressions of sadness |
| `moderate` | Clear concern, not immediately dangerous | Passive suicidal ideation |
| `high` | Serious risk requiring urgent intervention | Active ideation with method |
| `critical` | Life-threatening, imminent harm | Intent + plan + timeline |
See nope.net/methodology for validation methodology, risk-framework grounding, and benchmark results.
- Not predictive: Detects current signals, not future behavior
- Not diagnostic: Does not diagnose mental health conditions
- Not therapeutic: Does not provide treatment
- Not a replacement for human clinical judgment
```bash
git clone https://github.com/nope-net/guardrails-validator
cd guardrails-validator
pip install -e ".[dev]"
pytest tests/
```

Apache 2.0