TRUSS is an AI safety verification prototype built around one core idea:
do not trust model output by default.
Instead of sending generated code or queries straight into the real world, TRUSS intercepts candidate output, detects its domain, synthesizes safety rules for that domain, parses the candidate into a structured form, evaluates explicit constraints, and returns an explainable verdict before execution.
The current implementation is a modern interactive prototype for demonstrating that verification layer across shell, sql, and python scenarios.
TRUSS turns the usual flow:
User -> Model -> Output
into:
User -> Model -> Intercept -> Detect -> Synthesize -> Parse -> Verify -> Accept / Reject / Refine
For each candidate, the system:
- classifies the likely domain
- generates a domain-aware safety grammar
- tokenizes and parses the output into a lightweight structural artifact
- evaluates transparent constraint formulas
- produces an explainable verification report
- suggests a safer refinement when the candidate fails
Most AI safety tooling still depends on trust-heavy techniques such as static blocklists, heuristic filters, or moderation checks layered after generation.
TRUSS is trying to show a different product direction:
- outputs should be intercepted before execution
- safety policy should be visible and inspectable
- verification should produce a trace, not just a pass/fail badge
- unsafe outputs should trigger refinement rather than silent failure
This repo is intentionally focused on a strong demoable MVP.
Supported domains:
shellsqlpython- fallback
general
Supported outcomes:
acceptrejectrefine
Built-in stress-test scenarios include:
- destructive bash execution
- obfuscated Python deletion attempts
- mixed safe/unsafe SQL
- remote shell bootstrap pipelines
- safe analytics SQL
The UI is designed as a transparent verification workspace rather than a chatbot wrapper.
It exposes:
- intercepted user prompt and candidate output
- domain detection and confidence
- generated safety formulas
- grammar matrix
- token ledger
- lightweight AST view
- violation timeline
- proof artifact metadata
- raw proof JSON
- auto-refinement loop
- Next.js 16 App Router
- React 19
- TypeScript 5
- Tailwind CSS 4
- Route Handlers for the verification API
Core flow:
-
Domain detectionDetects whether the candidate is likelyshell,sql,python, orgeneral. -
Grammar synthesisProduces allowed, blocked, and required constructs for the detected domain. -
ParsingBuilds a deterministic structured artifact from the candidate output. -
Constraint evaluationApplies explicit safety checks and records evidence for failures. -
Verdict generationReturnsaccept,reject, orrefinewith a complete trace. -
RefinementProduces a safer candidate when rejection or repair is appropriate.
src/
app/
api/verify/route.ts # Verification API
globals.css # Theme tokens, typography, motion, UI utilities
layout.tsx # App metadata and root shell
page.tsx # Top-level page composition
components/
theme-toggle-button.tsx # Theme toggle control
verification-workspace.tsx # Main product UI
lib/
safety/
domain.ts # Domain detection logic
index.ts # Public safety exports
parser.ts # Lightweight parser + structural artifacts
policy.ts # Grammar / policy synthesis
scenarios.ts # Built-in demo scenarios
types.ts # Shared safety and report types
verifier.ts # Decision engine and verification report builder
public/
truss-mark.svg # Shared TRUSS brand mark / favicon asset
Install dependencies:
npm installStart the dev server:
npm run devOpen http://localhost:3000.
Useful commands:
npm run lint
npm run build
npm run startPOST /api/verify
Request body:
{
"userPrompt": "Generate SQL for active employees in engineering.",
"modelOutput": "SELECT employee_id, full_name FROM employees WHERE department = 'Engineering';",
"mode": "strict"
}mode currently supports:
strictbalanced
Example response shape:
{
"verdict": "accept",
"summary": "Verified as a sql output with no blocking constraint failures.",
"nextStep": "Allow candidate to proceed.",
"domain": "sql",
"confidence": 0.96,
"riskLevel": "medium",
"mode": "strict",
"grammar": {
"summary": "Read-only SQL grammar with bounded clauses.",
"allowedConstructs": ["SELECT", "FROM", "WHERE", "ORDER BY"],
"blockedConstructs": ["DROP", "DELETE", "TRUNCATE", "UPDATE", "INSERT"],
"requiredProperties": ["single statement", "read-only intent"]
},
"constraints": [],
"violationTrace": [],
"proofArtifact": {
"schema": "truss.proof.v1",
"checksum": "sha256:...",
"tokenCount": 14,
"astNodeCount": 6,
"failingConstraintIds": [],
"generatedAt": "2026-05-21T00:00:00.000Z",
"domain": "sql"
}
}The current interface uses:
- a warm editorial dark mode instead of pure black
- a restrained palette based on paper, charcoal, dusty coral, and muted blue
- motion for state transitions and verification flow emphasis
- deterministic demo scenarios so the product can be evaluated without relying on unpredictable live model output
This is a transparent prototype, not a finished formal methods backend.
Right now TRUSS uses:
- lightweight deterministic parsing
- explicit heuristics and rule checks
- structured proof-like artifacts for explainability
It does not yet use:
- full language parsers for every domain
- an SMT solver or theorem prover
- persistent audit storage
- multi-tenant policy management
- live LLM orchestration by default
High-value next steps:
- Replace lightweight parsing with AST-backed parsers for
shell,sql, andpython. - Add policy packs for organizations, environments, and risk profiles.
- Persist verification events for governance, audit, and replay.
- Add live model integration behind the current deterministic scenario mode.
- Introduce a formal solver-backed verification path for high-assurance cases.
- Add auth, workspaces, and policy administration for enterprise use.
If you need a short explanation for demos or submissions:
TRUSS is an AI safety verification layer that dynamically synthesizes safety rules, parses AI-generated outputs into structured artifacts, and verifies them before they are accepted or executed.