Popular repositories
-
agentic-misuse-benchmark (Public)
Trajectory-level benchmark for prompt injection, policy erosion, intent drift, and coordinated misuse in agentic LLM systems.
-
agentguard (Public)
Security linter for AI agent definitions: catches prompt-injection and over-broad-capability holes before they ship. Deterministic, zero-dependency, CI-ready.
Python 1
-
when-rlhf-fails-quietly (Public)
Research project evaluating silent alignment failures in LLMs under adversarial and high-stakes prompts.
Python
-
safety-memos (Public)
Short practical memos on agent safety failures, safeguards, and evaluation design.
Python
-
agentic-safety-systems-whitepaper (Public)
Whitepaper and system design for closed-loop agent safety: trajectory evals, safeguards, release gates, and incident feedback.
HTML
-
loopforge (Public)
Engineering toolkit for agent loops: lint, scaffold, run, schedule, and eval autonomous loops against a six-block model.
Python