Safety Memos

Short, practical research memos on agentic safety, safeguards, and evaluation failures.

The goal is not to publish another generic AI safety blog. Each memo is written to be useful to people building or reviewing agentic systems: what fails, why the usual benchmark misses it, and what kind of engineering gate would catch it before release.

Why This Exists

Agent safety work often gets split into two weak forms:

high-level essays that do not tell an engineer what to test
benchmark reports that do not explain the underlying failure mechanism

These memos sit between the two. They turn a safety argument into a concrete design pressure for the rest of the portfolio: stress tests, regression suites, release gates, incident replay, and agent definition scanners.

How To Use

Read a memo, then follow the implementation path:

Use when-rlhf-fails-quietly to name the failure mode.
Use agentic-misuse-benchmark to turn it into a measurable scenario.
Use safety-harness to stress-test, pin regressions, and gate releases.
Use agentguard when the risk lives in agent definitions, tool grants, hooks, or commands.
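
To make the last two steps concrete, here is a minimal sketch of what a release gate over scenario results could look like. It is not the safety-harness API: `ScenarioResult`, `release_gate`, the `pinned` set, and the `min_pass_rate` threshold are all hypothetical names chosen for illustration, and the failure-mode label is a placeholder for whatever a memo names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    """Outcome of one scenario run (hypothetical schema, not a real tool's)."""
    scenario_id: str
    failure_mode: str  # label a memo would supply; placeholder values below
    passed: bool       # True if the agent behaved safely in this scenario


def release_gate(results, pinned, min_pass_rate=0.95):
    """Return (ok, reasons); block on pinned regressions or a low pass rate."""
    reasons = []
    by_id = {r.scenario_id: r for r in results}

    # Pinned scenarios are past failures that must not regress.
    for scenario_id in sorted(pinned):
        result = by_id.get(scenario_id)
        if result is None:
            reasons.append(f"pinned scenario not run: {scenario_id}")
        elif not result.passed:
            reasons.append(f"pinned regression: {scenario_id}")

    # Overall pass rate across every scenario in the run.
    if results:
        rate = sum(r.passed for r in results) / len(results)
        if rate < min_pass_rate:
            reasons.append(f"pass rate {rate:.2f} below {min_pass_rate:.2f}")
    else:
        reasons.append("no scenarios were run")

    return (not reasons, reasons)


results = [
    ScenarioResult("s1", "quiet-failure", True),
    ScenarioResult("s2", "quiet-failure", False),
]
ok, reasons = release_gate(results, pinned={"s2"}, min_pass_rate=0.5)
print(ok, reasons)
```

The gate returns its reasons rather than a bare boolean so that a blocked release points back to the specific scenario, and through it to the memo that named the failure mode.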

Related Projects

when-rlhf-fails-quietly — Evaluating silent alignment failures
agentic-misuse-benchmark — Multi-turn misuse detection benchmark
safety-harness — Closed-loop runtime safety harness: stress-testing, regression suite, release gate, simulator, and incident lab in one system

