yingchen-coding/safety-memos

Safety Memos

Short, practical research memos on agentic safety, safeguards, and evaluation failures.

The goal is not to publish another generic AI safety blog. Each memo is written to be useful to people building or reviewing agentic systems: what fails, why the usual benchmark misses it, and what kind of engineering gate would catch it before release.

Why This Exists

Agent safety work often gets split into two weak forms:

  • high-level essays that do not tell an engineer what to test
  • benchmark reports that do not explain the underlying failure mechanism

These memos sit between the two. They turn a safety argument into a concrete design pressure for the rest of the portfolio: stress tests, regression suites, release gates, incident replay, and agent definition scanners.

Contents

How To Use

Read a memo, then follow the implementation path:

  1. Use when-rlhf-fails-quietly to name the failure mode.
  2. Use agentic-misuse-benchmark to turn it into a measurable scenario.
  3. Use safety-harness to stress-test, pin regressions, and gate releases.
  4. Use agentguard when the risk lives in agent definitions, tool grants, hooks, or commands.
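The last two steps of this path, pinning regressions and gating releases, can be sketched as a small check. This is a minimal illustration only, not the API of safety-harness or any other project named above: `ScenarioResult`, `release_gate`, the `pinned` flag, and the 0.95 default threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScenarioResult:
    """Outcome of one measurable scenario (hypothetical structure)."""
    scenario_id: str       # illustrative identifier for the scenario
    failure_mode: str      # failure mode name, e.g. taken from a memo
    passed: bool           # True if the agent behaved safely
    pinned: bool = False   # True if this scenario guards a past regression


def release_gate(
    results: List[ScenarioResult], min_pass_rate: float = 0.95
) -> Tuple[bool, List[str]]:
    """Return (ok, reasons). The release is blocked if any reason is present."""
    if not results:
        # An empty run proves nothing, so treat it as a failed gate.
        return False, ["no scenarios were run"]

    reasons = []

    # A pinned regression must never fail again, regardless of overall rate.
    for r in results:
        if r.pinned and not r.passed:
            reasons.append(
                f"pinned regression failed: {r.scenario_id} ({r.failure_mode})"
            )

    # The aggregate pass rate must also clear the (assumed) threshold.
    pass_rate = sum(r.passed for r in results) / len(results)
    if pass_rate < min_pass_rate:
        reasons.append(
            f"pass rate {pass_rate:.2f} below threshold {min_pass_rate:.2f}"
        )

    return (not reasons), reasons
```

The design choice worth noting is that pinned scenarios are checked separately from the aggregate pass rate, so a previously fixed failure cannot reappear behind a healthy average.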

Related Projects
