yingchen-coding

Popular repositories

agentic-misuse-benchmark Public

Trajectory-level benchmark for prompt injection, policy erosion, intent drift, and coordinated misuse in agentic LLM systems.

Python 1 1
agentguard Public

Security linter for AI agent definitions — catches prompt-injection and over-broad-capability holes before they ship. Deterministic, zero-dependency, CI-ready.

Python 1
when-rlhf-fails-quietly Public

Research project evaluating silent alignment failures in LLMs under adversarial and high-stakes prompts.

Python
safety-memos Public

Short practical memos on agent safety failures, safeguards, and evaluation design.

Python
agentic-safety-systems-whitepaper Public

Whitepaper and system design for closed-loop agent safety: trajectory evals, safeguards, release gates, and incident feedback.

HTML
loopforge Public

Engineering toolkit for agent loops — lint, scaffold, run, schedule, and eval autonomous loops against a six-block model.

Python