Popular repositories
Sycophancy_Emergent_Misalignment_and_Gated_attention_FT (Public)
This repository provides the implementation for studying Sycophancy-Induced Emerging Misalignment (EM) and introduces Gated Finetuning, a mechanism that enables training-free and instant reversal o…
Python 6