    Repositories list

    • BadBone

      Public
      Official repo for the paper "BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning"
      Python
      0100 Updated Apr 25, 2026
    • The Official Repository for Paper "HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?"
      Python
      MIT License
      1700 Updated Apr 20, 2026
    • Python
      0000 Updated Apr 17, 2026
    • DE-CLIP

      Public
      The Official Repository for ACL 2026 Paper "DE-CLIP: Few-Shot Anomaly Detection via Difference-Guided Embedding Editing"
      0000 Updated Apr 17, 2026
    • This is the official repository of the ACL 2026 Findings paper: InferPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
      Python
      0000 Updated Apr 15, 2026
    • PeerCheck

      Public
      PeerCheck: dataset for evaluating LLM-generated academic reviews
      Apache License 2.0
      0100 Updated Apr 14, 2026
    • AP-Test

      Public
      Official repo for the ACL 2026 paper "Peering Behind the Shield: Guardrail Identification in Large Language Models"
      Python
      MIT License
      0100 Updated Apr 11, 2026
    • UnsafeMoE

      Public
      This repository is for the paper "Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs."
      Python
      MIT License
      0300 Updated Feb 10, 2026
    • JAIL-CON

      Public
      [NeurIPS'25] Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency (https://arxiv.org/abs/2510.21189)
      Python
      Creative Commons Attribution 4.0 International
      1210 Updated Jan 18, 2026
    • Python
      Apache License 2.0
      0500 Updated Jan 4, 2026
    • Python
      0200 Updated Dec 9, 2025
    • Official Website of JADES
      SCSS
      MIT License
      0000 Updated Sep 12, 2025
    • T-GPS

      Public
      Python
      Apache License 2.0
      0300 Updated Sep 7, 2025
    • JADES

      Public
      This is the public code repository of the paper 'JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring'
      0600 Updated Aug 27, 2025
    • GPTracker

      Public
      [S&P'25] GPTracker: A Large-Scale Measurement of Misused GPTs
      Python
      GNU General Public License v3.0
      11200 Updated Jul 25, 2025
    • SaferVLM

      Public
      Python
      2910 Updated Jul 19, 2025
    • Python
      88700 Updated Jun 8, 2025
    • [ACL2025] Official repository for "Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media"
      Python
      1800 Updated May 29, 2025
    • This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations …
      Python
      11000 Updated May 21, 2025
    • HateBench

      Public
      [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
      Apache License 2.0
      31400 Updated Mar 1, 2025
    • [USENIX Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
      Python
      Apache License 2.0
      1500 Updated Jan 29, 2025
    • [USENIX Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
      Python
      Apache License 2.0
      0510 Updated Jan 29, 2025
    • Apache License 2.0
      0100 Updated Jan 28, 2025
    • ModSCAN

      Public
      An official public repository of the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://…
      Python
      MIT License
      1300 Updated Jan 8, 2025
    • ICL-MIA

      Public
      Python
      0510 Updated Dec 19, 2024
    • Python
      0900 Updated Dec 18, 2024
    • JavaScript
      MIT License
      0810 Updated Oct 30, 2024
    • ZeroFake

      Public
      Python
      21110 Updated Oct 30, 2024
    • homepage

      Public
      JavaScript
      MIT License
      0000 Updated Oct 14, 2024
    • MIT License
      0000 Updated Aug 28, 2024