lingzhiyxp/PromptGuard

📚 Introduction

This is the official code for the paper "PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models".

See our Project Website for more information.

We have released our pretrained model on Hugging Face. See the Inference section below for how to use it.

This implementation can be regarded as an example that can be integrated into the Diffusers library.

📦 Training Dataset

You can download our training dataset from this link. Commercial use of the training dataset is not permitted.

🔧 Environments and Installation

conda create -n promptguard python=3.9
conda activate promptguard
pip install -r requirements.txt
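
After installation, a quick sanity check can confirm that the packages imported by the inference snippet are visible in the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the package list is assumed from the imports used in the Inference section (`requirements.txt` remains the authoritative list).

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list, taken from the imports in the inference example;
    # requirements.txt is the authoritative source.
    required = ["torch", "diffusers"]
    print("Python", sys.version.split()[0])
    missing = missing_packages(required)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any package is reported as missing, check that the `promptguard` environment is activated before re-running `pip install -r requirements.txt`.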

🔧 Individual Safety Embedding Training

bash training.sh

You can modify the parameters in the training.sh file. Normally, you only need to change the coefficient, max_train_steps, and the file and folder paths.

🔧 Inference

from diffusers import StableDiffusionPipeline
import torch
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# remove the safety checker
def dummy_checker(images, **kwargs):
    return images, [False] * len(images)
pipe.safety_checker = dummy_checker

safety_embedding_list = [${embedding_path_1}, ${embedding_path_2}, ...] # the save paths of your embeddings
token1 = "<prompt_guard_1>"
token2 = "<prompt_guard_2>"
...
token_list = [token1, token2, ...] # the corresponding tokens of your embeddings

pipe.load_textual_inversion(pretrained_model_name_or_path=safety_embedding_list, token=token_list)

origin_prompt = "a photo of a dog"
prompt_with_system = origin_prompt + " " + token1 + " " + token2 + ...
image = pipe(prompt_with_system).images[0]
image.save("example.png")

For a better balance between unsafe content moderation and benign content preservation, we recommend loading three safety embeddings: Sexual, Political, and Disturbing.
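
The prompt construction in the inference snippet can be written as a small helper that works for any number of safety tokens. This is a minimal sketch under stated assumptions: the function names `check_pairs` and `build_guarded_prompt` are hypothetical, the third token `<prompt_guard_3>` simply continues the naming pattern above, and the embedding paths are placeholders rather than files shipped with the repository.

```python
def check_pairs(embedding_paths, tokens):
    """Validate that every embedding path has exactly one unique token."""
    if len(embedding_paths) != len(tokens):
        raise ValueError("each embedding path needs exactly one token")
    if len(set(tokens)) != len(tokens):
        raise ValueError("tokens must be unique")


def build_guarded_prompt(prompt, tokens):
    """Append the safety tokens to the prompt, separated by single spaces."""
    if not tokens:
        return prompt
    return prompt + " " + " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical placeholder paths; replace with the save paths of your embeddings.
    embedding_paths = ["path/to/sexual.bin", "path/to/political.bin", "path/to/disturbing.bin"]
    # The third token name is an assumption that follows the pattern above.
    tokens = ["<prompt_guard_1>", "<prompt_guard_2>", "<prompt_guard_3>"]

    check_pairs(embedding_paths, tokens)
    print(build_guarded_prompt("a photo of a dog", tokens))
    # → a photo of a dog <prompt_guard_1> <prompt_guard_2> <prompt_guard_3>
```

The returned string would then be passed to the pipeline in place of `prompt_with_system`, after the same paths and tokens have been given to `pipe.load_textual_inversion`.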

📄 Citation

If you find our paper/code/dataset helpful, please kindly consider citing this work with the following reference:

@misc{yuan2025promptguard,
  title={PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models}, 
  author={Lingzhi Yuan and Xinfeng Li and Chejian Xu and Guanhong Tao and Xiaojun Jia and Yihao Huang and Wei Dong and Yang Liu and Bo Li},
  year={2025},
  eprint={2501.03544},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.03544}, 
}

✨ Acknowledgement

This work builds on the following research works and open-source projects. Many thanks to all the authors for sharing!

@misc{von-platen-etal-2022-diffusers,
    author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
    title = {Diffusers: State-of-the-art diffusion models},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/diffusers}}
}
@inproceedings{QSHBZZ23,
    author = {Yiting Qu and Xinyue Shen and Xinlei He and Michael Backes and Savvas Zannettou and Yang Zhang},
    title = {{Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models}},
    booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
    publisher = {ACM},
    year = {2023}
}
@misc{gal2022textual,
      doi = {10.48550/ARXIV.2208.01618},
      url = {https://arxiv.org/abs/2208.01618},
      author = {Gal, Rinon and Alaluf, Yuval and Atzmon, Yuval and Patashnik, Or and Bermano, Amit H. and Chechik, Gal and Cohen-Or, Daniel},
      title = {An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion},
      publisher = {arXiv},
      year = {2022},
      primaryClass={cs.CV}
}
