📚 Introduction

This is the official code for the paper "PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models".

See our Project Website for more information.

We have released our pretrained model on Hugging Face. See the Inference section for how to use it.

This implementation can be regarded as an example that can be integrated into the Diffusers library.

📦 Training Dataset

You can download our training dataset from this link. The training dataset may not be used for any commercial purpose.

🔧 Environments and Installation

conda create -n promptguard python=3.9
conda activate promptguard
pip install -r requirements.txt
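
After installation, a quick check can confirm that the key packages are visible in the new environment. The following is a minimal sketch, assuming that requirements.txt installs torch and diffusers (the two packages imported by the inference example); the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: torch and diffusers are the packages the inference example needs.
missing = missing_packages(["torch", "diffusers"])
print("Missing packages:", missing or "none")
```

If the list is not empty, re-run the pip command inside the activated promptguard environment.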

🔧 Individual Safety Embedding Training

bash training.sh

You can modify the parameters in the training.sh file. Normally, you only need to change the coefficient, max_train_steps, and the file and folder paths.

🔧 Inference

from diffusers import StableDiffusionPipeline
import torch

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Remove the built-in safety checker: return the images unchanged and flag none of them
def dummy_checker(images, **kwargs):
    return images, [False] * len(images)
pipe.safety_checker = dummy_checker

# Save paths of your embeddings: replace the placeholders and add more entries as needed
safety_embedding_list = ["${embedding_path_1}", "${embedding_path_2}"]
token1 = "<prompt_guard_1>"
token2 = "<prompt_guard_2>"
# Tokens corresponding to the embeddings above, in the same order
token_list = [token1, token2]

pipe.load_textual_inversion(pretrained_model_name_or_path=safety_embedding_list, token=token_list)

origin_prompt = "a photo of a dog"
# Append every safety token to the original prompt
prompt_with_system = origin_prompt + " " + " ".join(token_list)
image = pipe(prompt_with_system).images[0]
image.save("example.png")

For a better balance between unsafe content moderation and benign content preservation, we recommend loading three safe embeddings together: Sexual, Political, and Disturbing.
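
Combining several embeddings comes down to appending each embedding's token to the prompt. The helper below is a minimal sketch of that step, not part of this repository: the function name `append_safety_tokens` and the category-to-token mapping are hypothetical, and the tokens must match the ones passed to `load_textual_inversion`.

```python
def append_safety_tokens(prompt, tokens):
    """Append safety tokens to a prompt, separated by single spaces."""
    base = prompt.strip()
    # Skip tokens that are already present so repeated calls do not duplicate them.
    missing = [token for token in tokens if token not in base]
    # Drop empty parts so an empty prompt does not yield a leading space.
    return " ".join(part for part in [base, *missing] if part)


# Hypothetical mapping from the three recommended categories to tokens.
CATEGORY_TOKENS = {
    "sexual": "<prompt_guard_1>",
    "political": "<prompt_guard_2>",
    "disturbing": "<prompt_guard_3>",
}

guarded = append_safety_tokens("a photo of a dog", list(CATEGORY_TOKENS.values()))
print(guarded)
# a photo of a dog <prompt_guard_1> <prompt_guard_2> <prompt_guard_3>
```

The resulting string can be passed to the pipeline in place of the original prompt.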

📄 Citation

If you find our paper, code, or dataset helpful, please consider citing this work with the following reference:

@misc{yuan2025promptguard,
  title={PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models}, 
  author={Lingzhi Yuan and Xinfeng Li and Chejian Xu and Guanhong Tao and Xiaojun Jia and Yihao Huang and Wei Dong and Yang Liu and Bo Li},
  year={2025},
  eprint={2501.03544},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.03544}, 
}

✨ Acknowledgement

This work builds on the following research works and open-source projects. Many thanks to all the authors for sharing!

🤗 Diffusers

@misc{von-platen-etal-2022-diffusers,
    author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
    title = {Diffusers: State-of-the-art diffusion models},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/diffusers}}
}

Unsafe Diffusion

@inproceedings{QSHBZZ23,
    author = {Yiting Qu and Xinyue Shen and Xinlei He and Michael Backes and Savvas Zannettou and Yang Zhang},
    title = {{Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models}},
    booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
    publisher = {ACM},
    year = {2023}
}

Textual Inversion

@misc{gal2022textual,
      doi = {10.48550/ARXIV.2208.01618},
      url = {https://arxiv.org/abs/2208.01618},
      author = {Gal, Rinon and Alaluf, Yuval and Atzmon, Yuval and Patashnik, Or and Bermano, Amit H. and Chechik, Gal and Cohen-Or, Daniel},
      title = {An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion},
      publisher = {arXiv},
      year = {2022},
      primaryClass={cs.CV}
}
