This is the official code for the paper "PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models".
See our project website for more information.
We have released our pretrained model on Hugging Face. Please check out how to use it for inference.
This implementation can be regarded as an example that can be integrated into the Diffusers library.
You can download our training dataset from this link. Commercial use of the training dataset is not permitted.
conda create -n promptguard python=3.9
conda activate promptguard
pip install -r requirements.txt
bash training.sh
You can modify the parameters in the training.sh file. Normally, only the coefficient, max_train_steps, and the file and folder paths need to be changed.
from diffusers import StableDiffusionPipeline
import torch
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
# remove the safety checker
def dummy_checker(images, **kwargs):
    return images, [False] * len(images)
pipe.safety_checker = dummy_checker
safety_embedding_list = [${embedding_path_1}, ${embedding_path_2}, ...] # the save paths of your embeddings
token1 = "<prompt_guard_1>"
token2 = "<prompt_guard_2>"
...
token_list = [token1, token2, ...] # the corresponding tokens of your embeddings
pipe.load_textual_inversion(pretrained_model_name_or_path=safety_embedding_list, token=token_list)
origin_prompt = "a photo of a dog"
prompt_with_system = origin_prompt + " " + token1 + " " + token2 + ...
# pass the prompt that has the safety tokens appended, not the original prompt
image = pipe(prompt_with_system).images[0]
image.save("example.png")
To get a better balance between unsafe content moderation and benign content preservation, we recommend loading three safe embeddings: Sexual, Political, and Disturbing.
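The prompt construction step in the inference example can be generalized to any number of loaded tokens. The following is a minimal sketch, not part of the released code: the helper name `append_guard_tokens` is hypothetical, and the token strings are the placeholder names used in the example, which should be replaced by whatever tokens you registered with `load_textual_inversion`.

```python
from typing import Sequence


def append_guard_tokens(prompt: str, tokens: Sequence[str]) -> str:
    """Append each safety token to the prompt, separated by single spaces.

    Hypothetical helper: it only builds the string that is later passed
    to the pipeline and does not load or validate any embeddings.
    """
    return " ".join([prompt.strip(), *tokens])


# Placeholder token names, matching the inference example
token_list = ["<prompt_guard_1>", "<prompt_guard_2>"]
guarded = append_guard_tokens("a photo of a dog", token_list)
print(guarded)  # a photo of a dog <prompt_guard_1> <prompt_guard_2>
```

The returned string would then be passed to the pipeline in place of the original prompt, for example `pipe(guarded).images[0]`.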
If you find our paper/code/dataset helpful, please kindly consider citing this work with the following reference:
@misc{yuan2025promptguard,
title={PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models},
author={Lingzhi Yuan and Xinfeng Li and Chejian Xu and Guanhong Tao and Xiaojun Jia and Yihao Huang and Wei Dong and Yang Liu and Bo Li},
year={2025},
eprint={2501.03544},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.03544},
}
This work builds on the following research and open-source projects. Many thanks to all the authors for sharing!
@misc{von-platen-etal-2022-diffusers,
author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
title = {Diffusers: State-of-the-art diffusion models},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huggingface/diffusers}}
}
@inproceedings{QSHBZZ23,
author = {Yiting Qu and Xinyue Shen and Xinlei He and Michael Backes and Savvas Zannettou and Yang Zhang},
title = {{Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models}},
booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
publisher = {ACM},
year = {2023}
}
@misc{gal2022textual,
doi = {10.48550/ARXIV.2208.01618},
url = {https://arxiv.org/abs/2208.01618},
author = {Gal, Rinon and Alaluf, Yuval and Atzmon, Yuval and Patashnik, Or and Bermano, Amit H. and Chechik, Gal and Cohen-Or, Daniel},
title = {An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion},
publisher = {arXiv},
year = {2022},
primaryClass={cs.CV}
}