PromptShield — Prompt Injection Detection Model

Author: Soham Dahivalkar
Goal: Build and publish a prompt injection detection model on HuggingFace.

What is PromptShield?

PromptShield is a fine-tuned DeBERTa-v3 model that detects prompt injection attacks in LLM applications. It classifies user inputs as either safe or injection.

Why This Matters

Prompt injection is the #1 security vulnerability in LLM applications (OWASP LLM Top 10). Attackers can:

Extract system prompts
Bypass safety guardrails
Manipulate AI behavior
Exfiltrate sensitive data

PromptShield provides a fast, lightweight defense layer for any LLM application.

Project Structure

huggingface-promptshield/
├── config.py                        # Configuration
├── step1_collect_injections.py      # Collect injection prompts
├── step2_collect_safe_prompts.py    # Collect safe prompts
├── step3_build_dataset.py           # Build balanced dataset
├── step4_upload_dataset.py          # Upload to HuggingFace
├── step5_train_model.py             # Fine-tune DeBERTa
├── step6_upload_model.py            # Upload model to HuggingFace
├── step7_test_model.py              # Test the model
├── dataset_card.md                  # HuggingFace Dataset README
├── model_card.md                    # HuggingFace Model README
├── requirements.txt                 # Dependencies
└── data/
    ├── raw/                         # Raw prompts
    └── processed/                   # Train/Val/Test splits

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Run Pipeline

python step1_collect_injections.py
python step2_collect_safe_prompts.py
python step3_build_dataset.py
python step4_upload_dataset.py
python step5_train_model.py          # Needs GPU
python step6_upload_model.py
python step7_test_model.py

Published Artifacts

Dataset: Shomi28/prompt-injection-dataset
Model: Shomi28/PromptShield

License

MIT License — Soham Dahivalkar 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PromptShield — Prompt Injection Detection Model

What is PromptShield?

Why This Matters

Project Structure

Quick Start

1. Install Dependencies

2. Run Pipeline

Published Artifacts

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
store-assets		store-assets
.gitignore		.gitignore
PromptShield_Colab.ipynb		PromptShield_Colab.ipynb
README.md		README.md
check_notebook.py		check_notebook.py
colab_fast_train.py		colab_fast_train.py
colab_notebook.py		colab_notebook.py
config.py		config.py
dataset_card.md		dataset_card.md
model_card.md		model_card.md
requirements.txt		requirements.txt
run_pipeline.py		run_pipeline.py
step1_collect_injections.py		step1_collect_injections.py
step2_collect_safe_prompts.py		step2_collect_safe_prompts.py
step3_build_dataset.py		step3_build_dataset.py
step4_upload_dataset.py		step4_upload_dataset.py
step5_train_model.py		step5_train_model.py
step6_upload_model.py		step6_upload_model.py
step7_test_model.py		step7_test_model.py
verify_dataset.py		verify_dataset.py

Folders and files

Latest commit

History

Repository files navigation

PromptShield — Prompt Injection Detection Model

What is PromptShield?

Why This Matters

Project Structure

Quick Start

1. Install Dependencies

2. Run Pipeline

Published Artifacts

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages