Author: Soham Dahivalkar
Goal: Build and publish a prompt injection detection model on HuggingFace.
PromptShield is a fine-tuned DeBERTa-v3 model that detects prompt injection attacks in LLM applications. It classifies user inputs as either safe or injection.
Prompt injection is ranked first (LLM01) in the OWASP Top 10 for LLM Applications. Attackers can:
- Extract system prompts
- Bypass safety guardrails
- Manipulate AI behavior
- Exfiltrate sensitive data
PromptShield provides a fast, lightweight defense layer for any LLM application.
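As a rough illustration of the "defense layer" idea, the sketch below checks each user input before it reaches the LLM. It is a minimal sketch under stated assumptions, not the project's own integration code: the helper names (`load_promptshield`, `is_injection`, `guarded_call`), the 0.5 threshold, and the exact label strings returned by the model ("safe" / "injection", matched case-insensitively) are all assumptions. Loading the model uses the standard `transformers` text-classification pipeline and needs network access.

```python
from typing import Callable, Dict, List

# Hypothetical threshold (an assumption); tune it on your own validation data.
INJECTION_THRESHOLD = 0.5

# A classifier takes a string and returns a list like [{"label": ..., "score": ...}],
# which is the shape the transformers text-classification pipeline returns.
Classifier = Callable[[str], List[Dict[str, object]]]


def load_promptshield() -> Classifier:
    """Load the published model with the transformers pipeline.

    Assumes `transformers` and a backend such as PyTorch are installed.
    """
    from transformers import pipeline  # lazy import keeps the guard usable offline

    return pipeline("text-classification", model="Shomi28/PromptShield")


def is_injection(
    text: str,
    classifier: Classifier,
    threshold: float = INJECTION_THRESHOLD,
) -> bool:
    """Return True when the top label is 'injection' with enough confidence."""
    top = classifier(text)[0]
    # Assumed label name; check the model's config (id2label) for the real one.
    label = str(top["label"]).lower()
    return label == "injection" and float(top["score"]) >= threshold


def guarded_call(
    user_input: str,
    classifier: Classifier,
    llm: Callable[[str], str],
) -> str:
    """Forward the input to the LLM only if it passes the injection check."""
    if is_injection(user_input, classifier):
        return "Request blocked: possible prompt injection."
    return llm(user_input)
```

In an application, `classifier` would typically be the result of `load_promptshield()` created once at startup, and `llm` would be whatever function sends the prompt to your model.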
huggingface-promptshield/
├── config.py # Configuration
├── step1_collect_injections.py # Collect injection prompts
├── step2_collect_safe_prompts.py # Collect safe prompts
├── step3_build_dataset.py # Build balanced dataset
├── step4_upload_dataset.py # Upload to HuggingFace
├── step5_train_model.py # Fine-tune DeBERTa
├── step6_upload_model.py # Upload model to HuggingFace
├── step7_test_model.py # Test the model
├── dataset_card.md # HuggingFace Dataset README
├── model_card.md # HuggingFace Model README
├── requirements.txt # Dependencies
└── data/
    ├── raw/ # Raw prompts
    └── processed/ # Train/Val/Test splits
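The "balanced dataset" and "Train/Val/Test splits" steps in the layout above could look roughly like the following. This is a sketch under assumptions, not the contents of step3_build_dataset.py: the function name `build_balanced_splits`, the 80/10/10 ratios, the 0/1 label encoding, and balancing by downsampling the larger class are all illustrative choices.

```python
import random
from typing import Dict, List, Tuple

# (text, label) pair; the encoding 0 = safe, 1 = injection is an assumption.
Example = Tuple[str, int]


def build_balanced_splits(
    safe: List[str],
    injections: List[str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed split ratios
    seed: int = 42,
) -> Dict[str, List[Example]]:
    """Downsample to equal class counts, shuffle, then cut into three splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible

    # Balance the classes by sampling the same number of prompts from each.
    n = min(len(safe), len(injections))
    examples = [(t, 0) for t in rng.sample(safe, n)]
    examples += [(t, 1) for t in rng.sample(injections, n)]
    rng.shuffle(examples)

    # The dataset is balanced overall; individual splits are only
    # approximately balanced because this sketch does not stratify.
    n_train = int(len(examples) * ratios[0])
    n_val = int(len(examples) * ratios[1])
    return {
        "train": examples[:n_train],
        "val": examples[n_train:n_train + n_val],
        "test": examples[n_train + n_val:],  # remainder goes to test
    }
```

Downsampling discards data from the larger class; if that loss matters, class weights during training are a common alternative.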
pip install -r requirements.txt
python step1_collect_injections.py
python step2_collect_safe_prompts.py
python step3_build_dataset.py
python step4_upload_dataset.py
python step5_train_model.py # Needs GPU
python step6_upload_model.py
python step7_test_model.py
- Dataset: Shomi28/prompt-injection-dataset
- Model: Shomi28/PromptShield
MIT License — Soham Dahivalkar 2026