sohammmmm10/PromptShield

PromptShield — Prompt Injection Detection Model

Author: Soham Dahivalkar
Goal: Build and publish a prompt injection detection model on HuggingFace.


What is PromptShield?

PromptShield is a fine-tuned DeBERTa-v3 model that detects prompt injection attacks in LLM applications. It is a binary text classifier: each user input is labeled either safe or injection.

Why This Matters

Prompt injection is ranked first (LLM01) in the OWASP Top 10 for LLM Applications. Attackers can:

  • Extract system prompts
  • Bypass safety guardrails
  • Manipulate AI behavior
  • Exfiltrate sensitive data

PromptShield is intended as a lightweight defense layer that screens user input before it reaches the LLM in any application.
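As a rough illustration of what such a defense layer could look like, the sketch below wraps a classifier in a small guard function. It is not taken from the repository. The helper names (`load_classifier`, `is_injection`, `guarded_call`), the 0.5 threshold, and the label string `"injection"` are assumptions; check the published model's `id2label` mapping before relying on the label name.

```python
from typing import Callable, Dict, List

# A classifier takes text and returns [{"label": str, "score": float}], the
# shape a transformers text-classification pipeline returns for one string.
Classifier = Callable[[str], List[Dict[str, object]]]

# ASSUMPTION: the label string the model emits for attacks. It may instead be
# something like "LABEL_1"; check the model config and adjust.
INJECTION_LABEL = "injection"


def load_classifier(model_id: str = "Shomi28/PromptShield") -> Classifier:
    """Load the published model lazily (needs `transformers` and network access)."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_injection(text: str, classifier: Classifier, threshold: float = 0.5) -> bool:
    """True when the top prediction is the injection label with enough confidence."""
    top = classifier(text)[0]
    label_matches = str(top["label"]).lower() == INJECTION_LABEL
    return label_matches and float(top["score"]) >= threshold


def guarded_call(text: str, classifier: Classifier, llm: Callable[[str], str]) -> str:
    """Forward the input to the LLM only if it is not flagged (hypothetical helper)."""
    if is_injection(text, classifier):
        return "Request blocked: possible prompt injection."
    return llm(text)
```

In an application, `guarded_call(user_input, load_classifier(), my_llm)` would sit in front of the model call, where `my_llm` stands for whatever function sends text to the LLM. Passing the classifier in as an argument keeps the guard testable without downloading the model.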


Project Structure

huggingface-promptshield/
├── config.py                        # Configuration
├── step1_collect_injections.py      # Collect injection prompts
├── step2_collect_safe_prompts.py    # Collect safe prompts
├── step3_build_dataset.py           # Build balanced dataset
├── step4_upload_dataset.py          # Upload to HuggingFace
├── step5_train_model.py             # Fine-tune DeBERTa
├── step6_upload_model.py            # Upload model to HuggingFace
├── step7_test_model.py              # Test the model
├── dataset_card.md                  # HuggingFace Dataset README
├── model_card.md                    # HuggingFace Model README
├── requirements.txt                 # Dependencies
└── data/
    ├── raw/                         # Raw prompts
    └── processed/                   # Train/Val/Test splits
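The tree mentions a balanced dataset and Train/Val/Test splits but does not show how they are produced. One common way to do it, sketched here under stated assumptions, is to downsample the larger class and then split each class separately so every split stays balanced. The function name, the 80/10/10 fractions, the seed, and the 0/1 label encoding are all hypothetical and may differ from what `step3_build_dataset.py` actually does.

```python
import random
from typing import Dict, List, Tuple

# (prompt text, label). ASSUMED encoding: 0 = safe, 1 = injection.
Example = Tuple[str, int]


def build_balanced_splits(
    injections: List[str],
    safe: List[str],
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[Example]]:
    """Downsample to equal class sizes, then split per class."""
    rng = random.Random(seed)

    # Downsample the larger class so both classes have n examples.
    n = min(len(injections), len(safe))
    per_class = {1: rng.sample(injections, n), 0: rng.sample(safe, n)}

    n_val = int(n * val_frac)
    n_test = int(n * test_frac)

    splits: Dict[str, List[Example]] = {"train": [], "val": [], "test": []}
    # Slice each class the same way so every split has a 50/50 label ratio.
    for label, texts in per_class.items():
        splits["val"] += [(t, label) for t in texts[:n_val]]
        splits["test"] += [(t, label) for t in texts[n_val:n_val + n_test]]
        splits["train"] += [(t, label) for t in texts[n_val + n_test:]]

    # Shuffle so the classes are interleaved within each split.
    for rows in splits.values():
        rng.shuffle(rows)
    return splits
```

Fixing the seed makes the splits reproducible, which matters if the dataset is rebuilt before retraining.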

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Run Pipeline

python step1_collect_injections.py
python step2_collect_safe_prompts.py
python step3_build_dataset.py
python step4_upload_dataset.py
python step5_train_model.py          # Needs GPU
python step6_upload_model.py
python step7_test_model.py
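Because each step depends on the output of the one before it, it can help to run them from a single script that stops at the first failure. The runner below is a minimal sketch, not part of the repository; `run_pipeline` and its injectable `run` parameter are hypothetical, and only the script names come from the list above.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Script names as listed in the pipeline above, in execution order.
STEPS = [
    "step1_collect_injections.py",
    "step2_collect_safe_prompts.py",
    "step3_build_dataset.py",
    "step4_upload_dataset.py",
    "step5_train_model.py",  # Needs GPU
    "step6_upload_model.py",
    "step7_test_model.py",
]


def run_pipeline(
    steps: Sequence[str] = STEPS,
    run: Optional[Callable[[List[str]], int]] = None,
) -> List[str]:
    """Run each step with the current interpreter; stop at the first failure."""
    if run is None:
        # Default runner: execute the command and report its exit code.
        def run(cmd: List[str]) -> int:
            return subprocess.run(cmd).returncode

    completed: List[str] = []
    for script in steps:
        code = run([sys.executable, script])
        if code != 0:
            raise RuntimeError(
                f"{script} exited with code {code}; completed so far: {completed}"
            )
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the project directory runs all seven steps. Passing a slice such as `run_pipeline(STEPS[4:])` would resume from training after the dataset steps have already succeeded.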

Published Artifacts

  • Dataset: Shomi28/prompt-injection-dataset
  • Model: Shomi28/PromptShield

License

MIT License — Soham Dahivalkar 2026
