An unlabeled vulnerability testbed designed for evaluating LLM-based security analysis tools. This testbed contains realistic vulnerable code samples across multiple programming languages without explicit vulnerability markers.
The testbed uses GUID-named directories at the repository root, each containing vulnerable code samples:
```
/
├── labels.json     # Ground truth mapping
├── <guid-1>/       # Vulnerability sample 1
│   └── file.ext
├── <guid-2>/       # Vulnerability sample 2
│   └── file.ext
└── ...
```
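A harness usually needs the list of samples without the ground truth file. The sketch below assumes the directory names are standard 8-4-4-4-12 GUIDs (checked with Python's `uuid` module) and that every file under a GUID directory belongs to that sample. `list_samples` and `is_guid` are hypothetical helper names, not part of the testbed.

```python
import uuid
from pathlib import Path


def is_guid(name: str) -> bool:
    """Return True if `name` parses as a UUID (assumed GUID format)."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True


def list_samples(root: str) -> dict[str, list[str]]:
    """Map each GUID-named sample directory under `root` to its source files.

    labels.json and any non-GUID entries are skipped, so the result contains
    only what should be shown to the tool under test.
    """
    samples: dict[str, list[str]] = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and is_guid(entry.name):
            # Paths are relative to the sample directory, e.g. "database.py".
            samples[entry.name] = sorted(
                str(p.relative_to(entry)) for p in entry.rglob("*") if p.is_file()
            )
    return samples
```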
The `labels.json` file provides ground truth for evaluation and scoring:

```json
{
  "testbed_version": "1.0",
  "vulnerabilities": [
    {
      "guid": "unique-identifier",
      "cwe_id": "CWE-XXX",
      "cwe_name": "Vulnerability Name",
      "description": "Description of the vulnerability",
      "language": "Programming Language",
      "files": [
        {
          "filename": "vulnerable_file.ext",
          "vulnerable_lines": [
            {
              "start": 10,
              "end": 15,
              "description": "Specific vulnerability location"
            }
          ]
        }
      ]
    }
  ]
}
```

This testbed includes diverse vulnerability types across multiple languages:
- Python: SQL Injection (CWE-89), Path Traversal (CWE-22)
- JavaScript: Cross-Site Scripting (CWE-79), Prototype Pollution (CWE-1321)
- C: Buffer Overflow (CWE-120)
- Java: XML External Entity Injection (CWE-611)
- Go: OS Command Injection (CWE-78)
- PHP: Remote File Inclusion (CWE-98)
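A minimal loader for `labels.json` can check the fields defined by the schema and tally samples per language and CWE, for example to confirm the breakdown listed above. This is a sketch under assumptions: line ranges are treated as 1-based and inclusive (so `start: 10, end: 15` would cover six lines), and `load_labels` and `tally` are hypothetical names.

```python
import json
from collections import Counter

# Fields every vulnerability entry is expected to carry, per the schema.
REQUIRED_KEYS = {"guid", "cwe_id", "cwe_name", "description", "language", "files"}


def load_labels(text: str) -> list[dict]:
    """Parse labels.json content and validate each entry's fields and ranges."""
    data = json.loads(text)
    vulns = data["vulnerabilities"]
    for v in vulns:
        missing = REQUIRED_KEYS - set(v)
        if missing:
            raise ValueError(f"{v.get('guid', '?')}: missing {sorted(missing)}")
        for f in v["files"]:
            for r in f["vulnerable_lines"]:
                # Assumption: 1-based, inclusive ranges with start <= end.
                if not 1 <= r["start"] <= r["end"]:
                    raise ValueError(f"{v['guid']}/{f['filename']}: bad range {r}")
    return vulns


def tally(vulns: list[dict]) -> Counter:
    """Count samples per (language, CWE ID) pair."""
    return Counter((v["language"], v["cwe_id"]) for v in vulns)
```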
Key properties of the testbed:

- Unlabeled: Source files contain no explicit CWE markers, no comments indicating vulnerabilities, and no "vulnerable" in filenames
- Realistic: Code samples follow real-world patterns that could appear in production code
- Generic Filenames: Files use common names (e.g., `database.py`, `config.js`) rather than names like `vulnerable_sql.py`
- Diverse: Multiple languages and vulnerability types test broad detection capabilities
- Traceable: The `labels.json` file enables precise scoring and analysis
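The unlabeled property can be spot-checked with a simple scan before running an evaluation. This is a heuristic sketch, not an official check: it flags `CWE-<number>` strings, any occurrence of "vulnerab" (which would most likely sit in a comment), and "vulnerable" in filenames, while skipping `labels.json`, which is expected to name CWEs. `leak_findings` and `audit_testbed` are hypothetical names.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions): an explicit CWE ID, or any mention of
# "vulnerab..." anywhere in the file, since comments are language-specific.
CWE_MARKER = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)
VULN_WORD = re.compile(r"vulnerab", re.IGNORECASE)


def leak_findings(path: Path, text: str) -> list[str]:
    """Return reasons why a single source file might leak its label."""
    findings = []
    if "vulnerable" in path.name.lower():
        findings.append(f"{path.name}: 'vulnerable' in filename")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CWE_MARKER.search(line):
            findings.append(f"{path.name}:{lineno}: CWE marker")
        elif VULN_WORD.search(line):
            findings.append(f"{path.name}:{lineno}: mentions 'vulnerab...'")
    return findings


def audit_testbed(root: str) -> list[str]:
    """Scan every file under `root` except labels.json."""
    findings = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.name != "labels.json":
            findings += leak_findings(p, p.read_text(errors="replace"))
    return findings
```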
Typical usage:

- Model Evaluation: Feed the source files to your LLM or analysis tool without `labels.json`
- Scoring: Compare detected vulnerabilities against `labels.json` to compute accuracy metrics
- Analysis: Use the line number ranges to check whether the model identified the correct vulnerability location
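Scoring can be sketched as follows. The `Finding` record is a hypothetical output format for the tool under test, not part of the testbed. Here a finding counts as correct when its GUID, CWE ID and filename match a label and its line range overlaps a labeled range (both treated as inclusive). The sketch assumes each GUID appears once in `labels.json` and reports precision over findings and recall over samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One hypothetical tool finding (assumed format)."""
    guid: str
    cwe_id: str
    filename: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Inclusive ranges overlap when neither ends before the other starts.
    return a_start <= b_end and b_start <= a_end


def matches(f: Finding, vuln: dict) -> bool:
    """True if the finding hits a labeled range of the same sample and CWE."""
    if f.guid != vuln["guid"] or f.cwe_id != vuln["cwe_id"]:
        return False
    return any(
        lf["filename"] == f.filename
        and any(overlaps(f.start, f.end, r["start"], r["end"]) for r in lf["vulnerable_lines"])
        for lf in vuln["files"]
    )


def score(findings: list[Finding], vulns: list[dict]) -> dict[str, float]:
    by_guid = {v["guid"]: v for v in vulns}  # assumes one entry per GUID
    correct = [f for f in findings if f.guid in by_guid and matches(f, by_guid[f.guid])]
    detected = {f.guid for f in correct}
    precision = len(correct) / len(findings) if findings else 0.0
    recall = len(detected) / len(vulns) if vulns else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Requiring an exact CWE match is strict; a looser variant could credit any overlapping finding regardless of CWE, which separates localization accuracy from classification accuracy.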
This testbed is provided for security research and educational purposes.