AI Vulnerability Testbed

An unlabeled vulnerability testbed for evaluating LLM-based security analysis tools. It contains realistic vulnerable code samples in multiple programming languages, with no explicit vulnerability markers in the source files.

Structure

The testbed uses GUID-named directories at the repository root, each containing the source files for one vulnerability sample:

/
├── labels.json                           # Ground truth mapping
├── <guid-1>/                            # Vulnerability sample 1
│   └── file.ext
├── <guid-2>/                            # Vulnerability sample 2
│   └── file.ext
└── ...
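
The layout above can be walked programmatically. The following is a minimal sketch, assuming sample directory names are canonical hyphenated UUIDs; the helper names is_guid and sample_dirs are illustrative and not part of the testbed.

```python
import uuid
from pathlib import Path


def is_guid(name: str) -> bool:
    """Return True if name is a canonical hyphenated UUID string."""
    try:
        # uuid.UUID accepts several spellings; comparing against the
        # canonical form rejects anything other than 8-4-4-4-12 hex.
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


def sample_dirs(root: Path) -> list[Path]:
    """List GUID-named subdirectories of the testbed root, sorted by name."""
    return sorted(p for p in root.iterdir() if p.is_dir() and is_guid(p.name))
```

Filtering by name keeps labels.json and any other non-sample entries at the root out of the evaluation set.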

Labels Format

The labels.json file provides ground truth for evaluation and scoring:

{
  "testbed_version": "1.0",
  "vulnerabilities": [
    {
      "guid": "unique-identifier",
      "cwe_id": "CWE-XXX",
      "cwe_name": "Vulnerability Name",
      "description": "Description of the vulnerability",
      "language": "Programming Language",
      "files": [
        {
          "filename": "vulnerable_file.ext",
          "vulnerable_lines": [
            {
              "start": 10,
              "end": 15,
              "description": "Specific vulnerability location"
            }
          ]
        }
      ]
    }
  ]
}
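
For scoring, it is often convenient to flatten this nested structure into one record per vulnerable line range. The sketch below assumes the field names shown above; the LabeledRange type and flatten_labels function are hypothetical helpers, and the inline JSON is made-up example data rather than real testbed content.

```python
import json
from typing import NamedTuple


class LabeledRange(NamedTuple):
    guid: str
    cwe_id: str
    language: str
    filename: str
    start: int
    end: int


def flatten_labels(labels: dict) -> list[LabeledRange]:
    """Expand the nested labels structure into one record per vulnerable range."""
    records = []
    for vuln in labels["vulnerabilities"]:
        for file_entry in vuln["files"]:
            for span in file_entry["vulnerable_lines"]:
                records.append(
                    LabeledRange(
                        guid=vuln["guid"],
                        cwe_id=vuln["cwe_id"],
                        language=vuln["language"],
                        filename=file_entry["filename"],
                        start=span["start"],
                        end=span["end"],
                    )
                )
    return records


# Made-up document shaped like labels.json (not real testbed data).
example = json.loads("""
{
  "testbed_version": "1.0",
  "vulnerabilities": [
    {
      "guid": "example-guid",
      "cwe_id": "CWE-89",
      "cwe_name": "SQL Injection",
      "description": "Example entry",
      "language": "Python",
      "files": [
        {
          "filename": "database.py",
          "vulnerable_lines": [
            {"start": 10, "end": 15, "description": "Example range"}
          ]
        }
      ]
    }
  ]
}
""")

for record in flatten_labels(example):
    print(record)
```

To load the real file, replace the inline string with json.load on an open handle to labels.json.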

Included Vulnerabilities

This testbed includes diverse vulnerability types across multiple languages:

Python: SQL Injection (CWE-89), Path Traversal (CWE-22)
JavaScript: Cross-Site Scripting (CWE-79), Prototype Pollution (CWE-1321)
C: Buffer Overflow (CWE-120)
Java: XML External Entity Injection (CWE-611)
Go: OS Command Injection (CWE-78)
PHP: Remote File Inclusion (CWE-98)

Design Principles

Unlabeled: Source files contain no explicit CWE markers or comments indicating vulnerabilities, and filenames do not contain "vulnerable"
Realistic: Code samples represent real-world patterns that could appear in production
Generic Filenames: Files use common names (e.g., database.py, config.js) rather than revealing ones such as vulnerable_sql.py
Diverse: Multiple languages and vulnerability types to test broad detection capabilities
Traceable: The labels.json file enables precise scoring and analysis
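
The "Unlabeled" principle can be spot-checked mechanically when adding a new sample. This is a rough heuristic sketch, not a check the testbed itself ships: the patterns and the find_markers name are assumptions, and a clean result does not prove a file is free of hints.

```python
import re

# Assumed patterns: an explicit CWE identifier, or any word starting
# with "vulnerab" (vulnerable, vulnerability, ...).
CWE_PATTERN = re.compile(r"CWE-\d+", re.IGNORECASE)
HINT_PATTERN = re.compile(r"vulnerab", re.IGNORECASE)


def find_markers(filename: str, text: str) -> list[str]:
    """Return human-readable reasons why a sample may leak its label."""
    problems = []
    if HINT_PATTERN.search(filename):
        problems.append(f"filename hints at a vulnerability: {filename}")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CWE_PATTERN.search(line):
            problems.append(f"{filename}:{lineno}: CWE identifier")
        elif HINT_PATTERN.search(line):
            problems.append(f"{filename}:{lineno}: vulnerability hint")
    return problems
```

For example, find_markers("database.py", source_text) returns an empty list when neither pattern matches the filename or any line of source_text.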

Usage

Model Evaluation: Feed the source files to your LLM or analysis tool without exposing labels.json
Scoring: Compare detected vulnerabilities against labels.json to compute accuracy metrics
Analysis: Use the line number ranges to check whether the model identified the correct vulnerability locations
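
The scoring and analysis steps can be sketched as follows. This is one possible scheme, not a scoring method defined by the testbed: it assumes line ranges are inclusive, counts a detection as correct when it names the same sample, file, and CWE as a label and overlaps its line range, and uses a hypothetical Finding record for both detections and ground truth.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    guid: str
    filename: str
    cwe_id: str
    start: int  # first line of the range (assumed inclusive)
    end: int    # last line of the range (assumed inclusive)


def overlaps(a: Finding, b: Finding) -> bool:
    """True if both refer to the same sample, file, and CWE and share a line."""
    same_target = (a.guid, a.filename, a.cwe_id) == (b.guid, b.filename, b.cwe_id)
    return same_target and a.start <= b.end and b.start <= a.end


def score(detected: list[Finding], truth: list[Finding]) -> dict[str, float]:
    """Compute precision and recall under the overlap-based matching rule."""
    correct = [d for d in detected if any(overlaps(d, t) for t in truth)]
    found = [t for t in truth if any(overlaps(d, t) for d in detected)]
    precision = len(correct) / len(detected) if detected else 0.0
    recall = len(found) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall}
```

Stricter variants are possible, such as requiring the detected range to fall entirely inside the labeled range, or scoring CWE and location separately.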

License

This testbed is provided for security research and educational purposes.

