AnthonyHerman/ai-testbed

AI Vulnerability Testbed

An unlabeled vulnerability testbed designed for evaluating LLM-based security analysis tools. This testbed contains realistic vulnerable code samples across multiple programming languages without explicit vulnerability markers.

Structure

The testbed uses GUID-named directories at the repository root, each containing vulnerable code samples:

/
├── labels.json                          # Ground truth mapping
├── <guid-1>/                            # Vulnerability sample 1
│   └── file.ext
├── <guid-2>/                            # Vulnerability sample 2
│   └── file.ext
└── ...

Labels Format

The labels.json file provides ground truth for evaluation and scoring:

{
  "testbed_version": "1.0",
  "vulnerabilities": [
    {
      "guid": "unique-identifier",
      "cwe_id": "CWE-XXX",
      "cwe_name": "Vulnerability Name",
      "description": "Description of the vulnerability",
      "language": "Programming Language",
      "files": [
        {
          "filename": "vulnerable_file.ext",
          "vulnerable_lines": [
            {
              "start": 10,
              "end": 15,
              "description": "Specific vulnerability location"
            }
          ]
        }
      ]
    }
  ]
}
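
As an illustration of how this format might be consumed, the following minimal sketch flattens the nested structure into one record per vulnerable line range. The names LabeledRange, flatten_labels, and load_labels are hypothetical helpers, not part of the testbed; only the JSON keys come from the format above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledRange:
    """One ground-truth line range (hypothetical helper type)."""
    guid: str
    cwe_id: str
    filename: str
    start: int
    end: int


def flatten_labels(labels: dict) -> list[LabeledRange]:
    """Turn the nested labels structure into one record per line range."""
    ranges = []
    for vuln in labels["vulnerabilities"]:
        for file_entry in vuln["files"]:
            for span in file_entry["vulnerable_lines"]:
                ranges.append(
                    LabeledRange(
                        guid=vuln["guid"],
                        cwe_id=vuln["cwe_id"],
                        filename=file_entry["filename"],
                        start=span["start"],
                        end=span["end"],
                    )
                )
    return ranges


def load_labels(path: str = "labels.json") -> list[LabeledRange]:
    """Read labels.json from disk and flatten it."""
    with open(path, encoding="utf-8") as handle:
        return flatten_labels(json.load(handle))
```

A flat list like this tends to be easier to compare against tool output than the nested form, since each entry carries the GUID, file, and range together.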

Included Vulnerabilities

This testbed includes diverse vulnerability types across multiple languages:

  • Python: SQL Injection (CWE-89), Path Traversal (CWE-22)
  • JavaScript: Cross-Site Scripting (CWE-79), Prototype Pollution (CWE-1321)
  • C: Buffer Overflow (CWE-120)
  • Java: XML External Entity Injection (CWE-611)
  • Go: OS Command Injection (CWE-78)
  • PHP: Remote File Inclusion (CWE-98)

Design Principles

  1. Unlabeled: Source files contain no explicit CWE markers, comments indicating vulnerabilities, or "vulnerable" in filenames
  2. Realistic: Code samples represent real-world patterns that could appear in production
  3. Generic Filenames: Files use common names (e.g., database.py, config.js) rather than vulnerable_sql.py
  4. Diverse: Multiple languages and vulnerability types to test broad detection capabilities
  5. Traceable: The labels.json file enables precise scoring and analysis
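
The first and third principles could be spot-checked with a small script. The sketch below is an assumption about how such a check might look, not tooling that ships with the testbed: has_marker and find_leaks are hypothetical names, and the pattern is a simple heuristic that would also flag documentation files such as a README that mentions CWE IDs.

```python
import re
from pathlib import Path

# Heuristic: an explicit CWE ID or any word starting with "vulnerab".
MARKER = re.compile(r"CWE-\d+|vulnerab", re.IGNORECASE)


def has_marker(text: str) -> bool:
    """Return True if the text contains an explicit vulnerability marker."""
    return MARKER.search(text) is not None


def find_leaks(root: str) -> list[str]:
    """List files under root whose name or contents reveal a label."""
    leaks = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories, the ground-truth file, and git metadata.
        if not path.is_file() or path.name == "labels.json" or ".git" in path.parts:
            continue
        if has_marker(path.name):
            leaks.append(f"{path}: marker in filename")
        elif has_marker(path.read_text(encoding="utf-8", errors="ignore")):
            leaks.append(f"{path}: marker in contents")
    return leaks
```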

Usage

  1. Model Evaluation: Feed source files to your LLM/analysis tool without the labels
  2. Scoring: Compare detected vulnerabilities against labels.json for accuracy metrics
  3. Analysis: Use line number ranges to validate if the model identified the correct vulnerability locations
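
The scoring and analysis steps above can be sketched as follows. This is one possible scoring rule, assumed for illustration: a detection counts as correct when it names the same sample, file, and CWE as a label and its line range overlaps the labeled range. Finding, overlaps, and score are hypothetical names, and the testbed does not prescribe this metric.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """A reported or labeled vulnerability location (hypothetical type)."""
    guid: str
    filename: str
    cwe_id: str
    start: int
    end: int


def overlaps(a: Finding, b: Finding) -> bool:
    """Same sample, file, and CWE, with intersecting inclusive line ranges."""
    return (
        a.guid == b.guid
        and a.filename == b.filename
        and a.cwe_id == b.cwe_id
        and a.start <= b.end
        and b.start <= a.end
    )


def score(detections: list[Finding], labels: list[Finding]) -> dict[str, float]:
    """Compute precision and recall under the overlap rule."""
    matched_detections = sum(
        1 for d in detections if any(overlaps(d, label) for label in labels)
    )
    matched_labels = sum(
        1 for label in labels if any(overlaps(d, label) for d in detections)
    )
    precision = matched_detections / len(detections) if detections else 0.0
    recall = matched_labels / len(labels) if labels else 0.0
    return {"precision": precision, "recall": recall}
```

A stricter variant might require a minimum overlap fraction, or score CWE classification and line localization separately; which rule fits depends on what the evaluation is meant to measure.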

License

This testbed is provided for security research and educational purposes.

About

Testbed Concept - Early
