DeepSource Benchmarks

Benchmark dataset evaluating code review and security analysis tools on the OpenSSF CVE Benchmark.

Benchmarked Tools

Last updated: April 12, 2026

Data Format

Judged Results (`benchmarks/judged-results/`)

Final evaluation results in JSONL format with fields:

cve_id: CVE identifier
variant: fixed or unfixed
detected_issues: Issues found by the tool
TP, FP, TN, FN: Classification metrics (true positive, false positive, true negative, false negative)
judge_reasoning: Explanation of the judgment
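
As a rough illustration of how these records might be consumed, the sketch below parses JSONL lines and aggregates the classification fields into precision and recall. It assumes that `TP`, `FP`, `TN`, and `FN` are stored as numeric counts (0 or 1) on each record, which the field list above does not confirm. The sample records, the placeholder CVE identifiers, and the helper names `load_judged` and `summarize` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample lines; real files live under benchmarks/judged-results/.
# The CVE identifiers and field values here are placeholders, not real data.
SAMPLE_LINES = [
    '{"cve_id": "CVE-0000-0001", "variant": "unfixed", "detected_issues": [], '
    '"TP": 1, "FP": 0, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0002", "variant": "unfixed", "detected_issues": [], '
    '"TP": 0, "FP": 0, "TN": 0, "FN": 1, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0001", "variant": "fixed", "detected_issues": [], '
    '"TP": 0, "FP": 1, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0002", "variant": "fixed", "detected_issues": [], '
    '"TP": 0, "FP": 0, "TN": 1, "FN": 0, "judge_reasoning": "placeholder"}',
]


def load_judged(lines):
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(records):
    """Sum the four classification fields and derive precision and recall.

    Assumes each field is a numeric count; missing fields count as zero.
    """
    totals = Counter()
    for rec in records:
        for key in ("TP", "FP", "TN", "FN"):
            totals[key] += int(rec.get(key, 0))
    tp, fp, fn = totals["TP"], totals["FP"], totals["FN"]
    # Guard against division by zero when a tool reports nothing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return dict(totals), precision, recall


if __name__ == "__main__":
    totals, precision, recall = summarize(load_judged(SAMPLE_LINES))
    print(totals, precision, recall)
```

To run this against an actual results file, pass an open file handle to `load_judged` in place of `SAMPLE_LINES`. If the real records encode the classification differently (for example as booleans or a single label), the aggregation step would need adjusting.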

Processed Results (`benchmarks/processed/`)

Intermediate formatted results from each tool, normalized for comparison.

Raw Output (`benchmarks/raw-output/`)

Original tool outputs per CVE, preserving the exact response from each tool.

References

OpenSSF CVE Benchmark
