Add comment-efficiency metrics to perturbation benchmark by jingxuangu · Pull Request #84 · ChicagoHAI/OpenAIReview

jingxuangu · 2026-05-15T01:47:11Z

Summary

This PR adds budget-aware comment-efficiency metrics to the perturbation benchmark scoring pipeline.

The current benchmark primarily reports seeded-error recall. This is useful, but it does not distinguish concise reviewers from noisy reviewers that find the same number of injected errors by producing many more comments.

This PR preserves the existing detection semantics:

- quote must match the perturbed text
- explanation must match the perturbation's why_wrong

It records the first comment index that detects each perturbation and adds:

- n_detected_at_1, n_detected_at_3, n_detected_at_5, n_detected_at_10
- recall_at_1, recall_at_3, recall_at_5, recall_at_10
- comments_per_detected_error
- detected_per_comment

These are comment-efficiency metrics, not true precision metrics, because unmatched comments may still identify real non-injected issues.
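
For reviewers who want a concrete picture, here is a minimal sketch of how the metrics listed above could be derived from the recorded first-detection indices. It is not the code in benchmarks/perturbation/score.py. The function name `efficiency_metrics`, the 1-based indexing, the use of `None` for undetected perturbations, and the zero-division fallbacks are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

# Comment budgets taken from the metric names in this PR.
BUDGETS = (1, 3, 5, 10)


def efficiency_metrics(
    first_detect: Sequence[Optional[int]],
    n_comments: int,
    budgets: Sequence[int] = BUDGETS,
) -> Dict[str, Optional[float]]:
    """Hypothetical helper: summarize detection efficiency for one review.

    first_detect[i] is the assumed 1-based index of the first comment that
    detects perturbation i, or None if no comment detects it.
    n_comments is the total number of comments the reviewer produced.
    """
    n_total = len(first_detect)
    detected = [idx for idx in first_detect if idx is not None]
    n_detected = len(detected)

    metrics: Dict[str, Optional[float]] = {}
    for k in budgets:
        # Count perturbations first detected within the first k comments.
        n_at_k = sum(1 for idx in detected if idx <= k)
        metrics[f"n_detected_at_{k}"] = n_at_k
        metrics[f"recall_at_{k}"] = n_at_k / n_total if n_total else 0.0

    # Assumed fallbacks: None when nothing was detected, 0.0 when no comments.
    metrics["comments_per_detected_error"] = (
        n_comments / n_detected if n_detected else None
    )
    metrics["detected_per_comment"] = (
        n_detected / n_comments if n_comments else 0.0
    )
    return metrics


# Four injected errors, eight comments; the third error is never detected.
example = efficiency_metrics([1, 4, None, 2], n_comments=8)
```

In this example, two reviewers with the same overall recall (3 of 4) would differ in `detected_per_comment` if one of them needed many more comments to reach it, which is the distinction the new metrics are meant to expose.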

Testing

python -m pytest tests/test_perturbation_score.py -q
python -m py_compile benchmarks/perturbation/score.py benchmarks/perturbation/models.py benchmarks/perturbation/generate_report.py src/reviewer/cli.py tests/test_perturbation_score.py

Both passed locally.

Notes

I attempted a full local score and report smoke run, but the checked-in perturbation configs appear to use an older pipeline schema, and the repo does not include prepared/reviewed artifacts for the current unified runner. This appears unrelated to the scoring metric changes.

Add budget-aware metrics to perturbation scoring

2d2adc6

jingxuangu wants to merge 1 commit into ChicagoHAI:main from jingxuangu:add-comment-efficiency-metrics
