Skip to content

feat(api): detect duplicate/repeat PRs via GitHub Search API and appl…#17

Merged
ljahier merged 1 commit into
mainfrom
feat/triage_with_history
Mar 2, 2026
Merged

feat(api): detect duplicate/repeat PRs via GitHub Search API and appl…#17
ljahier merged 1 commit into
mainfrom
feat/triage_with_history

Conversation

@ljahier
Copy link
Copy Markdown
Member

@ljahier ljahier commented Mar 2, 2026

…y history penalty


Note

Adds a history penalty system that detects duplicate/repeat PRs via GitHub Search API. Three parallel API calls fetch rejected PRs by author (repo-scoped), by title keywords, and by author globally. Results feed a penalty calculator that reduces the combined score and can apply a patrol:repeat-offender label.

Written by Mendral for commit e5162c7.

Copy link
Copy Markdown

@pratrol pratrol Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pratrol Triage Brief: 🟢 High Confidence

Pratrol assessed this pull request across the primary risk vectors below.

Risk Vector Score Tier
Contributor Context 72/100 🟢 High
Logic Risk (Mistral) 82/100 🟢 High
Overall Confidence 78/100 🟢 High

Summary: Adds functionality to detect duplicate/repeat PRs and apply history penalties.

Key Signal: The PR introduces a new history evaluation system to penalize repeat PR rejections.

Recommendation: Review the new history evaluation logic and ensure it integrates well with existing systems.

@pratrol pratrol Bot added the patrol:trusted PR author and content look trustworthy label Mar 2, 2026
@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying pratrol with  Cloudflare Pages  Cloudflare Pages

Latest commit: e5162c7
Status: ✅  Deploy successful!
Preview URL: https://5e8b4b14.pratrol.pages.dev
Branch Preview URL: https://feat-triage-with-history.pratrol.pages.dev

View logs

@ljahier ljahier merged commit 271f411 into main Mar 2, 2026
6 checks passed
@ljahier ljahier deleted the feat/triage_with_history branch March 2, 2026 02:28
Copy link
Copy Markdown

@mendral-app mendral-app Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Code Review

Assessment ⚠️

Two issues worth fixing before merge:

  1. Query injection via PR title keywords — the keywords string is interpolated directly into a GitHub Search query with only surrounding quotes. A title like foo" repo:attacker/evil breaks out of the quoted term and redirects the search to an arbitrary repo. The extract_title_keywords function strips non-alphanumeric chars, which mitigates most cases, but the resulting keyword string is still passed unsanitized into the query format string. The quotes in search_rejected_prs_by_title only protect against whitespace, not against a keyword that itself contains a double-quote (possible if the split/filter logic ever changes).

  2. is_repeat_offender label applied even when ensure_label fails — if ensure_label for the repeat-offender label returns an error, the code logs a warning and continues. labels_to_apply still contains REPEAT_OFFENDER_LABEL, so add_labels will attempt to apply a label that may not exist, causing the GitHub API call to fail silently (only a warn log). The tier label has the same structure but was pre-existing; the repeat-offender label is created on-demand, so this race is real.

Note

Tag @mendral-app with feedback or questions. View session

Comment on lines +258 to +260
) -> Result<RejectedPrSearchResult, GitHubError> {
let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{keywords}\"");
self.search_rejected_prs(&query).await
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 security (medium)

The keywords string is interpolated directly into the search query. Although extract_title_keywords currently strips non-alphanumeric chars, the query construction relies on surrounding double-quotes for safety. If a keyword ever contains " (e.g. after a future change to the extractor), an attacker-controlled PR title could inject arbitrary search qualifiers like repo:other/repo. Escape or validate the keywords before interpolation.

📝 Suggested change
Suggested change
) -> Result<RejectedPrSearchResult, GitHubError> {
let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{keywords}\"");
self.search_rejected_prs(&query).await
async fn search_rejected_prs_by_title(
&self,
keywords: &str,
owner: &str,
repo: &str,
) -> Result<RejectedPrSearchResult, GitHubError> {
// Ensure keywords contain only alphanumeric/space chars before embedding in query
let safe_keywords: String = keywords
.chars()
.filter(|c| c.is_alphanumeric() || *c == ' ')
.collect();
let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{safe_keywords}\"");
self.search_rejected_prs(&query).await
}

Comment on lines +194 to +214
if history_result.is_repeat_offender
&& let Err(error) = client
.ensure_label(
owner,
repo,
REPEAT_OFFENDER_LABEL.to_owned(),
REPEAT_OFFENDER_COLOR.to_owned(),
REPEAT_OFFENDER_DESCRIPTION.to_owned(),
)
.await
{
warn!(
message = "Failed to ensure repeat-offender label exists.",
triage_id = %triage_id,
pr_number,
%error,
);
}

if let Err(error) = client
.add_labels(owner, repo, pr_number, vec![label_name])
.add_labels(owner, repo, pr_number, labels_to_apply)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 bug (medium)

When ensure_label for REPEAT_OFFENDER_LABEL fails, the code logs a warning but labels_to_apply still contains the label. The subsequent add_labels call will then attempt to apply a non-existent label, causing a silent API failure. Remove the label from labels_to_apply on ensure_label error.

📝 Suggested change
Suggested change
if history_result.is_repeat_offender
&& let Err(error) = client
.ensure_label(
owner,
repo,
REPEAT_OFFENDER_LABEL.to_owned(),
REPEAT_OFFENDER_COLOR.to_owned(),
REPEAT_OFFENDER_DESCRIPTION.to_owned(),
)
.await
{
warn!(
message = "Failed to ensure repeat-offender label exists.",
triage_id = %triage_id,
pr_number,
%error,
);
}
if let Err(error) = client
.add_labels(owner, repo, pr_number, vec![label_name])
.add_labels(owner, repo, pr_number, labels_to_apply)
if history_result.is_repeat_offender {
match client
.ensure_label(
owner,
repo,
REPEAT_OFFENDER_LABEL.to_owned(),
REPEAT_OFFENDER_COLOR.to_owned(),
REPEAT_OFFENDER_DESCRIPTION.to_owned(),
)
.await
{
Ok(()) => {}
Err(error) => {
warn!(
message = "Failed to ensure repeat-offender label exists.",
triage_id = %triage_id,
pr_number,
%error,
);
labels_to_apply.retain(|l| l != REPEAT_OFFENDER_LABEL);
}
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patrol:trusted PR author and content look trustworthy

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant