feat(api): detect duplicate/repeat PRs via GitHub Search API and appl…#17
Conversation
…y history penalty
There was a problem hiding this comment.
Pratrol Triage Brief: 🟢 High Confidence
Pratrol assessed this pull request across the primary risk vectors below.
| Risk Vector | Score | Tier |
|---|---|---|
| Contributor Context | 72/100 | 🟢 High |
| Logic Risk (Mistral) | 82/100 | 🟢 High |
| Overall Confidence | 78/100 | 🟢 High |
Summary: Adds functionality to detect duplicate/repeat PRs and apply history penalties.
Key Signal: The PR introduces a new history evaluation system to penalize repeat PR rejections.
Recommendation: Review the new history evaluation logic and ensure it integrates well with existing systems.
Deploying pratrol with
|
| Latest commit: |
e5162c7
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://5e8b4b14.pratrol.pages.dev |
| Branch Preview URL: | https://feat-triage-with-history.pratrol.pages.dev |
There was a problem hiding this comment.
🤖 Code Review
Assessment ⚠️
Two issues worth fixing before merge:
-
Query injection via PR title keywords — the
keywordsstring is interpolated directly into a GitHub Search query with only surrounding quotes. A title likefoo" repo:attacker/evilbreaks out of the quoted term and redirects the search to an arbitrary repo. Theextract_title_keywordsfunction strips non-alphanumeric chars, which mitigates most cases, but the resulting keyword string is still passed unsanitized into the query format string. The quotes insearch_rejected_prs_by_titleonly protect against whitespace, not against a keyword that itself contains a double-quote (possible if the split/filter logic ever changes). -
is_repeat_offenderlabel applied even whenensure_labelfails — ifensure_labelfor the repeat-offender label returns an error, the code logs a warning and continues.labels_to_applystill containsREPEAT_OFFENDER_LABEL, soadd_labelswill attempt to apply a label that may not exist, causing the GitHub API call to fail silently (only a warn log). The tier label has the same structure but was pre-existing; the repeat-offender label is created on-demand, so this race is real.
Note
Tag @mendral-app with feedback or questions. View session
| ) -> Result<RejectedPrSearchResult, GitHubError> { | ||
| let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{keywords}\""); | ||
| self.search_rejected_prs(&query).await |
There was a problem hiding this comment.
🔒 security (medium)
The keywords string is interpolated directly into the search query. Although extract_title_keywords currently strips non-alphanumeric chars, the query construction relies on surrounding double-quotes for safety. If a keyword ever contains " (e.g. after a future change to the extractor), an attacker-controlled PR title could inject arbitrary search qualifiers like repo:other/repo. Escape or validate the keywords before interpolation.
📝 Suggested change
| ) -> Result<RejectedPrSearchResult, GitHubError> { | |
| let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{keywords}\""); | |
| self.search_rejected_prs(&query).await | |
| async fn search_rejected_prs_by_title( | |
| &self, | |
| keywords: &str, | |
| owner: &str, | |
| repo: &str, | |
| ) -> Result<RejectedPrSearchResult, GitHubError> { | |
| // Ensure keywords contain only alphanumeric/space chars before embedding in query | |
| let safe_keywords: String = keywords | |
| .chars() | |
| .filter(|c| c.is_alphanumeric() || *c == ' ') | |
| .collect(); | |
| let query = format!("is:pr is:closed is:unmerged repo:{owner}/{repo} \"{safe_keywords}\""); | |
| self.search_rejected_prs(&query).await | |
| } |
| if history_result.is_repeat_offender | ||
| && let Err(error) = client | ||
| .ensure_label( | ||
| owner, | ||
| repo, | ||
| REPEAT_OFFENDER_LABEL.to_owned(), | ||
| REPEAT_OFFENDER_COLOR.to_owned(), | ||
| REPEAT_OFFENDER_DESCRIPTION.to_owned(), | ||
| ) | ||
| .await | ||
| { | ||
| warn!( | ||
| message = "Failed to ensure repeat-offender label exists.", | ||
| triage_id = %triage_id, | ||
| pr_number, | ||
| %error, | ||
| ); | ||
| } | ||
|
|
||
| if let Err(error) = client | ||
| .add_labels(owner, repo, pr_number, vec![label_name]) | ||
| .add_labels(owner, repo, pr_number, labels_to_apply) |
There was a problem hiding this comment.
🐛 bug (medium)
When ensure_label for REPEAT_OFFENDER_LABEL fails, the code logs a warning but labels_to_apply still contains the label. The subsequent add_labels call will then attempt to apply a non-existent label, causing a silent API failure. Remove the label from labels_to_apply on ensure_label error.
📝 Suggested change
| if history_result.is_repeat_offender | |
| && let Err(error) = client | |
| .ensure_label( | |
| owner, | |
| repo, | |
| REPEAT_OFFENDER_LABEL.to_owned(), | |
| REPEAT_OFFENDER_COLOR.to_owned(), | |
| REPEAT_OFFENDER_DESCRIPTION.to_owned(), | |
| ) | |
| .await | |
| { | |
| warn!( | |
| message = "Failed to ensure repeat-offender label exists.", | |
| triage_id = %triage_id, | |
| pr_number, | |
| %error, | |
| ); | |
| } | |
| if let Err(error) = client | |
| .add_labels(owner, repo, pr_number, vec![label_name]) | |
| .add_labels(owner, repo, pr_number, labels_to_apply) | |
| if history_result.is_repeat_offender { | |
| match client | |
| .ensure_label( | |
| owner, | |
| repo, | |
| REPEAT_OFFENDER_LABEL.to_owned(), | |
| REPEAT_OFFENDER_COLOR.to_owned(), | |
| REPEAT_OFFENDER_DESCRIPTION.to_owned(), | |
| ) | |
| .await | |
| { | |
| Ok(()) => {} | |
| Err(error) => { | |
| warn!( | |
| message = "Failed to ensure repeat-offender label exists.", | |
| triage_id = %triage_id, | |
| pr_number, | |
| %error, | |
| ); | |
| labels_to_apply.retain(|l| l != REPEAT_OFFENDER_LABEL); | |
| } | |
| } | |
| } |
…y history penalty
Note
Adds a history penalty system that detects duplicate/repeat PRs via GitHub Search API. Three parallel API calls fetch rejected PRs by author (repo-scoped), by title keywords, and by author globally. Results feed a penalty calculator that reduces the combined score and can apply a
patrol:repeat-offenderlabel.Written by Mendral for commit e5162c7.