Skip to content

github: Make addComment idempotent to prevent duplicate PR comments#225

Open
matias-christensen-skydio wants to merge 3 commits into
mainfrom
fix-duplicate-comments
Open

github: Make addComment idempotent to prevent duplicate PR comments#225
matias-christensen-skydio wants to merge 3 commits into
mainfrom
fix-duplicate-comments

Conversation

@matias-christensen-skydio

@matias-christensen-skydio matias-christensen-skydio commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fixes a bug where retrying a GraphQL mutation after a transient error (5xx or RESOURCE_LIMITS_EXCEEDED) would re-post addComment mutations that the server had already applied, producing duplicate stack outline and diff table comments on PRs.
  • Moves the retry loop from the generic graphql() method into update_pull_requests() where it has context to handle idempotency. Between retry attempts, re-queries each PR's comments and converts any already-posted new comments to updateIssueComment (edit) operations.
  • Retry coverage for all transient GitHub API failures is preserved.
  • Batches GraphQL requests to avoid RESOURCE_LIMITS_EXCEEDED on large stacks. Splits query_everything, create_pull_requests, and update_pull_requests into chunks of --github-batch-size PRs (default 5) per request.
  • Removes RETRYABLE_GRAPHQL_ERRORS since RESOURCE_LIMITS_EXCEEDED is a deterministic complexity rejection, not a transient error worth retrying.

Comment thread revup/github_utils.py Outdated
raise e
except RevupGithubException as e:
if "timeout" in e.message:
logging.warning(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this warning still relevant given the updated handling?

Comment thread revup/github_utils.py

# Before retrying, check which new comments were already posted
# and update their IDs so the next attempt edits instead of adds.
await _refresh_new_comment_ids(github_ep, prs)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you asyncio.gather this? i'm assuming delay is the delay between mutations and not queries, so we can subtract the amount of time this query takes

@jerry-skydio

Copy link
Copy Markdown
Collaborator

are RevupRequestException and RevupGithubException actually different? it seems like they should be combined (with any additional context being put into object members)

@jerry-skydio

Copy link
Copy Markdown
Collaborator

oh also you need to rebase

@matias-christensen-skydio

Copy link
Copy Markdown
Contributor Author

These two exceptions carry genuinely different data and are handled differently, so I'd prefer to keep them separate:

  • RevupRequestException represents an HTTP-layer failure (non-200 response). Fields: status, response. Raised before any GraphQL parsing happens.
  • RevupGithubException represents a GraphQL-level error on a 200 response. Fields: types, message, error_json. Raised after parsing the GraphQL error envelope.

They also map to different exit codes in __main__.py (5 vs 6). A merge would replace clean except dispatch with attribute-None checks. Happy to revisit if you see a cleaner unification.

@matias-christensen-skydio

Copy link
Copy Markdown
Contributor Author

@jerry-skydio Note that I pushed another commit to this PR that fixes the issue with RESOURCE_LIMITS_EXCEEDED. Turns out that is not a retryable error, and actually means that your GraphQL query is too complex. So this breaks that up into multiple smaller queries to stay under GitHub's limit.

@matias-christensen-skydio matias-christensen-skydio force-pushed the fix-duplicate-comments branch 2 times, most recently from 5d1e718 to d8e93b3 Compare April 21, 2026 11:25
Comment thread revup/github_real.py Outdated
oauth_token: str,
github_url: str,
proxy: Optional[str] = None,
batch_size: int = 5,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you determine the batch size? is this fixed or variable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was empirically chosen. As far as I can tell, it isn't very well documented how the "limit" of a GH GraphQL query is chosen and how we can predict if the current query will exceed it and cause a RESOURCE_LIMITS_EXCEEDED. So I tested a few different values and 5 PRs per batch seems to avoid issues in my current stack of about 20 PRs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled out the magic number to a constant and added a comment explaining where this number comes from.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do it more dynamically? ie attempt with all at once, if that fails split in half into 2 batches, etc exponentially

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm somewhat wary of hardcoding something considering github seems to be all over the place and it could get stricter or looser

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I have actually hit the RESOURCE_LIMITS_EXCEEDED again with this code after posting this. Will investigate this a bit more and see if it is possible to do this in a smarter way.

@matias-christensen-skydio matias-christensen-skydio force-pushed the fix-duplicate-comments branch 2 times, most recently from 0737ccb to 8389e82 Compare April 22, 2026 09:04
Large stacks exceed GitHub's GraphQL complexity budget when all
operations are packed into a single request. Split query_everything,
create_pull_requests, and update_pull_requests into batches of
--github-batch-size PRs (default 5) per request.

Also remove RETRYABLE_GRAPHQL_ERRORS since RESOURCE_LIMITS_EXCEEDED
is a deterministic complexity rejection, not a transient error.
…failing

Batching by PR count (--github-batch-size) does not bound the quantity that
actually matters: the server-side work a single request triggers. A batch of 5
PRs expands to a variable number of sub-mutations depending on how many are new
(each new PR adds two large addComment operations for the review-graph and
patchsets comments) plus per-PR reviewer and label mutations. A real 10-PR
aircam stack produced a 19-sub-mutation request that GitHub rejected with
RESOURCE_LIMITS_EXCEEDED, while the batch right before it (10 sub-mutations)
succeeded.

We cannot precompute whether a request will fit. There are two distinct limits:

  - The documented 500k node limit is static and computable from the first/last
    literals in the query, and we are nowhere near it (a 5-PR query is ~500
    nodes). This is not what we hit.

  - RESOURCE_LIMITS_EXCEEDED is a runtime resource budget. GitHub publishes no
    formula and no threshold for it; the docs only describe the behavior ("if a
    query consumes too many resources" it is terminated with partial results).
    The true cost depends on server-side factors we can't see: notifications and
    webhooks fired per addComment, repo size, index state, and current load. So
    there is no value of --github-batch-size, and no client-side estimate, that
    reliably stays under it.

Since the limit can't be predicted, react to it instead. On
RESOURCE_LIMITS_EXCEEDED, halve the batch and retry until it fits (or a single
PR/ref is reached, which we then surface). This discovers the limit at runtime
and backs off, rather than guessing a constant that drifts with stack shape and
server load.

The error is a *partial* success: GitHub applies some sub-mutations (including
addComments) before running out of budget. Resending the same request as-is
would re-post those comments, so _update_with_splitting first runs
_refresh_new_comment_ids to convert already-posted comments into edits, reusing
the idempotency machinery from e53869f, then splits and retries. The read-only
query path (_query_prs_with_splitting) needs no such guard.
@jerry-skydio

Copy link
Copy Markdown
Collaborator

btw i've refactored a large amount of code here and reapplied this change here #254

@matias-christensen-skydio

Copy link
Copy Markdown
Contributor Author

btw i've refactored a large amount of code here and reapplied this change here #254

So is this PR no longer needed? Should I just close it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants