github: Make addComment idempotent to prevent duplicate PR comments#225
github: Make addComment idempotent to prevent duplicate PR comments#225matias-christensen-skydio wants to merge 3 commits into
Conversation
c9bd434 to
da62129
Compare
| raise e | ||
| except RevupGithubException as e: | ||
| if "timeout" in e.message: | ||
| logging.warning( |
There was a problem hiding this comment.
is this warning still relevant given the updated handling?
|
|
||
| # Before retrying, check which new comments were already posted | ||
| # and update their IDs so the next attempt edits instead of adds. | ||
| await _refresh_new_comment_ids(github_ep, prs) |
There was a problem hiding this comment.
can you asyncio.gather this? i'm assuming delay is the delay between mutations and not queries, so we can subtract the amount of time this query takes
|
are RevupRequestException and RevupGithubException actually different? it seems like they should be combined (with any additional context being put into object members) |
|
oh also you need to rebase |
da62129 to
5803d5a
Compare
|
These two exceptions carry genuinely different data and are handled differently, so I'd prefer to keep them separate:
They also map to different exit codes in |
5803d5a to
e53869f
Compare
|
@jerry-skydio Note that I pushed another commit to this PR that fixes the issue with |
5d1e718 to
d8e93b3
Compare
| oauth_token: str, | ||
| github_url: str, | ||
| proxy: Optional[str] = None, | ||
| batch_size: int = 5, |
There was a problem hiding this comment.
how did you determine the batch size? is this fixed or variable?
There was a problem hiding this comment.
This was empirically chosen. As far as I can tell, it isn't very well documented how the "limit" of a GH GraphQL query is chosen and how we can predict if the current query will exceed it and cause a RESOURCE_LIMITS_EXCEEDED. So I tested a few different values and 5 PRs per batch seems to avoid issues in my current stack of about 20 PRs.
There was a problem hiding this comment.
Pulled out the magic number to a constant and added a comment explaining where this number comes from.
There was a problem hiding this comment.
can we do it more dynamically? ie attempt with all at once, if that fails split in half into 2 batches, etc exponentially
There was a problem hiding this comment.
i'm somewhat wary of hardcoding something considering github seems to be all over the place and it could get stricter or looser
There was a problem hiding this comment.
Yeah. I have actually hit the RESOURCE_LIMITS_EXCEEDED again with this code after posting this. Will investigate this a bit more and see if it is possible to do this in a smarter way.
0737ccb to
8389e82
Compare
8389e82 to
85a3a5c
Compare
Large stacks exceed GitHub's GraphQL complexity budget when all operations are packed into a single request. Split query_everything, create_pull_requests, and update_pull_requests into batches of --github-batch-size PRs (default 5) per request. Also remove RETRYABLE_GRAPHQL_ERRORS since RESOURCE_LIMITS_EXCEEDED is a deterministic complexity rejection, not a transient error.
…failing
Batching by PR count (--github-batch-size) does not bound the quantity that
actually matters: the server-side work a single request triggers. A batch of 5
PRs expands to a variable number of sub-mutations depending on how many are new
(each new PR adds two large addComment operations for the review-graph and
patchsets comments) plus per-PR reviewer and label mutations. A real 10-PR
aircam stack produced a 19-sub-mutation request that GitHub rejected with
RESOURCE_LIMITS_EXCEEDED, while the batch right before it (10 sub-mutations)
succeeded.
We cannot precompute whether a request will fit. There are two distinct limits:
- The documented 500k node limit is static and computable from the first/last
literals in the query, and we are nowhere near it (a 5-PR query is ~500
nodes). This is not what we hit.
- RESOURCE_LIMITS_EXCEEDED is a runtime resource budget. GitHub publishes no
formula and no threshold for it; the docs only describe the behavior ("if a
query consumes too many resources" it is terminated with partial results).
The true cost depends on server-side factors we can't see: notifications and
webhooks fired per addComment, repo size, index state, and current load. So
there is no value of --github-batch-size, and no client-side estimate, that
reliably stays under it.
Since the limit can't be predicted, react to it instead. On
RESOURCE_LIMITS_EXCEEDED, halve the batch and retry until it fits (or a single
PR/ref is reached, which we then surface). This discovers the limit at runtime
and backs off, rather than guessing a constant that drifts with stack shape and
server load.
The error is a *partial* success: GitHub applies some sub-mutations (including
addComments) before running out of budget. Resending the same request as-is
would re-post those comments, so _update_with_splitting first runs
_refresh_new_comment_ids to convert already-posted comments into edits, reusing
the idempotency machinery from e53869f, then splits and retries. The read-only
query path (_query_prs_with_splitting) needs no such guard.
85a3a5c to
97c4e05
Compare
|
btw i've refactored a large amount of code here and reapplied this change here #254 |
So is this PR no longer needed? Should I just close it? |
Summary
RESOURCE_LIMITS_EXCEEDED) would re-postaddCommentmutations that the server had already applied, producing duplicate stack outline and diff table comments on PRs.graphql()method intoupdate_pull_requests()where it has context to handle idempotency. Between retry attempts, re-queries each PR's comments and converts any already-posted new comments toupdateIssueComment(edit) operations.RESOURCE_LIMITS_EXCEEDEDon large stacks. Splitsquery_everything,create_pull_requests, andupdate_pull_requestsinto chunks of--github-batch-sizePRs (default 5) per request.RETRYABLE_GRAPHQL_ERRORSsinceRESOURCE_LIMITS_EXCEEDEDis a deterministic complexity rejection, not a transient error worth retrying.