fix: prevent concurrent processing for automotivelinux repos [CM-1103]#4015
fix: prevent concurrent processing for automotivelinux repos [CM-1103]#4015
Conversation
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
|
|
There was a problem hiding this comment.
Pull request overview
Adds a rate-limit safeguard intended to reduce load on Gerrit (automotivelinux) by avoiding selecting additional repositories from gerrit.automotivelinux.org while one is already being processed.
Changes:
- Introduced an
automotivelinux_processingCTE to detect in-flight processing for automotivelinux repos. - Updated recurrent repository selection to skip automotivelinux repos when one is already processing.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| AND NOT ( | ||
| r.url LIKE '%gerrit.automotivelinux.org%' | ||
| AND EXISTS (SELECT 1 FROM automotivelinux_processing) | ||
| ) |
There was a problem hiding this comment.
This guard is not concurrency-safe: two workers can execute this statement concurrently, both evaluate automotivelinux_processing as empty (statement snapshot), and each acquire a different automotivelinux repo row via FOR UPDATE ... SKIP LOCKED, resulting in concurrent processing anyway. To guarantee single-flight per host, use a DB-level mutex (e.g., pg_try_advisory_xact_lock keyed by the host, or a dedicated lock row/table locked FOR UPDATE) acquired atomically during selection.
| """Acquire a regular (non-onboarding) repository, that were not processed in the last x hours (REPOSITORY_UPDATE_INTERVAL_HOURS)""" | ||
| recurrent_repo_sql_query = f""" | ||
| WITH selected_repo AS ( | ||
| -- Rate-limit guard: Gerrit (automotivelinux) aggressively rate-limits concurrent connections. | ||
| -- This CTE checks if any automotivelinux repo is already being processed, | ||
| -- so we skip picking another one from the same host until the current one finishes. |
There was a problem hiding this comment.
The PR description/title suggests preventing concurrent processing for automotivelinux repos in general, but this safeguard only applies to the recurrent acquisition path. Automotivelinux repos can still be acquired concurrently via acquire_onboarding_repo (pending) and acquire_pending_reonboard_repo (pending_reonboard). If the intent is truly “only one at a time”, the same host-level guard should be applied across all acquisition queries (or centralized in acquire_repo_for_processing).
| WHERE rp2.state = 'processing' | ||
| AND rp2."lockedAt" IS NOT NULL | ||
| AND r2.url LIKE '%gerrit.automotivelinux.org%' | ||
| LIMIT 1 |
There was a problem hiding this comment.
Both checks use url LIKE '%gerrit.automotivelinux.org%' (leading wildcard), which prevents using the existing btree/unique index on repositories.url and can force scans as the table grows. If this runs frequently, consider matching in an index-friendly way (e.g., store/extract hostname in a separate column, use an anchored prefix pattern when possible, or add an appropriate trigram index).
| r.url LIKE '%gerrit.automotivelinux.org%' | ||
| AND EXISTS (SELECT 1 FROM automotivelinux_processing) |
There was a problem hiding this comment.
The hostname pattern '%gerrit.automotivelinux.org%' is duplicated in multiple places in this query; that increases the chance of future drift if it ever needs to change. Consider factoring it into a single SQL value (e.g., a CTE constant) or a Python-level constant used to build the query.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit db07fb6. Configure here.
| SELECT 1 | ||
| FROM git."repositoryProcessing" rp2 | ||
| JOIN public.repositories r2 ON r2.id = rp2."repositoryId" | ||
| WHERE rp2.state = 'processing' |
There was a problem hiding this comment.
Hardcoded state string bypasses parameterized query pattern
Low Severity
The automotivelinux_processing CTE hardcodes rp2.state = 'processing' as a string literal, while every other state comparison in this file uses parameterized values. Parameter $1 already contains RepositoryState.PROCESSING (value "processing") and is used in the same query's UPDATE clause. Using a hardcoded string breaks the established pattern and creates a maintenance risk — if the enum value ever changes, this CTE would silently check for the wrong state.
Reviewed by Cursor Bugbot for commit db07fb6. Configure here.
| AND NOT ( | ||
| r.url LIKE '%gerrit.automotivelinux.org%' | ||
| AND EXISTS (SELECT 1 FROM automotivelinux_processing) | ||
| ) |
There was a problem hiding this comment.
Race condition allows concurrent automotivelinux repo acquisition
Medium Severity
The automotivelinux_processing CTE reads only committed data (PostgreSQL READ COMMITTED default), and doesn't acquire any locks. Two concurrent calls to acquire_recurrent_repo can both evaluate the CTE simultaneously, both find no automotivelinux repo in processing state, and both proceed to select and lock different automotivelinux repos via SKIP LOCKED. This defeats the stated goal of ensuring only one automotivelinux repo is processed at a time.
Reviewed by Cursor Bugbot for commit db07fb6. Configure here.
| WHERE rp2.state = 'processing' | ||
| AND rp2."lockedAt" IS NOT NULL | ||
| AND r2.url LIKE '%gerrit.automotivelinux.org%' | ||
| LIMIT 1 |
There was a problem hiding this comment.
Stuck automotivelinux repo permanently blocks all others
High Severity
The automotivelinux_processing CTE has no staleness/timeout check on lockedAt. If a worker crashes (OOM, pod eviction) while processing an automotivelinux repo, that repo remains in state='processing' with lockedAt IS NOT NULL indefinitely. The CTE will then permanently match this stuck row, blocking all other automotivelinux repos from ever being acquired. The stuck repo itself is also excluded by states_to_exclude, so no worker can pick it up to resolve it — the block is permanent until manual DB intervention.
Reviewed by Cursor Bugbot for commit db07fb6. Configure here.


This pull request introduces a safeguard to prevent overloading Gerrit (automotivelinux) repositories by ensuring that only one such repository is processed at a time. This is achieved by modifying the SQL query in the
acquire_recurrent_repofunction to check if any automotivelinux repository is already being processed, and if so, skip selecting another from the same host.Repository processing rate-limiting:
acquire_recurrent_repo(incrud.py) to include a Common Table Expression (CTE) that checks if any automotivelinux repository is currently being processed, and prevents acquiring another one from the same host until the current one finishes. This helps avoid triggering Gerrit's aggressive rate-limiting.Note
Medium Risk
Changes the repository-acquisition SQL used for scheduling work; mistakes could stall or starve processing for affected repos, though scope is limited to
gerrit.automotivelinux.orgrate-limiting.Overview
Adds a rate-limiting safeguard in
acquire_recurrent_repoby introducing a CTE that detects when anygerrit.automotivelinux.orgrepository is already inprocessing, and excludes selecting additional repos from that host until the current one finishes.This reduces concurrent connections to the automotivelinux Gerrit host while leaving selection behavior unchanged for other repositories.
Reviewed by Cursor Bugbot for commit ebd55e0. Bugbot is set up for automated code reviews on this repo. Configure here.