Added daily requeuing of missing frames.#461
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a Celery beat (cron) job to detect and requeue “missing” frames by querying the archive API, and refactors archive querying / queue routing to reduce duplication.
Changes:
- Introduces a new
celery.requeue_missing_framesperiodic task and wires it into the existing cron container/entrypoint. - Adds a shared
banzai.querymodule with retrying archive GET + archive frame pagination helpers. - Refactors queue selection into
get_processing_queue()and reuses archive query helper in FITS downloading / BPM ingestion.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Bumps lco-banzai to 1.36.0 and updates lock metadata. |
| pyproject.toml | Updates version and switches console script to new cron entrypoint. |
| helm-chart/banzai/templates/listener.yaml | Updates the cron container command to banzai_cron. |
| CHANGES.md | Adds 1.36.0 changelog entry for daily requeueing. |
| banzai/utils/observation_utils.py | Adds tenacity retry to calibration-block archive query. |
| banzai/utils/instrument_utils.py | Adds shared get_processing_queue() helper. |
| banzai/utils/fits_utils.py | Replaces direct requests.get with archive_get helper for downloads. |
| banzai/settings.py | Adds requeue cron configuration knobs. |
| banzai/scheduling.py | Adds the requeue_missing_frames Celery task and archive querying usage. |
| banzai/query.py | New module for archive querying helpers and cross-matching. |
| banzai/main.py | Adds new cron entrypoint scheduling and reuses archive_get / get_processing_queue. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jchate6
left a comment
There was a problem hiding this comment.
I would like to see more documentation here if possible.
-
New or heavily reworked functions would benefit from doc strings describing their purpose, use, and intended functionality.
-
I think we need to record this change of behavior somewhere easily accessible and referenced so that the precise details of Banzai's expected behavior are understood by a wider audience.
-
There are several changes here that I don't really understand why they were made (like the changes in
fits_utils). Some comments in the PR explaining how these different changes are related would generally be appreciated.
Also, there are no test changes. Do we want to test that we catch and re-queue missing frames correctly?
I'm curious if we have a different strategy for the cron. Does it make more sense to run it more than once a day? What if we triggered for each site at local noon rather than one bulk re-queue in the West Coast morning? This would spread out the queries and queuing as well.
|
|
||
| REQUEUE_OBSTYPES = ['EXPOSE', 'STANDARD'] | ||
|
|
||
| REQUEUE_LOOKBACK_HOURS = 36 |
There was a problem hiding this comment.
This feels like an odd lookback time.
This means that everything from 02:30-14:30 is checked twice while everything from 14:30 to 02:30 is only checked once.
It isn't clear to me why we would have redundancy for OGG and ELP, but not CPT and TFN...
It feels like this should be either 25 or 26 hours if we just want a little overlap to make sure we didn't miss anything during processing, or a full 48 hours if we want to check everything twice.
|
|
||
| REFERENCE_CATALOG_URL = os.getenv('REFERENCE_CATALOG_URL', 'http://phot-catalog.lco.gtn/') | ||
|
|
||
| REQUEUE_MISSING_FRAMES_TIME = datetime.time(hour=14, minute=30) |
There was a problem hiding this comment.
Are we worried about this being too close to the 15:00 report timing? If there are more than a few files, is it possible these might be missed?
|
|
||
| app.add_periodic_task(crontab(hour=runtime_context.REQUEUE_MISSING_FRAMES_TIME.hour, | ||
| minute=runtime_context.REQUEUE_MISSING_FRAMES_TIME.minute), | ||
| requeue_missing_frames.s(runtime_context=vars(runtime_context)), |
There was a problem hiding this comment.
I'm not sure what the .s is doing... is that a celery thing?
| for obstype in runtime_context.REQUEUE_OBSTYPES: | ||
| # Get the raw frames that we took | ||
| start = datetime.now(timezone.utc) - timedelta(hours=runtime_context.REQUEUE_LOOKBACK_HOURS) | ||
| end = datetime.now(timezone.utc) |
There was a problem hiding this comment.
Is there any concern here about re-queuing things that are just in the middle of being processed?
| return frames | ||
|
|
||
|
|
||
| def cross_match_missing_frames(raw_frames, reduced_frames): |
There was a problem hiding this comment.
This will also consider any unbinned UBVRI frames as missing and re-queue them, correct?
There was a problem hiding this comment.
If they are missing then yes, but it will not automatically requeue frames that already processed even if they are unbinned.
There was a problem hiding this comment.
We don't process those at all unless specifically requested.
There are many warnings and it hasn't happened in months, but they will always be unprocessed.
This PR adds a cron to requeue missing frames using the same cron container as the calibration scheduler. The query only relies on the archive api.
I also refactored some of the archive querying for better reuse of the retry logic and less code duplication.