Skip to content

Issue 126: IRI-API#127

Draft
davramov wants to merge 42 commits into
als-computing:mainfrom
davramov:iri_api
Draft

Issue 126: IRI-API#127
davramov wants to merge 42 commits into
als-computing:mainfrom
davramov:iri_api

Conversation

@davramov
Copy link
Copy Markdown
Contributor

This PR addresses issue #126 with the integration of IRI-API for NERSC job submissions. It builds on the NERSCTomographyHPCController by providing a login_method option to choose between SFAPI (current) and IRI-API (new) authentication. It also adds two new methods: _submit_job() and _wait_for_job that use the mechanisms specific to each API.

davramov added 24 commits April 23, 2026 11:09
…t flow for reconstruction to test job submission. In reconstruct(), replaced the SFAPI-specific job submission/polling code with the general _submit_job() and _wait_for_job() methods.
@davramov davramov requested a review from dylanmcreynolds May 20, 2026 16:48
Copy link
Copy Markdown
Contributor

@dylanmcreynolds dylanmcreynolds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe re-arrange the tests in this PR? nersc and mlflow are general capabilities warranting their own tests outside of test_bl832. But some of what's in test_nersc.py now is specific to 8.3.2

Copy link
Copy Markdown
Contributor

@dylanmcreynolds dylanmcreynolds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the nersc.py reawlly should be up one level.

Comment thread orchestration/flows/bl832/job_controller.py
Comment thread orchestration/flows/bl832/nersc.py Outdated
Comment thread orchestration/flows/bl832/register_mlflow.py Outdated
Copy link
Copy Markdown
Contributor

@dylanmcreynolds dylanmcreynolds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a lot of hardcoding in register_mlflow.py that should eventually be parameters...maybe not this PR

@davramov
Copy link
Copy Markdown
Contributor Author

the nersc.py really should be up one level.

There are definitely parts of bl832/nersc.py and bl832/job_controller.py that can be generalized for more beamlines, and other parts that are specific to 832. I think it would be great to separate the layers a bit and make the work we've done more reusable.

A potential path forward for reorganizing:

orchestration/
├── hpc/
│   ├── __init__.py
│   ├── controller.py                   # HPCController ABC (minimal; just self.config)
│   ├── factory.py                      # HPC enum
│   ├── nersc/                          # generic NERSC plumbing
│   │   ├── __init__.py
│   │   ├── login.py                    # NERSCLoginMethod + create_nersc_client + _create_*_client
│   │   ├── controller.py               # NERSCHPCController(HPCController)
│   │   └── shifter.py                  # pull/check_shifter_image
│   └── alcf/                           # generic ALCF plumbing
│       ├── __init__.py
│       └── controller.py               # ALCFHPCController(HPCController) — Globus Compute submit/wait
├── job_options.py                      # Dynamic job submission parameters from config -> MLflow -> Prefect variable layers
└── flows/
    └── bl832/
        ├── job_controller.py           # BL832-specific factory
        ├── nersc.py                    # NERSCTomographyHPCController(NERSCHPCController) + BL832 NERSC flows
        └── alcf.py                     # ALCFTomographyHPCController(ALCFHPCController) + BL832 ALCF flow

@dylanmcreynolds, curious about your thoughts on this structure before I move forward. Maybe it makes sense to open a new Issue/PR to tackle it.

@dylanmcreynolds
Copy link
Copy Markdown
Contributor

I generally love the structure. A couple of small notes:

  • maybe instead of the namespace hpc, what about jobs? This could apply to our local cluster some day
  • job_options.py could be `jobs/options.py'
  • controller.py and factory.py feel like they could be in one module, just controller.py?

@davramov
Copy link
Copy Markdown
Contributor Author

I like it, here's a revised version:

orchestration/
├── jobs/                               # Generic job-submission abstractions, transport-agnostic
│   ├── __init__.py
│   ├── controller.py                   # JobController ABC + MachineType enum
│   ├── options.py                      # _load_job_options (3-layer resolution)
│   ├── nersc/                          # Generic NERSC plumbing — no beamline knowledge
│   │   ├── __init__.py
│   │   ├── login.py                    # NERSCLoginMethod, create_nersc_client, _create_*api_client
│   │   ├── controller.py               # NERSCJobController(JobController), submit_job(), wait_for_job(), ...
│   │   └── shifter.py                  # pull_shifter_image, check_shifter_image
│   └── alcf/                           # Generic ALCF plumbing — no beamline knowledge
│       ├── __init__.py
│       └── controller.py               # ALCFHPCController(HPCController), submit_globus_job(), wait_for_globus_future(), ...
└── flows/
    └── bl832/
        ├── job_controller.py           # BL832-specific factory: get_controller(machine_type, config, login_method)
        ├── nersc.py                    # NERSCTomographyJobController + BL832 NERSC flows
        └── alcf.py                     # ALCFTomographyJobController + BL832 ALCF flow

@dylanmcreynolds
Copy link
Copy Markdown
Contributor

Looks good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants