Feature/rocoto#168
Draft
DavidBurrows-NCO wants to merge 20 commits into
Batch optional
Link to ReadTheDocs sample build for this PR can be found at:
Description:
If merged, this PR will enable users to run the nested-EAGLE quickstart pipeline with Rocoto using UWTools for YAML to XML conversion.
This PR also adds an option to run pipeline tasks with or without batch submission. As suggested by @maddenp-cu, this lets Rocoto submit each pipeline task directly to Slurm without an intermediary step.
Resolves #99
A user would follow step 1 of the quickstart guide:
`make env cudascript=ursa`. Then update step 2 to `make config compose=base:ursa:quickstart > eagle.yaml` and update `app.base`. Finally, run `make workflow config=eagle.yaml`, which will convert `eagle.yaml` into an `eagle.xml` and iterate through quickstart guide steps 4-8.

Current issues:
`test $? -eq 0 && touch runscript.zarr-gfs.done` does not report job failures back to Rocoto. If the test passes (exit status 0), a `.done` file is created and the script reports success to Rocoto. However, if the exit status is non-zero (failure), nothing happens: the script just ends, Rocoto believes the job succeeded, and the workflow continues. We need to add something like
`eagle-tools inference inference.yaml && touch runscript.inference.done || { echo "job failed"; exit 1; }`, which should communicate a failure to Rocoto.
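A minimal sketch of that pattern as a reusable helper, assuming Rocoto marks a task as failed when the job script exits non-zero. `run_step` and the `runscript.<name>.done` naming are illustrative assumptions; only the `eagle-tools inference inference.yaml` command comes from this PR. The explicit `if`/`else` keeps the failure path separate from `touch`, whereas in `cmd && touch f || { ...; }` the failure branch also runs if `touch` itself fails.

```shell
#!/usr/bin/env bash
# Sketch only: run_step is a hypothetical helper, not part of EAGLE or UWTools.
# It runs one pipeline step, writes runscript.<name>.done on success, and
# returns the step's non-zero exit status on failure so the caller can exit.

run_step() {
  local name=$1
  shift
  if "$@"; then
    touch "runscript.${name}.done"
  else
    # In the else branch, $? still holds the exit status of the failed command.
    local rc=$?
    echo "job ${name} failed with exit status ${rc}" >&2
    return "$rc"
  fi
}

# Example use in a task script (exit propagates run_step's status to Rocoto):
# run_step inference eagle-tools inference inference.yaml || exit
```

Using a bare `exit` after `||` reuses the failed step's status, so the original exit code reaches Rocoto rather than a generic 1.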
`run/default/vx/prewxvx/global/prewxvx.log`): `OSError: [Errno -101] NetCDF: HDF error: '/scratch4/NAGAPE/epic/David.Burrows/may20/EAGLErocoto/src/run/default/data/global_one_degree_with_mask.nc'` and
`PermissionError: [Errno 13] Permission denied: '/scratch4/NCEPDEV/nems/David.Burrows/eagle/may4_full_workflow/EAGLE/src/run/default/data/global_one_degree_with_mask.nc'`. I believe the errors stem from the global prewxvx jobs attempting to read `/scratch4/NCEPDEV/nems/David.Burrows/eagle/may4_full_workflow/EAGLE/src/run/default/data/global_one_degree.nc` simultaneously. I see a few ways to circumvent this:
1) Run vx jobs in serial, or as nested metatasks in mixed parallel/serial mode, which increases pipeline runtime.
2) Split vx jobs into grid2grid and grid2obs metatasks, which leaves a gap between launching the two global jobs and increases pipeline runtime.
3) Other thoughts?

Tests run so far:
Follow quickstart guide directly
Follow guide described for Rocoto above
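One hypothetical option for the global prewxvx failures described above: if they are caused by concurrent jobs touching the same mask file, a middle ground between fully serial vx jobs and the current parallel layout is to serialize only the step that touches that file. The sketch below uses `flock(1)` from util-linux; `with_lock`, the lock-file location, and the assumption that a single identifiable command touches the shared file are all illustrative, and some parallel filesystems only honor `flock` when mounted with lock support.

```shell
#!/usr/bin/env bash
# Sketch only: with_lock is a hypothetical helper, not part of EAGLE or Rocoto.
# It runs a command while holding an exclusive flock(1) lock on a lock file,
# so concurrent jobs calling it with the same lock file run the command one at a time.

with_lock() {
  local lockfile=$1
  shift
  (
    # Block until this process holds the exclusive lock on file descriptor 9.
    flock -x 9 || exit 1
    "$@"
  ) 9>"$lockfile"
  # The lock is released when the subshell exits and fd 9 is closed.
}

# Hypothetical use inside each global prewxvx job (placeholder command):
# with_lock run/default/data/global_one_degree.lock <prewxvx command that reads the mask file>
```

Unlike running every vx job serially, only the locked command waits; the rest of each job still runs in parallel.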
Type of change:
Area(s) affected
Commit Requirements: