Adding an option to fit BIDS-study layout (updated)#369
Conversation
…rectory for config/RIA storage
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #369 +/- ##
==========================================
+ Coverage 79.07% 79.12% +0.04%
==========================================
Files 17 17
Lines 2079 2103 +24
Branches 362 366 +4
==========================================
+ Hits 1644 1664 +20
- Misses 300 302 +2
- Partials 135 137 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| processing_level, | ||
| system, | ||
| project_root=None, | ||
| analysis_dir='analysis', |
There was a problem hiding this comment.
new arg but docstring wasn't updated
asmacdo
left a comment
There was a problem hiding this comment.
I ran this against the https://github.com/djarecka/babs_demo with the bids layout setup, and it worked very smoothly!
Tested this end-to-end on a Dartmouth/Discovery slurm cluster using djarecka's babs_demo bids_layout walkthrough (simbids → init with analysis_path: . and .babs/{input,output}_ria → submit → merge → unzip). All jobs finished and the resulting layout under <study>/derivatives/<project>/ is exactly the BIDS-derivatives shape we (mechababs) want for the OpenNeuroStudies/OpenNeuroDerivatives integration. Strongly aligned with our roadmap — thanks!
Three small things noticed while verifying:
1. .gitattributes defaults annex small metadata files. .babs/babs_init_config.yaml (newly committed by this PR) and the extracted dataset_description.json both end up as annex symlinks. Filed as #378
2. .gitignore RIA rule uses op.basename, which is over-broad. With input_ria_path: .babs/input_ria, the basename input_ria becomes a gitignore rule that matches any path component named input_ria. I think this is just a nitpick and should be fine.
|
Hi @tien-tong, just pinging if you have any questions or comments to this PR |
tien-tong
left a comment
There was a problem hiding this comment.
I've tested the orig BABS project layout with real data and everything looks good!
I've also tested a BIDS-study layout config yaml file for freesurfer-post (my commit). This section from the BABSProject/README.md need to be changed to sourcedata for BIDS layout
## Dataset structure
- All inputs (i.e. building blocks from other sources) are located in
`inputs/`.
Other than that, everything looks good to me!
|
|
||
| analysis_dir = cfg.get('analysis_path', 'analysis') | ||
| self.analysis_path = op.normpath(op.join(self.project_root, analysis_dir)) | ||
| self._analysis_datalad_handle = None |
There was a problem hiding this comment.
analysis_path, input_ria_path, and output_ria_path are joined with project_root without validation. Path values are not validated, so absolute paths, .., "", and "." can escape or collapse into unintended locations
if a config uses analysis_path: ../outside, BABS may create/write outside the project root, while cleanup only deletes project_root.
Suggested fix: centralize path resolution with Path.resolve(), reject empty/non-string paths, reject absolute paths unless explicitly supported, and require resolved paths to stay inside project_root.
| system, | ||
| project_root=op.dirname(self.analysis_path), | ||
| analysis_dir=op.basename(self.analysis_path), | ||
| shared_group_mode=shared_group_mode, |
There was a problem hiding this comment.
nested analysis_path are reconstructed from dirname + basename
but PROJECT_ROOT in the generated environment still means the real project root.
CONTAINER_SHARED="${PROJECT_ROOT}/{{ analysis_dir }}/${CONTAINER_JOB}"
This is fragile and confusing, and breaks easily for external callers or future script changes.
Suggested fix: pass a single analysis_path or analysis_relpath into the template. Rename shell variables to what they mean, e.g. BABS_PROJECT_ROOT and BABS_ANALYSIS_PATH, and stop reconstructing nested paths from basename.
| # not to track input/output RIA stores: | ||
| gitignore_file.write('\n' + op.basename(self.input_ria_path)) | ||
| gitignore_file.write('\n' + op.basename(self.output_ria_path)) | ||
| # not to track `logs` folder: |
There was a problem hiding this comment.
.gitignore only ignores RIA basenames.
Suggested fix: when RIA paths are inside analysis_path, write paths relative to analysis_path.
|
|
||
| babs_proj = BABSBootstrap(project_root) | ||
| babs_proj = BABSBootstrap(project_root, container_config=container_config) | ||
| try: |
There was a problem hiding this comment.
Direct API usage like silently ignores custom analysis_path.
class BABSBootstrap(BABS):
def __init__(self, project_root, container_config=None):
super().__init__(project_root, container_config=container_config)
Suggested fix: either require container_config in BABSBootstrap.__init__ or resolve init paths inside babs_bootstrap() before any path-dependent work.
| if op.exists(root_config_path): | ||
| with open(root_config_path) as f: | ||
| cfg = yaml.safe_load(f) or {} | ||
|
|
There was a problem hiding this comment.
empty YAML returns None, causing an opaque AttributeError.
Suggested fix: validate with something like:
cfg = yaml.safe_load(f) or {}
if not isinstance(cfg, dict):
raise ValueError('container_config must be a YAML mapping')
|
|
||
| cmd_template: '{{ submit_head }} {{ env_flags }} {{ name_flag_str }}{{ job_name }} {{ eo_args }} {{ array_args }} {% if test %}{{ babs.analysis_path }}/code/check_setup/call_test_job.sh{% else %}{{ babs.analysis_path }}/code/participant_job.sh {{ dssource }} {{ pushgitremote }} {{ babs.job_submit_path_abs }} {{ babs.project_root }}{% endif %}' | ||
| cmd_template: '{{ submit_head }} {{ env_flags }} {{ name_flag_str }}{{ job_name }} {{ eo_args }} {{ array_args }} {% if test %}{{ babs.analysis_path }}/code/check_setup/call_test_job.sh{% else %}{{ babs.analysis_path }}/code/participant_job.sh {{ dssource }} {{ pushgitremote }} {{ babs.job_submit_path_abs }} {{ babs.analysis_root }}{% endif %}' | ||
| job_name_template: '{{ job_name }}' No newline at end of file |
There was a problem hiding this comment.
generated command strings are split with plain .split().
job_id = sbatch_get_job_id(cmd.split(), analysis_path)
spaces in project paths, RIA paths, or analysis paths will break job submission.
Suggested fix: build argv as structured data, or render shell-quoted strings and parse with shlex.split().
| with open(self.config_path, 'w') as f: | ||
| f.write( | ||
| template.render( | ||
| analysis_dir=op.basename(self.analysis_path), |
There was a problem hiding this comment.
analysis_dir is passed but unused
instead of #361 (after rewriting history went wrong); I included the newest changes to main.
From the previous description:
analysis_pathriasto.babsandgitignore.babs, so it can be read whenever is neeededAn example presenting the usage and the layout can be found on the
babs_demo repo, follow the section for the bids_layout