Skip to content

Adding an option to fit BIDS-study layout (updated)#369

Open
djarecka wants to merge 6 commits into
PennLINC:mainfrom
djarecka:structurechange_1
Open

Adding an option to fit BIDS-study layout (updated)#369
djarecka wants to merge 6 commits into
PennLINC:mainfrom
djarecka:structurechange_1

Conversation

@djarecka

Copy link
Copy Markdown
Collaborator

instead of #361 (after rewriting history went wrong); I included the newest changes to main.
From the previous description:

  • allowing for configuring analysis_path
  • ading rias to .babs and gitignore
  • adding initial configuration file to .babs, so it can be read whenever is neeeded

An example presenting the usage and the layout can be found on the
babs_demo repo, follow the section for the bids_layout

@codecov-commenter

codecov-commenter commented Apr 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.12%. Comparing base (95fd99d) to head (42a8cea).

Files with missing lines Patch % Lines
babs/bootstrap.py 71.42% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #369      +/-   ##
==========================================
+ Coverage   79.07%   79.12%   +0.04%     
==========================================
  Files          17       17              
  Lines        2079     2103      +24     
  Branches      362      366       +4     
==========================================
+ Hits         1644     1664      +20     
- Misses        300      302       +2     
- Partials      135      137       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tien-tong tien-tong self-requested a review April 23, 2026 13:04
Comment thread babs/container.py
processing_level,
system,
project_root=None,
analysis_dir='analysis',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new arg but docstring wasn't updated

@tien-tong tien-tong linked an issue Apr 24, 2026 that may be closed by this pull request
@tien-tong tien-tong self-requested a review May 21, 2026 15:59

@asmacdo asmacdo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this against the https://github.com/djarecka/babs_demo with the bids layout setup, and it worked very smoothly!

Tested this end-to-end on a Dartmouth/Discovery slurm cluster using djarecka's babs_demo bids_layout walkthrough (simbids → init with analysis_path: . and .babs/{input,output}_ria → submit → merge → unzip). All jobs finished and the resulting layout under <study>/derivatives/<project>/ is exactly the BIDS-derivatives shape we (mechababs) want for the OpenNeuroStudies/OpenNeuroDerivatives integration. Strongly aligned with our roadmap — thanks!

Three small things noticed while verifying:

1. .gitattributes defaults annex small metadata files. .babs/babs_init_config.yaml (newly committed by this PR) and the extracted dataset_description.json both end up as annex symlinks. Filed as #378
2. .gitignore RIA rule uses op.basename, which is over-broad. With input_ria_path: .babs/input_ria, the basename input_ria becomes a gitignore rule that matches any path component named input_ria. I think this is just a nitpick and should be fine.

@tien-tong tien-tong added the enhancement New feature or request label May 22, 2026
@djarecka

Copy link
Copy Markdown
Collaborator Author

Hi @tien-tong, just pinging if you have any questions or comments to this PR

@tien-tong tien-tong left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the orig BABS project layout with real data and everything looks good!

I've also tested a BIDS-study layout config yaml file for freesurfer-post (my commit). This section from the BABSProject/README.md need to be changed to sourcedata for BIDS layout

## Dataset structure

- All inputs (i.e. building blocks from other sources) are located in
  `inputs/`.

Other than that, everything looks good to me!

Comment thread babs/base.py

analysis_dir = cfg.get('analysis_path', 'analysis')
self.analysis_path = op.normpath(op.join(self.project_root, analysis_dir))
self._analysis_datalad_handle = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analysis_path, input_ria_path, and output_ria_path are joined with project_root without validation. Path values are not validated, so absolute paths, .., "", and "." can escape or collapse into unintended locations

if a config uses analysis_path: ../outside, BABS may create/write outside the project root, while cleanup only deletes project_root.

Suggested fix: centralize path resolution with Path.resolve(), reject empty/non-string paths, reject absolute paths unless explicitly supported, and require resolved paths to stay inside project_root.

Comment thread babs/bootstrap.py
system,
project_root=op.dirname(self.analysis_path),
analysis_dir=op.basename(self.analysis_path),
shared_group_mode=shared_group_mode,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nested analysis_path are reconstructed from dirname + basename

but PROJECT_ROOT in the generated environment still means the real project root.

CONTAINER_SHARED="${PROJECT_ROOT}/{{ analysis_dir }}/${CONTAINER_JOB}"

This is fragile and confusing, and breaks easily for external callers or future script changes.

Suggested fix: pass a single analysis_path or analysis_relpath into the template. Rename shell variables to what they mean, e.g. BABS_PROJECT_ROOT and BABS_ANALYSIS_PATH, and stop reconstructing nested paths from basename.

Comment thread babs/bootstrap.py
# not to track input/output RIA stores:
gitignore_file.write('\n' + op.basename(self.input_ria_path))
gitignore_file.write('\n' + op.basename(self.output_ria_path))
# not to track `logs` folder:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.gitignore only ignores RIA basenames.

Suggested fix: when RIA paths are inside analysis_path, write paths relative to analysis_path.

Comment thread babs/cli.py

babs_proj = BABSBootstrap(project_root)
babs_proj = BABSBootstrap(project_root, container_config=container_config)
try:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct API usage like silently ignores custom analysis_path.

class BABSBootstrap(BABS):
    def __init__(self, project_root, container_config=None):
        super().__init__(project_root, container_config=container_config)

Suggested fix: either require container_config in BABSBootstrap.__init__ or resolve init paths inside babs_bootstrap() before any path-dependent work.

Comment thread babs/base.py
if op.exists(root_config_path):
with open(root_config_path) as f:
cfg = yaml.safe_load(f) or {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty YAML returns None, causing an opaque AttributeError.

Suggested fix: validate with something like:

cfg = yaml.safe_load(f) or {}
if not isinstance(cfg, dict):
    raise ValueError('container_config must be a YAML mapping')


cmd_template: '{{ submit_head }} {{ env_flags }} {{ name_flag_str }}{{ job_name }} {{ eo_args }} {{ array_args }} {% if test %}{{ babs.analysis_path }}/code/check_setup/call_test_job.sh{% else %}{{ babs.analysis_path }}/code/participant_job.sh {{ dssource }} {{ pushgitremote }} {{ babs.job_submit_path_abs }} {{ babs.project_root }}{% endif %}'
cmd_template: '{{ submit_head }} {{ env_flags }} {{ name_flag_str }}{{ job_name }} {{ eo_args }} {{ array_args }} {% if test %}{{ babs.analysis_path }}/code/check_setup/call_test_job.sh{% else %}{{ babs.analysis_path }}/code/participant_job.sh {{ dssource }} {{ pushgitremote }} {{ babs.job_submit_path_abs }} {{ babs.analysis_root }}{% endif %}'
job_name_template: '{{ job_name }}' No newline at end of file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generated command strings are split with plain .split().

job_id = sbatch_get_job_id(cmd.split(), analysis_path)

spaces in project paths, RIA paths, or analysis paths will break job submission.

Suggested fix: build argv as structured data, or render shell-quoted strings and parse with shlex.split().

Comment thread babs/bootstrap.py
with open(self.config_path, 'w') as f:
f.write(
template.render(
analysis_dir=op.basename(self.analysis_path),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analysis_dir is passed but unused

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Are RIAs necessary?

4 participants