Skip to content

Conversation

@bpinsard
Copy link
Collaborator

Changes proposed in this pull request

Currently when preprocessing fMRI only dataset using precomputed sMRIPrep+freesurfer (--fs-no-resume) and passing --session-label yy, the fs_subject_id ends up being sub-xx_ses-yy using the fMRI selected session, and thus it picks files from that folder. I noticed it because my jobs was only pre-fetching with datalad freesurfer/sub-xx folder from a precomputed longitudinal freesurfer, but the pipeline tried using freesurfer files from sub-xx_ses-yy instead, which were broken symlinks.

Not sure what is the best approach, but it seems that it should only try to get session-specific freesurfer when using --subject-anatomical-reference sessionwise, but it might require another case for first-lex.

I am not 100% sure what are the outputs when you using this option:

  • session-wise -> freesurfer/sub-xx_ses-yy/ ?
  • first-lex -> freesurfer/sub-xx or same as sessionwise
  • unbiased -> freesurfer/sub-xx I assume

Documentation that should be reviewed

@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.85%. Comparing base (55087d5) to head (6952f35).

Files with missing lines Patch % Lines
fmriprep/workflows/base.py 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3588      +/-   ##
==========================================
- Coverage   72.88%   72.85%   -0.04%     
==========================================
  Files          60       60              
  Lines        4821     4823       +2     
  Branches      626      627       +1     
==========================================
  Hits         3514     3514              
- Misses       1164     1165       +1     
- Partials      143      144       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tsalo
Copy link
Collaborator

tsalo commented Dec 16, 2025

Given that, even if you use first-lex or unbiased, you can still select specific sessions with --session-label, I think the Freesurfer subject ID needs to include the session label regardless. When running the newer versions of fMRIPrep with older Freesurfer derivatives, I have created symlinks from sub-<label> to sub-<label>_ses-<label> to make it work.

@bpinsard
Copy link
Collaborator Author

Do you mean that this change would break the pipeline when not using precomputed freesurfer?

If I understand correctly the code, if using unbiased or first-lex BIDSSourceFile will generate a source_file that removes the ses- entity if multiple T1w (T2w) are used (even if all from the same session), so the sMRIPrep and freesurfer outputs will not contain that entity (even for first-lex !).
So reusing these outputs for both cases for a new fMRI only session would fail without this fix, as it will try to get freesurfer output with that session entity.
So the way I see it is that --session-label is for fMRI only, while --subject-anatomical-reference (and bids_filters) control what is used to build the anatomical reference.

In my case I cannot simply use symlinks, as these session-level folder already exists as it is a true longitudinal freesurfer (including sub-xx_ses-yy.long too) that was run separately (though using sMRIPrep brain extraction as it is more robust).

Covering all these cases with tests would be tricky, but important.

@tsalo
Copy link
Collaborator

tsalo commented Dec 16, 2025

I think the current changes would break things when using --subject-anatomical reference unbiased or first-lex with --session-label. I don't think --session-label is supposed to only filter fMRI sessions, but I could be wrong. I can imagine people wanting to run fMRIPrep on a subset of sessions, using the anatomical data from just those sessions.

(about, ds_report_about, [('out_report', 'in_file')]),
]) # fmt:skip

if config.workflow.subject_anatomical_reference == 'sessionwise':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this?

Suggested change
if config.workflow.subject_anatomical_reference == 'sessionwise':
if (config.workflow.subject_anatomical_reference == 'sessionwise') or (config.workflow.session_label is not None):

@bpinsard
Copy link
Collaborator Author

I think the current changes would break things when using --subject-anatomical reference unbiased or first-lex with --session-label. I don't think --session-label is supposed to only filter fMRI sessions, but I could be wrong. I can imagine people wanting to run fMRIPrep on a subset of sessions, using the anatomical data from just those sessions.

I think it would not break that usecase.

Trying to branch all the possibility, this is complex.

  • if no precomputed sMRIPrep+freesurfer (haven't tested it):
  • if precomputed + sessionwise
  • if precomputed given with a freesurfer subjects-dir/sub-xx (output from first-lex or unbiased above or custom):
    • "unbiased" or "first-lex":
      • with or without --session-label
      • if no anat in BIDS and a single bold file _create_multi_source_file keeps the -ses entity in the filename, if multiple bold runs drops it (including for multi-echo with the latest fix) generating an invalid bids path (ses- folder, no ses- in file), so as @mgxd mentioned this function is problematic. Either way niworkflows.interfaces.bids.BIDSInfo extract the session entity from the path, from the first BOLD (or anat) file that is then used to generated the fs_subject_id, which points to non-existent folder. Unless I am missing something that logic is flawed.
> niworkflows.interfaces.bids.BIDSInfo(in_file='/path/to/bids/sub-01/ses-001/func/sub-01_bold.nii.gz',bids_validate=True).run().outputs

acquisition = <undefined>
reconstruction = <undefined>
run = <undefined>
session = 001
subject = 01
suffix = bold
task = <undefined>

(BTW: bids_validate is not used anywhere in that interface)

Anyway, sorry I am getting lost in the multiverse, just to say that the logic from command-line options -> select files -> select single representative file / generic source (with the broken _create_multi_source_file ) -> extract BIDS entities -> generate fs_subject_id is very convoluted, brittle and might result in questionable output naming that are not reusable for "apply".

AFAIK the freesurfer outputs naming is not covered by tests in either fMRIPrep or sMRIPrep for circleCI or pytests, but I know running full freesurfer in tests would be a pain, unless we use --fs-no-resume and check that the workflow finds what it needs.

@tsalo
Copy link
Collaborator

tsalo commented Dec 19, 2025

The scenario I was imagining was a longitudinal dataset (let's say 4 sessions), and running fMRIPrep using either --subject-anatomical reference unbiased or first-lex: once with --session-label 01 02 and once with --session-label 03 04. In that scenario, I think using just the subject ID as the Freesurfer subject ID would mean that the --session-label 03 04 run would use the Freesurfer derivatives from the --session-label 01 02 run.

Either way niworkflows.interfaces.bids.BIDSInfo extract the session entity from the path, from the first BOLD (or anat) file that is then used to generated the fs_subject_id, which points to non-existent folder. Unless I am missing something that logic is flawed.

That is definitely an issue when you update your fMRIPrep version. It shouldn't be a problem if you're rerunning with the same version though, right?

AFAIK the freesurfer outputs naming is not covered by tests in either fMRIPrep or sMRIPrep for circleCI or pytests, but I know running full freesurfer in tests would be a pain, unless we use --fs-no-resume and check that the workflow finds what it needs.

I think that's a great idea! I had a PR that raises an error if --fs-no-resume is called and the Freesurfer subject folder doesn't exist (nipreps/smriprep#504), which should available in fMRIPrep tests since fMRIPrep uses sMRIPrep at main.

@bpinsard
Copy link
Collaborator Author

bpinsard commented Dec 19, 2025

The scenario I was imagining was a longitudinal dataset (let's say 4 sessions), and running fMRIPrep using either --subject-anatomical reference unbiased or first-lex: once with --session-label 01 02 and once with --session-label 03 04. In that scenario, I think using just the subject ID as the Freesurfer subject ID would mean that the --session-label 03 04 run would use the Freesurfer derivatives from the --session-label 01 02 run.

What output do you envision in smriprep and freesurfer for that usecase?
My understanding is that smriprep will not include ses- entity so same outputs for both pipelines.
For freesurfer, currently:

  • first-lex or unbiased
  • single T1w -> freesurfer contains the ses- entity ** with this PR it is removed **
  • multiple T1w (even if all in a single session) no ses- entity
    So the PR makes it more consistent, in particular if some subject have 1 T1w, others have multiple.

Maybe we should have sub-xx_ses-01+02 and sub-xx_ses-03+04 as freesurfer outputs for unbiased and sub-xx_ses-01 and sub-xx_ses-03 for first-lex ?
The main issue is when outputs are reused in an apply pipeline (or a --derivatives pipeline on new fMRI), grabbing these will be trickier with these convoluted naming.
For example, if you then want to run --session-label 05 which does not have a T1w, it will try to grab freesurfer sub-xx if there are multiple T1w or sub-xx_ses-01 if there was only a single T1w in the first session for that subject across all sessions, while grabbing sub-xx makes more sense (and can be symlinked). Or we need a logic to first check for a sub-xx then with the session-label, then glob any.

For the usecase you describe, the simpler would be to have separate outputs derivatives folders, as sMRIPrep outputs will collide, and freesurfer will too with this fix.

Either way niworkflows.interfaces.bids.BIDSInfo extract the session entity from the path, from the first BOLD (or anat) file that is then used to generated the fs_subject_id, which points to non-existent folder. Unless I am missing something that logic is flawed.

That is definitely an issue when you update your fMRIPrep version. It shouldn't be a problem if you're rerunning with the same version though, right?

No, with the same version, for apply and derivatives reuse, as described above.

AFAIK the freesurfer outputs naming is not covered by tests in either fMRIPrep or sMRIPrep for circleCI or pytests, but I know running full freesurfer in tests would be a pain, unless we use --fs-no-resume and check that the workflow finds what it needs.

I think that's a great idea! I had a PR that raises an error if --fs-no-resume is called and the Freesurfer subject folder doesn't exist (nipreps/smriprep#504), which should available in fMRIPrep tests since fMRIPrep uses sMRIPrep at main.

The main issue is that a lot of the tests require workflow runtime, to check that the files it tries to grab makes sense. So for it to be lightweight, we should have a separated workflow that only grab files. Or maybe monkeypatch the heavy compute subworfklows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants