Niflows is an organizational structure aimed at making neuroimaging tools and analyses FAIR (findable, accessible, interoperable, and reusable), with strong assurances of compatibility across environments.
Niflows builds on the lessons of the Nipype ecosystem to support user-contributed Workflows as packages. These Workflows can be written in any language: Niflows does not require a package to be written in Python, but provides additional tooling if it is. Niflows integrates a specific structure for data, code, and tests with a comprehensive test suite, which allows better validation of each Workflow and easier reuse of Workflows in containerized form.
Niflows is intended to provide replicable workflows that quantify their variability across datasets and operating environments (i.e., different operating systems, versions of libraries and software).
niflow-manager, which provides the nfm command-line tool, aims to support
niflow creation, testing, and packaging. It provides the following sub-commands:
- nfm init: Create a stub workflow, with templates for desired languages
- nfm install: Install niflows from an online registry or source
- nfm build: Package a workflow into containers (including Docker and Singularity)
- nfm test: Run comprehensive tests across ranges of environments and dependency versions, using TestKraken
When a niflow is initialized with nfm init, a language-specific template is used. At the moment, only Python is fully supported.
The user has to fill in the template, including the specification file, spec.yml, which is required for all workflows. The specification has two main parts: build and test.
The build part is used to create an image when nfm build is run.
The main part of the build specification is required_env, which specifies the environment needed to run the workflow.
Because Neurodocker is used to create the Dockerfile, required_env uses the same fields as the Neurodocker specification
(with one exception: the base part should contain image and pkg-manager in one dictionary).
The full Neurodocker specification can be found here.
The specification in required_env is also used as an additional environment during testing with nfm test.
    build:
      required_env:
        base:
          image: debian:stretch
          pkg-manager: apt
        miniconda:
          conda_install: [python=3.7, nipype]
        fsl:
          version: 5.0.10
        afni:
          version: latest
An optional field entrypoint is used to set an entrypoint for the container
(niflow-{ORGANIZATION}-{WORKFLOW} is the default value). This allows each
Workflow to be used from the shell without any additional programming.
The test part is used to test the workflow when nfm test is run and it follows the TestKraken specification.
Testing can be performed in several computational environments. These environments
are described in env or fixed_env (at least one of the two must be present in the specification).
As in the build part, the environment specification uses components from the Neurodocker specification.
The one difference is that the base part should contain image and pkg-manager in one dictionary.
Both env and fixed_env are used to specify multiple environments. In the env part, each Neurodocker key (e.g. base, miniconda, fsl) can be a list, and TestKraken will create all desired combinations of environment specifications. The fixed_env can provide an additional specification for an environment or a list of complete specifications. The Neurodocker keys must be the same for env and all elements of the fixed_env part.
This is an example of the environment specification that makes use of env and fixed_env elements:
    # List all desired combinations of environment specifications. This
    # configuration, for example, will produce four different Docker images:
    # 1. ubuntu 16.04 + python=3.5, numpy
    # 2. ubuntu 16.04 + python=2.7, numpy
    # 3. debian:stretch + python=3.5, numpy
    # 4. debian:stretch + python=2.7, numpy
    env:
      base:
        - {image: ubuntu:16.04, pkg-manager: apt}
        - {image: debian:stretch, pkg-manager: apt}
      miniconda:
        - {conda_install: [python=3.5, numpy]}
        - {conda_install: [python=2.7, numpy]}
    # One or more fixed environments to test. These environments are built as defined
    # and are not combined in any way. This configuration, for example, will
    # produce one Docker image.
    fixed_env:
      base: {image: debian:stretch, pkg-manager: apt}
      miniconda: {conda_install: [python=3.7, numpy]}

An example that uses this concept can be found here.
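To illustrate how env and fixed_env could expand into concrete environments, here is a minimal Python sketch. It is not TestKraken's actual code: the function name expand_environments is hypothetical, and the sketch assumes that the env options combine as a plain Cartesian product while fixed_env entries are appended unchanged.

```python
from itertools import product


def expand_environments(env=None, fixed_env=None):
    """Return a list of complete environment specifications (illustrative only)."""
    if not env and not fixed_env:
        raise ValueError("at least one of env or fixed_env is required")
    environments = []
    if env:
        keys = list(env)
        # A single dictionary is treated as a one-element list of options.
        options = [v if isinstance(v, list) else [v] for v in env.values()]
        # Every combination of one option per Neurodocker key is one environment.
        for combo in product(*options):
            environments.append(dict(zip(keys, combo)))
    if fixed_env:
        # fixed_env may be one specification or a list of specifications.
        fixed = fixed_env if isinstance(fixed_env, list) else [fixed_env]
        for spec in fixed:
            if env and set(spec) != set(env):
                raise ValueError("fixed_env keys must match the env keys")
            environments.append(spec)
    return environments


env = {
    "base": [
        {"image": "ubuntu:16.04", "pkg-manager": "apt"},
        {"image": "debian:stretch", "pkg-manager": "apt"},
    ],
    "miniconda": [
        {"conda_install": ["python=3.5", "numpy"]},
        {"conda_install": ["python=2.7", "numpy"]},
    ],
}
fixed_env = {
    "base": {"image": "debian:stretch", "pkg-manager": "apt"},
    "miniconda": {"conda_install": ["python=3.7", "numpy"]},
}

environments = expand_environments(env, fixed_env)
for spec in environments:
    print(spec)
```

With the configuration above, the sketch yields the four combined environments followed by the single fixed one.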
To eliminate repetition in the env part, each Neurodocker key can use an additional structure that separates the common part from the varied parts. The previous example could also look like this:
    env:
      base:
        - {image: ubuntu:16.04, pkg-manager: apt}
        - {image: debian:stretch, pkg-manager: apt}
      miniconda:
        common: {pip_install: [numpy]}
        varied:
          - {conda_install: [python=3.5]}
          - {conda_install: [python=2.7]}

An example that uses this concept can be found here.
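The common/varied structure can be read as shorthand for a list of full option dictionaries. The following Python sketch shows one way such an expansion might work. The helper expand_key is hypothetical, and the rule that a varied entry overrides a common entry with the same field name is an assumption, since the text does not say how overlapping fields are handled.

```python
def expand_key(options):
    """Expand one Neurodocker key into a list of option dictionaries (illustrative only)."""
    if isinstance(options, dict) and ("common" in options or "varied" in options):
        common = options.get("common", {})
        varied = options.get("varied", [{}])
        # Assumption: fields from a varied entry win over fields from common.
        return [{**common, **item} for item in varied]
    # Plain form: already a list of options, or a single option dictionary.
    return options if isinstance(options, list) else [options]


miniconda = {
    "common": {"pip_install": ["numpy"]},
    "varied": [
        {"conda_install": ["python=3.5"]},
        {"conda_install": ["python=2.7"]},
    ],
}

expanded = expand_key(miniconda)
for option in expanded:
    print(option)
```

Each resulting dictionary carries the shared pip_install field plus one of the varied conda_install fields, so the env part still describes two miniconda options.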
By default, TestKraken looks for all data files and all script files in the root directory of the tested workflow. These default locations can be changed.
To specify how to get the data, the data entry must have two keys: type and location. For now, only one type is implemented, workflow_path, but in the future this entry might be used to specify external repositories. For type=workflow_path, the location is simply the directory path relative to the workflow path. An example can look like this:
    data:
      type: workflow_path
      location: my_data

The scripts entry requires only the directory path relative to the workflow path. An example can look like this:

    scripts: my_scripts

An example that uses this concept can be found here.
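The lookup rules for data and scripts can be summarized in a few lines of Python. This is only a sketch under the assumptions stated in the text (defaults point at the workflow root, and workflow_path is the only implemented type); the function resolve_locations is hypothetical and not part of TestKraken or niflow-manager.

```python
from pathlib import Path


def resolve_locations(workflow_path, spec):
    """Return (data_dir, scripts_dir) for a test specification (illustrative only)."""
    workflow_path = Path(workflow_path)

    data = spec.get("data")
    if data is None:
        # Default: data files live in the root directory of the workflow.
        data_dir = workflow_path
    elif data.get("type") == "workflow_path":
        # The location is a directory relative to the workflow path.
        data_dir = workflow_path / data["location"]
    else:
        raise ValueError(f"unsupported data type: {data.get('type')!r}")

    # The scripts entry is just a relative directory; default is the workflow root.
    scripts_dir = workflow_path / spec.get("scripts", "")
    return data_dir, scripts_dir


spec = {
    "data": {"type": "workflow_path", "location": "my_data"},
    "scripts": "my_scripts",
}
data_dir, scripts_dir = resolve_locations("my_workflow", spec)
default_dirs = resolve_locations("my_workflow", {})
print(data_dir, scripts_dir, default_dirs)
```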
The analysis element contains all the information required to run the analysis. There is one required element, command, and two optional elements, script and inputs. These are assembled as command script input1 input2 .... When the command is a shell or an interpreter (e.g., "bash", "python"), the script is needed. When the command is itself an executable (e.g., "ssh", "bc"), the script option is not required. The inputs part contains all the inputs needed to complete the command that runs the analysis. Each element of the inputs entry should have type, argstr (if a flag is needed), and value, and might have additional metadata that can be used by pydra (a dataflow engine used by TestKraken). If type is File, the file path is assumed to be relative to the data directory location. If script is provided, the script file is expected to be in the scripts directory. An example can look like this:
    # The analysis part: inputs to the analysis script,
    # the command to run the script and the script with the analysis.
    analysis:
      inputs:
        - {type: File, argstr: -f, value: list2sort.json}
      command: python
      script: sorting.py

The tests part contains all the information needed to test the analysis output. Each output file is compared to a reference file with the same name that is available in the data directory. If the tests part is absent or empty, no tests are run after the analysis. There can be multiple entries in tests, but each entry has to contain file, with the name of the output file; name, with the user-defined name of the test; and script, with the name of the script used to run the test. The script can be saved in the scripts directory (checked first), or it can be an existing test from the TestKraken testing_functions directory. Any user-provided tests have to follow the same template as the tests from TestKraken and define a command-line interface.
Example:
    # Tests to compare the output of the script to reference data.
    # The scripts are available under the user defined `script` subdirectory
    # or the `testkraken/testing_functions` directory.
    tests:
      - {file: list_sorted.json, name: regr1, script: test_obj_eq.py}
      - {file: list_sorted.json, name: regr1a, script: my_test_obj_eq.py}
      - {file: avg_list.json, name: regr2, script: test_obj_eq.py}

An example that uses this concept can be found here.
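To tie the analysis and tests entries together, the Python sketch below assembles the command line described earlier (command script input1 input2 ...) and checks that every tests entry has its three required keys. It is a rough illustration rather than TestKraken's implementation: build_command and check_tests are hypothetical names, the my_data and my_scripts directories are borrowed from the earlier examples, and passing the script and File inputs as paths joined to those directories is an assumption.

```python
from pathlib import PurePosixPath

REQUIRED_TEST_KEYS = {"file", "name", "script"}


def build_command(analysis, data_dir, scripts_dir):
    """Assemble `command script input1 input2 ...` as an argument list (illustrative only)."""
    args = [analysis["command"]]  # command is the only required element
    if "script" in analysis:
        # The script is expected to be in the scripts directory.
        args.append(str(PurePosixPath(scripts_dir) / analysis["script"]))
    for item in analysis.get("inputs", []):
        if "argstr" in item:
            args.append(item["argstr"])  # optional flag, e.g. -f
        value = item["value"]
        if item["type"] == "File":
            # File inputs are taken relative to the data directory.
            value = PurePosixPath(data_dir) / value
        args.append(str(value))
    return args


def check_tests(tests):
    """Raise ValueError if a tests entry lacks file, name, or script."""
    for entry in tests or []:
        missing = REQUIRED_TEST_KEYS - set(entry)
        if missing:
            raise ValueError(f"tests entry {entry} is missing {sorted(missing)}")


analysis = {
    "inputs": [{"type": "File", "argstr": "-f", "value": "list2sort.json"}],
    "command": "python",
    "script": "sorting.py",
}
tests = [
    {"file": "list_sorted.json", "name": "regr1", "script": "test_obj_eq.py"},
    {"file": "avg_list.json", "name": "regr2", "script": "test_obj_eq.py"},
]

check_tests(tests)
command = build_command(analysis, data_dir="my_data", scripts_dir="my_scripts")
print(" ".join(command))
```

For an executable command such as bc, the analysis entry would simply omit script, and the argument list would start with the command followed directly by the inputs.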