@@ -0,0 +1,15 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /metagenomic-taxonomic-community-profiling.ga
  testParameterFiles:
  - /metagenomic-taxonomic-community-profiling-tests.yml
  authors:
  - name: "B\xE9r\xE9nice Batut"
    orcid: 0000-0001-9852-1987
  - name: "G\xE9raldine Piot"
  - name: Mina Hojat Ansari
    orcid: 0000-0002-3602-7884
  - name: AuBi (Auvergne Bioinformatique)
@@ -0,0 +1,6 @@
# Changelog


## [0.1] - 2026-01-15

- First release
@@ -0,0 +1,29 @@
# Metagenomic Community Profiling

This workflow performs taxonomic profiling of short-read metagenomic data that has already been quality-controlled and cleaned of host/contaminant reads. It runs multiple state-of-the-art profiling tools, standardizes their outputs, and generates visualizations.

## Inputs

- Paired collection of **FastQ files** containing **metagenomic short-read data** after quality control and host/contamination removal
- **Reference databases** for the different taxonomy profiling tools
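
As a rough illustration, the paired collection can be described in a Planemo job file along the following lines. This is a minimal sketch modeled on the test definition shipped with this workflow: the input label and the `list:paired` structure come from that test file, while the sample identifier `sample_1` and the `example.org` locations are placeholders, not real data. The reference database inputs are omitted here.

```yaml
# Sketch of a job file entry for the reads input (placeholder sample and URLs).
Metagenomics Reads after Quality Control and Host/Contamination Removal:
  class: Collection
  collection_type: list:paired
  elements:
  - class: Collection
    type: paired
    identifier: sample_1  # hypothetical sample name
    elements:
    - class: File
      identifier: forward
      location: https://example.org/sample_1_R1.fastq.gz  # placeholder
    - class: File
      identifier: reverse
      location: https://example.org/sample_1_R2.fastq.gz  # placeholder
```

Each additional sample would be another `paired` element in the outer list, with its own identifier.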

## Workflow Overview

1. **Taxonomy Profiling** using

   - **Kraken2** (k-mer approach), with abundance re-estimation by **Bracken**
   - **MetaPhlAn** (marker-based approach)
   - **sylph** (k-mer approach)

2. **Standardization** of Kraken2/Bracken and MetaPhlAn outputs using **TaxPasta**

3. **Visualization** using

   - **Krona** to generate interactive, hierarchical plots for exploring taxonomy profiles
   - **MultiQC** to generate an aggregated HTML report for cross-sample and cross-tool comparisons

## Outputs

- Taxonomic profiles (Kraken2, Kraken2 + Bracken, MetaPhlAn, sylph)
- Standardized taxonomy tables (TaxPasta)
- Krona interactive plots
- MultiQC HTML report
@@ -0,0 +1,320 @@
- doc: Test with 1 sample for Metagenomics Taxonomic Community Profiling workflow
  job:
    Metagenomics Reads after Quality Control and Host/Contamination Removal:
      class: Collection
      collection_type: list:paired
      elements:
      - class: Collection
        type: paired
        identifier: PSM6XBT3_500k
        elements:
        - class: File
          identifier: forward
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R1.fastq.gz
        - class: File
          identifier: reverse
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R2.fastq.gz
    Reference Taxonomy Database for Kraken2: k2_minusb_20210517
    Reference Taxonomy Database for Bracken: k2_minusb_20210517
    Taxonomic Level for Abundance Re-estimation for Bracken: S
    Reference Taxonomy Database for MetaPhlAn: mpa_vJan21_TOY_CHOCOPhlAnSGB_202103
    Reference Database for Sylph: sylph_downloaded_12122025_OceanDNA-c200-v0.3.syldb
    Reference Taxonomy Metadata for Sylph: sylph_tax_downloaded_08112025
    NCBI Taxonomy for taxpasta: "2024-06-05"
  outputs:
    Kraken2 taxonomic profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 356
            has_n_columns:
              n: 6
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "D1"
    MetaPhlAn taxonomy profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 22
            has_text_matching:
              expression: "#clade_name"
            has_text_matching:
              expression: "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidales_unclassified|g__Phocaeicola|s__Phocaeicola_vulgatus|t__SGB1814"
            has_text_matching:
              expression: "2|976|200643|171549||909656|821|"
    Sylph taxonomy profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 9
            has_n_columns:
              n: 2
            has_text_matching:
              expression: "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia flexneri"
            has_text_matching:
              expression: "PSM6XBT3_500k"
    Bracken report:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 21
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "fraction_total_reads"
    Standardized Kraken2/Bracken taxonomy profile:
      asserts:
        has_n_lines:
          n: 21
        has_n_columns:
          n: 4
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "Escherichia coli"
    Standardized MetaPhlAn taxonomy profile:
      asserts:
        has_n_lines:
          n: 14
        has_n_columns:
          n: 4
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "Phocaeicola"
    Krona chart for Kraken2 community profile:
      asserts:
        has_text_matching:
          expression: "Bacteria"
        has_text_matching:
          expression: "Escherichia coli"
    Krona chart for MetaPhlAn community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Phocaeicola"
    Krona chart for Sylph community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Escherichia flexneri"
    Krona chart for Kraken2/Bracken community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Faecalibacterium virus Brigit"
    MultiQC HTML report:
      asserts:
        has_text_matching:
          expression: "MetaPhlAn"
        has_text_matching:
          expression: "Kraken2"
        has_text_matching:
          expression: "Sylph"
        has_text_matching:
          expression: "Bracken"
        has_text_matching:
          expression: "Standardized"
- doc: Test with 2 samples for Metagenomics Taxonomic Community Profiling workflow
  job:
    Metagenomics Reads after Quality Control and Host/Contamination Removal:
      class: Collection
      collection_type: list:paired
      elements:
      - class: Collection
        type: paired
        identifier: PSM6XBT3_500k
        elements:
        - class: File
          identifier: forward
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R1.fastq.gz
        - class: File
          identifier: reverse
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R2.fastq.gz
      - class: Collection
        type: paired
        identifier: PSM6XBT3_500k_2
        elements:
        - class: File
          identifier: forward
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R1.fastq.gz
        - class: File
          identifier: reverse
          location: https://zenodo.org/records/17895994/files/PSM6XBT3_500k_R2.fastq.gz
    Reference Taxonomy Database for Kraken2: k2_minusb_20210517
    Reference Taxonomy Database for Bracken: k2_minusb_20210517
    Taxonomic Level for Abundance Re-estimation for Bracken: S
    Reference Taxonomy Database for MetaPhlAn: mpa_vJan21_TOY_CHOCOPhlAnSGB_202103
    Reference Database for Sylph: sylph_downloaded_12122025_OceanDNA-c200-v0.3.syldb
    Reference Taxonomy Metadata for Sylph: sylph_tax_downloaded_08112025
    NCBI Taxonomy for taxpasta: "2024-06-05"
  outputs:
    Kraken2 taxonomic profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 356
            has_n_columns:
              n: 6
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "D1"
        PSM6XBT3_500k_2:
          asserts:
            has_n_lines:
              n: 356
            has_n_columns:
              n: 6
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "D1"
    MetaPhlAn taxonomy profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 22
            has_text_matching:
              expression: "#clade_name"
            has_text_matching:
              expression: "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidales_unclassified|g__Phocaeicola|s__Phocaeicola_vulgatus|t__SGB1814"
            has_text_matching:
              expression: "2|976|200643|171549||909656|821|"
        PSM6XBT3_500k_2:
          asserts:
            has_n_lines:
              n: 22
            has_text_matching:
              expression: "#clade_name"
            has_text_matching:
              expression: "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidales_unclassified|g__Phocaeicola|s__Phocaeicola_vulgatus|t__SGB1814"
            has_text_matching:
              expression: "2|976|200643|171549||909656|821|"
    Sylph taxonomy profile:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 9
            has_n_columns:
              n: 2
            has_text_matching:
              expression: "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia flexneri"
            has_text_matching:
              expression: "PSM6XBT3_500k"
        PSM6XBT3_500k_2:
          asserts:
            has_n_lines:
              n: 9
            has_n_columns:
              n: 2
            has_text_matching:
              expression: "d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia flexneri"
            has_text_matching:
              expression: "PSM6XBT3_500k"
    Bracken report:
      element_tests:
        PSM6XBT3_500k:
          asserts:
            has_n_lines:
              n: 21
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "fraction_total_reads"
        PSM6XBT3_500k_2:
          asserts:
            has_n_lines:
              n: 21
            has_text_matching:
              expression: "Faecalibacterium virus Brigit"
            has_text_matching:
              expression: "fraction_total_reads"
    Standardized Kraken2/Bracken taxonomy profile:
      asserts:
        has_n_lines:
          n: 51
        has_n_columns:
          n: 5
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"
        has_text_matching:
          expression: "Escherichia coli"
    Standardized MetaPhlAn taxonomy profile:
      asserts:
        has_n_lines:
          n: 21
        has_n_columns:
          n: 5
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"
        has_text_matching:
          expression: "Phocaeicola"
    Krona chart for Kraken2 community profile:
      asserts:
        has_text_matching:
          expression: "Bacteria"
        has_text_matching:
          expression: "Proteobacteria"
    Krona chart for MetaPhlAn community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Phocaeicola"
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"
    Krona chart for Sylph community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Escherichia flexneri"
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"
    Krona chart for Kraken2/Bracken community profile:
      asserts:
        has_text_matching:
          expression: "Root"
        has_text_matching:
          expression: "Escherichia coli"
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"
    MultiQC HTML report:
      asserts:
        has_text_matching:
          expression: "MetaPhlAn"
        has_text_matching:
          expression: "Kraken2"
        has_text_matching:
          expression: "Sylph"
        has_text_matching:
          expression: "Bracken"
        has_text_matching:
          expression: "Standardized"
        has_text_matching:
          expression: "PSM6XBT3_500k"
        has_text_matching:
          expression: "PSM6XBT3_500k_2"