This Nextflow pipeline identifies Mycobacterium abscessus complex (MABSC), predicts drug resistance, performs species identification, and assembles genomes from Illumina paired-end reads. It uses Docker/Singularity containers and generates comprehensive summary results with Erm(41) resistance gene detection.
- Nextflow 23.04.0+
- Singularity
- Conda
Paired-end FASTQ files placed under fastqs/:
sample1_1.fastq.gz
sample1_2.fastq.gz
sample2_1.fastq.gz
sample2_2.fastq.gz
For lab samples named in the format prefix-date-xxx-xx_S1_L001_R1_001.fastq.gz:
# Use resources/rename.sh to convert lab sample names
bash resources/rename.sh
Verify that params.yaml has all required paths:
input_dir: "/path/to/fastqs"
output: "/path/to/output"
reference: "/path/to/reference.fasta"
erm41: "/path/to/erm41.fasta"
card_database: "/path/to/card_db"
kraken2_db: "/path/to/kraken2_db"
NTM_profiler_db: "/path/to/NTM-profiler"
Ensure files follow the naming convention:
sample1_1.fastq.gz & sample1_2.fastq.gz
sample2_1.fastq.gz & sample2_2.fastq.gz
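The naming convention above can be checked before launching the pipeline. The following is a minimal sketch, assuming all reads sit in a single directory; `check_pairs` is a hypothetical helper and not part of the repository.

```bash
# Sketch: report samples that lack one mate of the
# <sample>_1.fastq.gz / <sample>_2.fastq.gz pair.
# check_pairs is a hypothetical helper, not shipped with the pipeline.
check_pairs() {
    local dir="$1" bad=0 f sample
    for f in "$dir"/*_[12].fastq.gz; do
        [ -e "$f" ] || continue           # glob matched nothing
        sample="${f%_[12].fastq.gz}"      # strip the mate suffix
        if [ ! -e "${sample}_1.fastq.gz" ]; then
            echo "missing: ${sample##*/}_1.fastq.gz"
            bad=1
        fi
        if [ ! -e "${sample}_2.fastq.gz" ]; then
            echo "missing: ${sample##*/}_2.fastq.gz"
            bad=1
        fi
    done
    return "$bad"
}
# Example: check_pairs fastqs
```

The function prints one line per missing mate and returns a non-zero status if any pair is incomplete, so it can gate a launch script.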
conda env create -f resources/keywest.yml
conda activate keywest
# Run the pipeline
nextflow run main.nf -params-file params.yaml
# On a SLURM cluster, submit the batch script instead
sbatch keywest.sh
output/
├── {sample}/
│   ├── fastp/           # Quality control reports
│   ├── ntm_profiler/    # Species identification
│   ├── kraken2/         # Taxonomic classification
│   ├── bwa/             # Read alignment
│   ├── samtools/        # BAM processing
│   ├── unicycler/       # Genome assembly
│   ├── hmmer/           # Erm(41) resistance gene analysis
│   ├── stats/           # Coverage statistics
│   └── summary/         # Comprehensive CSV report
│       └── {sample}_sample_summary.csv
└── multiqc_report.html  # Aggregate quality report
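Since each sample gets its own summary CSV, it can be handy to combine them into one table. This is a sketch only: it assumes each summary file sits inside its sample's summary/ directory as the tree suggests, and that every file starts with one identical header row (the column layout is not documented here). `merge_summaries` is a hypothetical helper.

```bash
# Sketch: concatenate per-sample summary CSVs, keeping only the first header.
# Assumes <outdir>/<sample>/summary/<sample>_sample_summary.csv and a
# single identical header row in every file (not confirmed by the pipeline).
merge_summaries() {
    local outdir="$1"
    local files=("$outdir"/*/summary/*_sample_summary.csv)
    if [ ! -e "${files[0]}" ]; then
        echo "no summary files found under $outdir" >&2
        return 1
    fi
    # FNR == 1 is the first line of each file; skip it for all but the first.
    awk 'FNR == 1 && NR != 1 { next } { print }' "${files[@]}"
}
# Example: merge_summaries output > all_samples_summary.csv
```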
- fastp - Quality control and trimming
- NTM-Profiler - Mycobacterial species identification
- Kraken2 - Taxonomic classification
- BWA - Read mapping to reference
- Samtools - BAM file processing
- Unicycler - Genome assembly
- HMMER - Erm(41) resistance gene detection + CARD BLAST
- Stats - Coverage and mapping statistics
- Summary - Comprehensive CSV report generation
- MultiQC - Aggregate quality reporting
- Reference genome: Mycobacterial reference (FASTA)
- Erm(41) sequences: Query genes for resistance detection
- CARD database: Comprehensive Antibiotic Resistance Database
- Kraken2 database: Taxonomic classification
- NTM-Profiler database: Mycobacterial species identification. (The NTM-Profiler database for BPHL use on HiPerGator has been modified by the author.)
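Each resource above is supplied to the pipeline through a path in params.yaml. The following is a rough sketch for confirming that every key shown in the params.yaml example is present and non-empty. It assumes a flat file with one top-level `key: value` per line, and it checks only that a value exists, not that the path is valid. `check_params` is a hypothetical helper.

```bash
# Sketch: verify that params.yaml defines every key used in the example.
# Assumes flat, top-level "key: value" lines; does not validate the paths.
check_params() {
    local file="$1" key missing=0
    for key in input_dir output reference erm41 \
               card_database kraken2_db NTM_profiler_db; do
        # Key must start the line and be followed by a non-blank value.
        if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$file"; then
            echo "${file}: missing or empty key '${key}'" >&2
            missing=1
        fi
    done
    return "$missing"
}
# Example: check_params params.yaml
```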
(For Non-HiPerGator Users)
# CARD Database
wget https://card.mcmaster.ca/latest/data
tar -xf data
makeblastdb -in card_protein_homolog_model.fasta -dbtype prot -out card_db
# Kraken2 Database
wget https://genome-idx.s3.amazonaws.com/kraken/minikraken2_v2_8GB_201904.tgz
tar -xzf minikraken2_v2_8GB_201904.tgz
# NTM-Profiler
git clone https://github.com/jodyphelan/NTM-Profiler.git
(For Non-HiPerGator Users)
Please install the following modules manually:
HMMER: https://github.com/EddyRivasLab/hmmer/tree/master
wget http://eddylab.org/software/hmmer/hmmer.tar.gz
tar zxf hmmer.tar.gz
cd hmmer-3.4
./configure --prefix /your/install/path
make && make install
NCBI BLAST+: https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html
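# wget does not expand wildcards over HTTPS: replace * with the version number listed in the LATEST directory
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-*-x64-linux.tar.gz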
tar -xzf ncbi-blast-*-x64-linux.tar.gz
BWA-MEM2: https://github.com/bwa-mem2/bwa-mem2
git clone --recursive https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
make
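After a manual install, it may help to confirm that the tools are reachable from the shell. This is a sketch under assumptions: the executable names in the example (`hmmsearch`, `makeblastdb`, `blastp`, `bwa-mem2`) are the standard binaries these packages provide, but which ones the pipeline actually calls is not stated here. Note that building BWA-MEM2 with `make` leaves the binary in the source directory, so that directory has to be added to PATH. `check_tools` is a hypothetical helper.

```bash
# Sketch: report which of the given executables are not on PATH.
# check_tools is a hypothetical helper, not part of the pipeline.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
# Example (binary names are assumptions about what the pipeline invokes):
#   check_tools hmmsearch makeblastdb blastp bwa-mem2
```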