
BPHL-Molecular/Daytona_HAV


Daytona_HAV

Introduction

This pipeline performs species detection and phylogenetic analysis of Hepatitis A virus (HAV). It requires Illumina paired-end sequencing data. The phylogeny is built from SNP sites, and 17 HAV reference sequences from NCBI are included to place the test samples in a phylogenetic context.
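As a rough illustration of what building the phylogeny from SNP sites means, the Python sketch below reduces a multiple alignment to its variable columns. It is a simplified stand-in for snp-sites, which the pipeline actually uses. The rule here (a column is a SNP site if it holds at least two distinct A/C/G/T bases, with gaps and ambiguous characters ignored) is an assumption and may differ from snp-sites' own handling.

```python
# Illustrative sketch only, not the pipeline's code and not snp-sites' exact rules:
# keep alignment columns that contain at least two distinct A/C/G/T bases.

def snp_columns(aligned: dict[str, str]) -> list[int]:
    """Return 0-based indices of variable A/C/G/T columns in an alignment."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        # Gaps ("-") and ambiguous bases (e.g. "N") are ignored in this sketch.
        bases = {s[i].upper() for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites


def snp_alignment(aligned: dict[str, str]) -> dict[str, str]:
    """Reduce each aligned sequence to its characters at the SNP columns."""
    sites = snp_columns(aligned)
    return {name: "".join(seq[i] for i in sites) for name, seq in aligned.items()}


if __name__ == "__main__":
    toy = {  # hypothetical toy alignment, not real HAV data
        "ref1": "ATGCA-T",
        "ref2": "ATGTA-T",
        "sample1": "ACGTAGT",
    }
    print(snp_columns(toy))  # [1, 3]
    print(snp_alignment(toy))
```

Column 5 is not reported because only one sequence has a base there; whether such gap-containing columns should count is exactly the kind of choice a real tool makes explicitly.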

Prerequisites

Nextflow is required. Installation instructions are available at https://github.com/nextflow-io/nextflow. HiPerGator users do not need to install it.

Singularity/Apptainer is required. Installation instructions are available at https://singularity-tutorial.github.io/01-installation/. HiPerGator users do not need to install it.

SLURM is required. HiPerGator users do not need to install it.

Python 3 is required, with the packages "pandas" and "biopython". If they are not already installed, install them with pip3 install pandas biopython.

The Kraken2 PlusPF database is required. HiPerGator users do not need to download it; it has already been downloaded and configured in the pipeline.
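The prerequisites can be checked quickly with a small Python script like the one below. It is a hypothetical helper, not part of the pipeline: it only looks for the nextflow, singularity or apptainer, and sbatch executables on PATH, and for the pandas and Bio (biopython) modules. It does not check the Kraken2 database, and on clusters such as HiPerGator a tool reported as missing may simply need its environment module loaded first.

```python
# Hypothetical helper (not part of Daytona_HAV): report which prerequisites
# are visible from the current environment.
import importlib.util
import shutil

# Executables to look for on PATH; Singularity may be installed as "apptainer".
EXECUTABLES = {
    "Nextflow": ["nextflow"],
    "Singularity/Apptainer": ["singularity", "apptainer"],
    "SLURM": ["sbatch"],
}
# Python packages mapped to their import names (biopython imports as "Bio").
MODULES = {"pandas": "pandas", "biopython": "Bio"}


def check_prerequisites() -> dict[str, bool]:
    """Return {prerequisite: found?}; the Kraken2 database is not checked."""
    status = {}
    for label, names in EXECUTABLES.items():
        status[label] = any(shutil.which(name) is not None for name in names)
    for label, module in MODULES.items():
        status[label] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:<22} {'found' if found else 'MISSING'}")
```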

Workflow

%%{ init: { 'gitGraph': { 'mainBranchName': 'Daytona_HAV' } } }%%
%%{init: { 'themeVariables': { 'commitLabelFontSize': '20px' } } }%%
%%{init: { 'themeVariables': { 'fontSize': '24px' } } }%%
gitGraph       
       commit id: "QC"
       branch QC
       checkout QC
       commit id: "Fastqc"
       commit id: "Trimmomatic"
       commit id: "bbtools"
       commit id: "multiqc"

       checkout Daytona_HAV
       merge QC

       commit id: "SNP calling"
       branch SNP_calling
       checkout SNP_calling
       commit id: "bwa"
       commit id: "samtools"
       commit id: "ivar"
       checkout Daytona_HAV
       merge SNP_calling 
       
       commit id: "Consensus"
       branch Consensus
       checkout Consensus
       commit id: "extract_kraken_reads"
       commit id:"bwa for extract reads"
       commit id:"samtools for extract reads"
       commit id:"ivar for extract reads"
       checkout Daytona_HAV
       merge Consensus

       commit id: "phylogeny"
       branch Phylogeny
       checkout Phylogeny
       commit id: "mafft with 17 references"
       commit id: "snp-sites with 17 references"
       commit id: "iqtree with 17 references"
       commit id: "phytreeviz with 17 references"
       checkout Daytona_HAV
       merge Phylogeny
       
    

Recommended conda environment installation

conda create -n HAV -c conda-forge python=3.10
conda activate HAV

How to run

  1. Put your data files into the fastqs/hav/ directory of the pipeline. File names should follow the pattern "XZA22002292_1.fastq.gz" and "XZA22002292_2.fastq.gz". You may use the script rename.sh to rename your data files.

Note: Place FASTQ files only in the fastqs/hav/ directory within the Daytona_HAV pipeline folder; data placed anywhere else will cause the pipeline to fail.

  2. Open the file "params_hcv.yaml" and set the parameters to absolute paths, such as ".../.../fastqs/hav", ".../.../output", etc.

  3. Change to the top-level directory of the pipeline and run the following command:

sbatch Daytona_HAV.sh   # generate phylogenetic tree
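Before submitting, a quick check that the input files follow the required naming can save a failed run. The Python sketch below is hypothetical (it is not rename.sh and not part of the pipeline); it assumes every sample has exactly one "_1.fastq.gz" and one "_2.fastq.gz" file in fastqs/hav.

```python
# Hypothetical pre-flight check, not part of the pipeline: every sample must
# have exactly one "<sampleID>_1.fastq.gz" and one "<sampleID>_2.fastq.gz".
import re
from pathlib import Path

PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_fastqs(filenames):
    """Return {sampleID: (read1, read2)} or raise ValueError on problems."""
    pairs: dict[str, dict[str, str]] = {}
    bad = []
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            bad.append(name)  # does not match the expected naming pattern
            continue
        pairs.setdefault(m["sample"], {})[m["read"]] = name
    unpaired = sorted(s for s, reads in pairs.items() if set(reads) != {"1", "2"})
    if bad or unpaired:
        raise ValueError(f"badly named: {bad}; missing mate: {unpaired}")
    return {s: (reads["1"], reads["2"]) for s, reads in sorted(pairs.items())}


if __name__ == "__main__":
    # Run from the top-level pipeline directory.
    files = [p.name for p in Path("fastqs/hav").glob("*.fastq.gz")]
    for sample, (r1, r2) in pair_fastqs(files).items():
        print(sample, r1, r2)
```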

Main output

1. HAV read detection

sampleID species/tax_ID/percent(%)/number ...
xxx25002686_S1 Hepatitis A/12092/93.58/123349 ...

The second column of the above table indicates that 123349 reads (93.58%) in the sample xxx25002686_S1 are identified as the HAV species (taxonomy ID 12092).
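The combined species/tax_ID/percent(%)/number field can be split into its parts with a few lines of Python. This is a sketch that assumes the field is already available as a string; the layout of the file that contains it is not described here. Splitting from the right keeps any "/" that might appear inside a species name.

```python
# Sketch: split one "species/tax_ID/percent(%)/number" field into typed parts.
from typing import NamedTuple


class Detection(NamedTuple):
    species: str
    tax_id: int
    percent: float  # percentage of reads assigned to the species
    reads: int      # number of reads assigned to the species


def parse_detection(field: str) -> Detection:
    # rsplit from the right so a "/" inside the species name is preserved.
    species, tax_id, percent, reads = field.rsplit("/", 3)
    return Detection(species, int(tax_id), float(percent), int(reads))


if __name__ == "__main__":
    print(parse_detection("Hepatitis A/12092/93.58/123349"))
    # Detection(species='Hepatitis A', tax_id=12092, percent=93.58, reads=123349)
```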

2. Variants

REGION POS REF ALT ... PVAL PASS ...
NC_001489.1 2895 T G ... 0.526316 FALSE ...
NC_001489.1 2927 T C ... 0 TRUE ...

Note: PASS is TRUE when the variant's p-value is <= 0.05. If a SNP's PASS value is FALSE, the SNP did not pass this check.
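To keep only variants with PASS = TRUE, a minimal Python sketch could read the variants table with the standard csv module. It assumes the file is tab-separated with the header shown above (as iVar's variant output is); the toy rows in the example are copied from the excerpt, with the elided columns left out.

```python
# Sketch: keep rows whose PASS column is TRUE from a tab-separated variants table.
import csv
import io


def passing_variants(handle):
    """Return the rows (as dicts keyed by header) that have PASS == TRUE."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["PASS"].strip().upper() == "TRUE"]


if __name__ == "__main__":
    demo = io.StringIO(
        "REGION\tPOS\tREF\tALT\tPVAL\tPASS\n"
        "NC_001489.1\t2895\tT\tG\t0.526316\tFALSE\n"
        "NC_001489.1\t2927\tT\tC\t0\tTRUE\n"
    )
    for row in passing_variants(demo):
        print(row["REGION"], row["POS"], row["REF"], row["ALT"])
    # prints: NC_001489.1 2927 T C
```

To run it on a real output file, open the file with open(path, newline="") and pass the handle to passing_variants.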

3. Phylogenetic tree

For the phylogenetic tree, the test samples are analyzed together with the 17 HAV reference sequences. A bootstrap analysis with 1,000 replicate datasets is performed to assess the statistical support for each node (branch) of the tree.

Figure: HAV phylogenetic tree
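Conceptually, each nonparametric bootstrap replicate is a new alignment of the same length whose columns are drawn at random, with replacement, from the original alignment; a tree is inferred from each replicate, and a node's support is the percentage of replicate trees that contain the same clade. The Python sketch below only generates such replicates to illustrate the idea. IQ-TREE creates its replicates and support values internally, and the exact bootstrap variant and options the pipeline passes to it are not stated here.

```python
# Conceptual sketch of nonparametric bootstrap replicates (illustration only;
# IQ-TREE performs its own resampling and support calculation).
import random


def bootstrap_replicates(aligned, n_replicates=1000, seed=0):
    """Yield resampled alignments: columns drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = list(aligned)
    length = len(aligned[names[0]])
    for _ in range(n_replicates):
        # Pick `length` column indices, allowing repeats.
        cols = [rng.randrange(length) for _ in range(length)]
        # Every sequence uses the same column picks, so columns stay intact.
        yield {name: "".join(aligned[name][c] for c in cols) for name in names}


if __name__ == "__main__":
    toy = {"s1": "ACGTAC", "s2": "ACGAAC", "s3": "TCGAAG"}  # hypothetical SNP alignment
    for rep in bootstrap_replicates(toy, n_replicates=3):
        print(rep)
```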

A possible approach to determining sample genotypes

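To determine a test sample's HAV genotype, one possible approach is to include HAV samples of known genotype as positive controls and build the phylogenetic tree with them and the test samples together. A test sample that clusters with a control is likely to share that control's genotype.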
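As a rough, hypothetical complement to the tree (not a replacement for it), the distance from a test sample to each known-genotype control can be computed on a shared alignment. The sketch below uses the p-distance (fraction of differing positions, counting only positions where both sequences have A, C, G or T); the genotype labels and sequences in the example are made up.

```python
# Hypothetical companion check: nearest known-genotype control by p-distance.


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions among sites where both have A/C/G/T."""
    valid = set("ACGT")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in valid and y in valid]
    if not pairs:
        return float("inf")  # nothing comparable; rank this control last
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_control(sample_seq: str, controls: dict[str, str]) -> str:
    """Return the label of the control closest to sample_seq (ties: first listed)."""
    return min(controls.items(), key=lambda kv: p_distance(sample_seq, kv[1]))[0]


if __name__ == "__main__":
    controls = {  # made-up aligned sequences with made-up genotype labels
        "IA": "ACGTACGTAC",
        "IB": "ACGAACGTTC",
        "IIIA": "TCGAAGGTTA",
    }
    print(nearest_control("ACGTACGTAA", controls))  # IA
```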

Test data

Test data are available in /blue/bphl-florida/share/Daytona_HAV_test_sample. To use them, copy the files into the directory .../fastqs/hav/.
The results for these test data are available in /blue/bphl-florida/share/Daytona_HAV_test_sample/output-20251009215948.
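Copying the test files can also be scripted. The helper below is a hypothetical sketch that assumes the test FASTQ files are the *.fastq.gz files at the top level of the shared directory; the glob is not recursive, so the output subdirectory is not copied. Run it from the top-level pipeline directory so that fastqs/hav resolves correctly.

```python
# Hypothetical helper: copy test FASTQ files into the pipeline's input directory.
import shutil
from pathlib import Path


def copy_fastqs(src, dst):
    """Copy top-level *.fastq.gz files from src to dst; return copied names."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(Path(src).glob("*.fastq.gz")):  # non-recursive
        shutil.copy2(f, dst / f.name)  # copy2 also preserves timestamps
        copied.append(f.name)
    return copied


if __name__ == "__main__":
    src = "/blue/bphl-florida/share/Daytona_HAV_test_sample"
    print(copy_fastqs(src, "fastqs/hav"))
```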

Contact

To report bugs, suggest enhancements, or discuss ideas related to the project, please use the "Issues" tab of the GitHub repository.

Note

To receive an email notification when the pipeline finishes, enter your email address on the line "#SBATCH --mail-user=" in Daytona_HAV.sh.
