This pipeline performs species detection and phylogenetic analysis of Hepatitis A virus (HAV). It requires Illumina paired-end sequencing data. The phylogeny is built from SNP sequences, and 17 HAV samples from NCBI are used as references for analyzing the phylogenetic relationships of the test samples.
Nextflow is needed. The details of installation can be found at https://github.com/nextflow-io/nextflow. For HiPerGator users, its installation is not needed.
Singularity/Apptainer is needed. The details of installation can be found at https://singularity-tutorial.github.io/01-installation/. For HiPerGator users, its installation is not needed.
SLURM is needed. For HiPerGator users, its installation is not needed.
Python3 is needed. If the packages "pandas" and "biopython" are not already available in your Python3 environment, install them with pip3 install pandas biopython.
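As a quick way to confirm the Python requirement, the following minimal sketch reports which of the two packages cannot be found in the current Python3 environment. The function name missing_packages is hypothetical and not part of the pipeline. Note that biopython is imported under the name "Bio".

```python
import importlib.util

# pip package name -> import name (biopython is imported as "Bio").
REQUIRED = {"pandas": "pandas", "biopython": "Bio"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip3 install " + " ".join(missing))
    else:
        print("pandas and biopython are available.")
```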
The Kraken2 database PlusPF is needed. For HiPerGator users, downloading is not needed. It has been downloaded and configured in the pipeline.
%%{ init: { 'gitGraph': { 'mainBranchName': 'Daytona_HAV' } } }%%
%%{init: { 'themeVariables': { 'commitLabelFontSize': '20px' } } }%%
%%{init: { 'themeVariables': { 'fontSize': '24px' } } }%%
gitGraph
commit id: "QC"
branch QC
checkout QC
commit id: "Fastqc"
commit id: "Trimmomatic"
commit id: "bbtools"
commit id: "multiqc"
checkout Daytona_HAV
merge QC
commit id: "SNP calling"
branch SNP_calling
checkout SNP_calling
commit id: "bwa"
commit id: "samtools"
commit id: "ivar"
checkout Daytona_HAV
merge SNP_calling
commit id: "Consensus"
branch Consensus
checkout Consensus
commit id: "extract_kraken_reads"
commit id:"bwa for extract reads"
commit id:"samtools for extract reads"
commit id:"ivar for extract reads"
checkout Daytona_HAV
merge Consensus
commit id: "phylogeny"
branch Phylogeny
checkout Phylogeny
commit id: "mafft with 17 references"
commit id: "snp-sites with 17 references"
commit id: "iqtree with 17 references"
commit id: "phytreeviz with 17 references"
checkout Daytona_HAV
merge Phylogeny
    conda create -n HAV -c conda-forge python=3.10
    conda activate HAV

- Put your data files into the directory /fastqs/hav/. Each file name should look like "XZA22002292_1.fastq.gz" or "XZA22002292_2.fastq.gz". You may use the script rename.sh to rename your data files.
Note: Do not place fastq data anywhere other than the fastqs/hav/ directory within the Daytona_HAV pipeline folder. Placing data elsewhere will cause program errors.
- Open the file "params_hcv.yaml" and set the parameters to absolute paths. They should look like ".../.../fastqs/hav", ".../.../output", etc.
- Go to the top directory of the pipeline and run the following command.

    sbatch Daytona_HAV.sh # generate phylogenetic tree

| sampleID | species/tax_ID/percent(%)/number | ... |
|---|---|---|
| xxx25002686_S1 | Hepatitis A/12092/93.58/123349 | ... |
The second column of the above table indicates that 123349 reads (93.58%) in the sample (xxx25002686_S1) are identified as HAV.
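For downstream scripting, the slash-separated second column can be split into its four parts. The sketch below assumes the field always follows the species/tax_ID/percent(%)/number layout shown in the table header. The names KrakenHit and parse_hit are hypothetical helpers, not part of the pipeline.

```python
from typing import NamedTuple


class KrakenHit(NamedTuple):
    species: str
    tax_id: int
    percent: float
    reads: int


def parse_hit(field: str) -> KrakenHit:
    """Split a 'species/tax_ID/percent/number' field into typed values."""
    # Split from the right so a "/" inside a species name would not break parsing.
    species, tax_id, percent, reads = field.rsplit("/", 3)
    return KrakenHit(species, int(tax_id), float(percent), int(reads))


hit = parse_hit("Hepatitis A/12092/93.58/123349")
print(f"{hit.reads} reads ({hit.percent}%) assigned to {hit.species} (taxID {hit.tax_id})")
```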
| REGION | POS | REF | ALT | ... | PVAL | PASS | ... |
|---|---|---|---|---|---|---|---|
| NC_001489.1 | 2895 | T | G | ... | 0.526316 | FALSE | ... |
| NC_001489.1 | 2927 | T | C | ... | 0 | TRUE | ... |
Note: PASS is TRUE when the p-value is <= 0.05. A SNP whose PASS value is FALSE did not pass the quality check.
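To keep only the SNPs that passed the quality check, a table like the one above can be filtered on the PASS column. This is a minimal sketch that assumes the SNP table is tab-separated text with the column names shown above. The file layout is an assumption, and the example rows include only the columns visible in this document. The function name passing_snps is hypothetical.

```python
import csv
import io

# Illustrative rows with only the columns shown above; real files have more columns.
EXAMPLE_TSV = (
    "REGION\tPOS\tREF\tALT\tPVAL\tPASS\n"
    "NC_001489.1\t2895\tT\tG\t0.526316\tFALSE\n"
    "NC_001489.1\t2927\tT\tC\t0\tTRUE\n"
)


def passing_snps(handle):
    """Yield (region, pos, ref, alt) for rows whose PASS flag is TRUE."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["PASS"].strip().upper() == "TRUE":
            yield row["REGION"], int(row["POS"]), row["REF"], row["ALT"]


print(list(passing_snps(io.StringIO(EXAMPLE_TSV))))
```

For a real file, an open file handle can be passed in place of the io.StringIO object.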
For the phylogenetic tree, the test samples are compared with 17 HAV references. A bootstrap test with 1,000 replicate datasets is performed to assess the statistical support for nodes (branches) on the phylogenetic tree.
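The idea behind replicate datasets can be illustrated with a toy example. In a classic nonparametric bootstrap, each replicate is made by resampling alignment columns with replacement, a tree is built from each replicate, and the support for a branch is the fraction of replicate trees that contain it. The sketch below shows only the resampling step on made-up sequences. The pipeline itself delegates bootstrapping to iqtree, and the exact bootstrap variant it runs is not stated here, so this should be read as an illustration of the concept and not as the pipeline's implementation.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy aligned sequences (hypothetical names and bases, not pipeline output).
toy = {"sample_A": "ACGTAC", "sample_B": "ACGTTC", "ref_1": "ATGTAC"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```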
To detect a test sample's HAV genotype, a possible approach is to add HAV samples with known genotypes as positive controls and construct a phylogenetic tree together with the test samples.
Test data can be found in /blue/bphl-florida/share/Daytona_HAV_test_sample. To use them, please copy them to the directory .../fastqs/hav/.
The results for these test data can be found in /blue/bphl-florida/share/Daytona_HAV_test_sample/output-20251009215948.
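Before submitting a run, it can help to confirm that the files copied into the fastqs/hav directory follow the paired naming convention ("XZA22002292_1.fastq.gz", "XZA22002292_2.fastq.gz"). The following sketch checks a list of file names. The regular expression is inferred from those two example names only, and check_pairs is a hypothetical helper, not part of the pipeline.

```python
import re
from collections import defaultdict

# Pattern inferred from the example names "<sample>_1.fastq.gz" / "<sample>_2.fastq.gz".
NAME_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def check_pairs(filenames):
    """Return (complete_samples, problems) for a list of file names."""
    reads = defaultdict(set)
    problems = []
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            problems.append(f"unexpected name: {name}")
        else:
            reads[match["sample"]].add(match["read"])
    complete = sorted(s for s, r in reads.items() if r == {"1", "2"})
    problems += [
        f"missing mate for sample: {s}"
        for s, r in sorted(reads.items())
        if r != {"1", "2"}
    ]
    return complete, problems


complete, problems = check_pairs(
    ["XZA22002292_1.fastq.gz", "XZA22002292_2.fastq.gz", "bad.fq.gz", "S2_1.fastq.gz"]
)
print(complete)
print(problems)
```

To check a real directory, the list of names returned by os.listdir for your data directory can be passed to check_pairs.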
If you want to report bugs, suggest enhancements, or discuss ideas related to the project, please use the repository's "Issues" tab in GitHub.
If you want to receive an email notification when the pipeline finishes, enter your email address in the line "#SBATCH --mail-user=" of Daytona_HAV.sh.