Daytona_HAV

Introduction

This pipeline performs species detection and phylogenetic analysis of hepatitis A virus (HAV). It requires Illumina paired-end sequencing data. Phylogenetic relationships are inferred from SNP sequences, and 17 HAV samples from NCBI are used as references when analyzing the phylogenetic relationships of the test samples.

Prerequisites

Nextflow is required. Installation details can be found at https://github.com/nextflow-io/nextflow. HiPerGator users do not need to install it.

Singularity/Apptainer is required. Installation details can be found at https://singularity-tutorial.github.io/01-installation/. HiPerGator users do not need to install it.

SLURM is required. HiPerGator users do not need to install it.

Python 3 is required, together with the packages "pandas" and "biopython". If your Python 3 installation does not include them, install them with: pip3 install pandas biopython
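To confirm that both packages are available before running the pipeline, a small check like the following may help. It is a hypothetical helper, not part of the pipeline. Note that biopython is imported under the module name "Bio", so the check looks up module names rather than pip package names.

```python
import importlib.util

# Module name -> pip package name. biopython is imported as "Bio".
REQUIRED_MODULES = {"pandas": "pandas", "Bio": "biopython"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [pkg for module, pkg in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip3 install " + " ".join(missing))
    else:
        print("pandas and biopython are available")
```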

The Kraken2 PlusPF database is required. HiPerGator users do not need to download it, because it has already been downloaded and configured in the pipeline.

Workflow

%%{ init: { 'gitGraph': { 'mainBranchName': 'Daytona_HAV' } } }%%
%%{init: { 'themeVariables': { 'commitLabelFontSize': '20px' } } }%%
%%{init: { 'themeVariables': { 'fontSize': '24px' } } }%%
gitGraph       
       commit id: "QC"
       branch QC
       checkout QC
       commit id: "Fastqc"
       commit id: "Trimmomatic"
       commit id: "bbtools"
       commit id: "multiqc"

       checkout Daytona_HAV
       merge QC

       commit id: "SNP calling"
       branch SNP_calling
       checkout SNP_calling
       commit id: "bwa"
       commit id: "samtools"
       commit id: "ivar"
       checkout Daytona_HAV
       merge SNP_calling 
       
       commit id: "Consensus"
       branch Consensus
       checkout Consensus
       commit id: "extract_kraken_reads"
       commit id:"bwa for extract reads"
       commit id:"samtools for extract reads"
       commit id:"ivar for extract reads"
       checkout Daytona_HAV
       merge Consensus

       commit id: "phylogeny"
       branch Phylogeny
       checkout Phylogeny
       commit id: "mafft with 17 references"
       commit id: "snp-sites with 17 references"
       commit id: "iqtree with 17 references"
       commit id: "phytreeviz with 17 references"
       checkout Daytona_HAV
       merge Phylogeny
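In the phylogeny stage, the samples are aligned with the 17 references (mafft), and only variable sites are kept (snp-sites) before the tree is built (iqtree). The sketch below is a simplified illustration of that SNP-site extraction on an in-memory alignment. It is not snp-sites' exact algorithm. In this version, columns containing gaps or ambiguous bases are skipped, and the function names are hypothetical.

```python
VALID_BASES = set("ACGT")


def snp_columns(alignment):
    """Return indices of columns where more than one distinct A/C/G/T base occurs.

    Columns containing any gap or ambiguous character are skipped (a simplification).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = []
    for i in range(length):
        bases = {seq[i].upper() for seq in alignment.values()}
        if bases <= VALID_BASES and len(bases) > 1:
            columns.append(i)
    return columns


def snp_alignment(alignment):
    """Reduce each aligned sequence to its characters at the SNP columns."""
    cols = snp_columns(alignment)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"ref1": "ACGTAC", "ref2": "ACGTTC", "sample": "ATGTAC"}
    print(snp_alignment(aln))  # {'ref1': 'CA', 'ref2': 'CT', 'sample': 'TA'}
```

Only the columns that differ between sequences carry information for distinguishing samples, which is why the tree can be built from this much shorter SNP alignment.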

Recommended conda environment installation

conda create -n HAV -c conda-forge python=3.10

conda activate HAV

How to run

Put your data files into the directory fastqs/hav/ inside the pipeline folder. File names should follow the pattern "XZA22002292_1.fastq.gz" and "XZA22002292_2.fastq.gz". You may use the script rename.sh to rename your data files.

Note: Do not place fastq data anywhere other than the fastqs/hav/ directory within the Daytona_HAV pipeline folder. Placing data in other locations will cause errors.
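Before submitting a run, it can help to confirm that every sample in fastqs/hav/ follows the naming pattern and has both mates. The sketch below is a hypothetical checker, not part of the pipeline. It assumes the convention "<sampleID>_1.fastq.gz" / "<sampleID>_2.fastq.gz" inferred from the example above. rename.sh remains the pipeline's own tool for fixing names.

```python
import re
from pathlib import Path

# Assumed naming convention, inferred from the README example:
# <sampleID>_1.fastq.gz and <sampleID>_2.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def check_fastq_names(filenames):
    """Return (sorted complete sample IDs, list of problem messages)."""
    reads = {}
    problems = []
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            problems.append(f"unexpected name: {name}")
            continue
        reads.setdefault(match["sample"], set()).add(match["read"])
    complete = sorted(s for s, r in reads.items() if r == {"1", "2"})
    for sample, found in sorted(reads.items()):
        if found != {"1", "2"}:
            problems.append(f"missing mate for sample: {sample}")
    return complete, problems


if __name__ == "__main__":
    # Run from the pipeline's top-level directory; hidden files are skipped.
    fastq_dir = Path("fastqs/hav")
    names = [p.name for p in fastq_dir.glob("*")
             if p.is_file() and not p.name.startswith(".")]
    complete, problems = check_fastq_names(names)
    print(f"{len(complete)} complete pair(s)")
    for message in problems:
        print(message)
```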

Open the file "params_hav.yaml" and set the parameters to absolute paths. They should look like ".../.../fastqs/hav", ".../.../output", etc.
Change into the top-level directory of the pipeline and run the following command.

sbatch Daytona_HAV.sh   # generate phylogenetic tree
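Because the parameters must be absolute paths, a quick pre-flight check of params_hav.yaml may catch mistakes before the job is queued. The sketch below is hypothetical. It does not know the real parameter names, so it simply flags any value that contains "/" but is not absolute. It also assumes the file uses flat "key: value" lines. A full YAML parser such as PyYAML would be more robust.

```python
from pathlib import PurePosixPath


def relative_paths(yaml_text):
    """Return (key, value) pairs whose value contains '/' but is not absolute.

    Naive sketch: only flat 'key: value' lines are handled, and anything after
    '#' is treated as a comment.
    """
    flagged = []
    for raw_line in yaml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        value = value.strip("'\"")  # drop surrounding quotes, if any
        if "/" in value and not PurePosixPath(value).is_absolute():
            flagged.append((key, value))
    return flagged
```

For example, from the pipeline's top-level directory: relative_paths(Path("params_hav.yaml").read_text()) returns an empty list when every path-like value is absolute.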

Main output

1. HAV reads detection

sampleID	species/tax_ID/percent(%)/number	...
xxx25002686_S1	Hepatitis A/12092/93.58/123349	...

The second column of the table above indicates that 123349 reads (93.58%) in the sample xxx25002686_S1 are identified as HAV.
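To use this table in downstream scripts, each "species/tax_ID/percent(%)/number" field can be split into typed parts. The sketch below is hypothetical. It assumes the file is tab-separated and that any further columns (the "..." above) use the same four-part format. It splits from the right so that a species name containing "/" would stay intact.

```python
def parse_detection(field):
    """Split 'species/tax_ID/percent/number' into (str, int, float, int)."""
    species, tax_id, percent, count = field.rsplit("/", 3)
    return species, int(tax_id), float(percent), int(count)


def parse_row(line):
    """Parse one data row into (sampleID, list of detection tuples)."""
    sample_id, *fields = line.rstrip("\n").split("\t")
    return sample_id, [parse_detection(f) for f in fields if f]


if __name__ == "__main__":
    row = "xxx25002686_S1\tHepatitis A/12092/93.58/123349"
    print(parse_row(row))
    # ('xxx25002686_S1', [('Hepatitis A', 12092, 93.58, 123349)])
```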

2. Variants

REGION	POS	REF	ALT	...	PVAL	PASS	...
NC_001489.1	2895	T	G	...	0.526316	FALSE	...
NC_001489.1	2927	T	C	...	0	TRUE	...

Note: PASS indicates whether the p-value is <= 0.05. A SNP whose PASS value is FALSE fails this quality check.
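To keep only the SNPs that pass this check, the tab-separated variant table can be filtered on the PASS column. The sketch below uses only the columns shown above and the standard library "csv" module. The function name is hypothetical, and the output file's exact name and location in the pipeline's output folder are not specified here.

```python
import csv
import io


def passing_variants(tsv_text):
    """Return (REGION, POS, REF, ALT) for rows whose PASS column is TRUE."""
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["PASS"].strip().upper() == "TRUE":
            kept.append((row["REGION"], int(row["POS"]), row["REF"], row["ALT"]))
    return kept


if __name__ == "__main__":
    table = (
        "REGION\tPOS\tREF\tALT\tPVAL\tPASS\n"
        "NC_001489.1\t2895\tT\tG\t0.526316\tFALSE\n"
        "NC_001489.1\t2927\tT\tC\t0\tTRUE\n"
    )
    print(passing_variants(table))  # [('NC_001489.1', 2927, 'T', 'C')]
```

For a real file, pass its contents instead, e.g. passing_variants(open(path, newline="").read()).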

3. Phylogenetic tree

For the phylogenetic tree, the test samples are compared with the 17 HAV references. A bootstrap test with 1,000 replicate datasets is performed to assess the statistical support for the nodes (branches) of the tree.
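IQ-TREE runs the bootstrap internally. The sketch below only illustrates what a "replicate dataset" means in the standard nonparametric bootstrap: alignment columns are resampled with replacement to produce a pseudo-alignment of the same length. A tree would then be built from each replicate, and a node's support is the percentage of replicate trees that contain it. This document does not say whether the pipeline uses IQ-TREE's standard or ultrafast bootstrap. The ultrafast variant approximates this procedure rather than rebuilding every tree from scratch. The function names are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=1000, seed=1):
    """Generate n replicate datasets (n=1000 matches the README's setting)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


if __name__ == "__main__":
    snps = {"ref1": "CA", "ref2": "CT", "sample": "TA"}
    reps = bootstrap_replicates(snps, n=3)
    print(reps[0])
```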

A possible approach to detect sample genotypes

To determine a test sample's HAV genotype, one possible approach is to include HAV samples with known genotypes as positive controls and construct the phylogenetic tree together with the test samples.

Test data

Test data can be found in /blue/bphl-florida/share/Daytona_HAV_test_sample. To use them, copy them to the directory .../fastqs/hav/.
The results for these test data can be found in /blue/bphl-florida/share/Daytona_HAV_test_sample/output-20251009215948.
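The copy step can be scripted as sketched below. The sketch assumes that the test reads are the "*.fastq.gz" files at the top level of the share directory. It does not search subdirectories, so the output-... folder with the reference results is not copied into fastqs/hav/. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_fastqs(src_dir, dest_dir, pattern="*.fastq.gz"):
    """Copy top-level files matching pattern from src_dir into dest_dir.

    Subdirectories are not searched. Returns the sorted names of copied files.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            shutil.copy2(path, dest / path.name)  # copy2 also keeps timestamps
            copied.append(path.name)
    return copied
```

Usage from the pipeline's top-level directory on HiPerGator: copy_fastqs("/blue/bphl-florida/share/Daytona_HAV_test_sample", "fastqs/hav").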

Contact

To report bugs, suggest enhancements, or discuss ideas related to the project, please use the "Issues" tab of the GitHub repository.

Note

To receive an email notification when the pipeline finishes running, enter your email address in the line "#SBATCH --mail-user=" of Daytona_HAV.sh.
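Editing the line by hand is the simplest option. If you set up many copies of the pipeline, a small script can fill it in instead. The sketch below is hypothetical. It replaces the whole "#SBATCH --mail-user=" line and raises an error if the line is missing. It does not touch any other SBATCH settings.

```python
import re

MAIL_LINE = re.compile(r"^#SBATCH --mail-user=.*$", re.MULTILINE)


def set_mail_user(script_text, email):
    """Return script_text with the '#SBATCH --mail-user=' line set to email."""
    # A function replacement avoids backslash-escape handling in the email text.
    new_text, count = MAIL_LINE.subn(lambda m: f"#SBATCH --mail-user={email}", script_text)
    if count == 0:
        raise ValueError("no '#SBATCH --mail-user=' line found")
    return new_text
```

For example: path = Path("Daytona_HAV.sh"); path.write_text(set_mail_user(path.read_text(), "you@example.org")), with Path imported from pathlib and your own address in place of the placeholder.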

Folders and files

fastqs/hav
modules
reference
Daytona_HAV.sh
LICENSE
README.md
braken_phy_hav.py
extract_kraken_reads.py
hav.nf
params_hav.yaml
rename.sh
