Results

Kristy Horan edited this page Jun 1, 2023 · 5 revisions

Output files

bohra outputs both sequence-level and dataset-level files.

Per sequence files

All results for each sequence are in a <sequence_id> folder in the directory you ran bohra in.

| Filename | Pipeline | Description |
| --- | --- | --- |
| read_assessment.txt | all | combined output of seqkit stats, seqkit fx2tab and seqtk fqchk |
| kraken2.tab | all | raw output of kraken2 |
| species.txt | all | summary of top 3 species from kraken2.tab |
| snippy_qc.txt | snps, phylogeny, default, full | summary of snippy output |
| snps.* | snps, phylogeny, default, full | snippy outputs (.vcf, .log, .fa) |
| mlst.txt | amr_typing, full, default | results of MLST |
| typer.txt | amr_typing, full, default | results of the typer, where appropriate |
| resistome.txt | amr_typing, full, default | collated results of abritamr |
| abritamr | amr_typing, full, default | raw results of abritamr |
| plasmid.txt | amr_typing, full, default | collated results of the mbb_suite |
| plasmid_*.fa | amr_typing, full, default | fasta files of plasmids identified |
| <seq_ID>.txt | assemble, amr_typing, full, default | collated results of prokka |
| <seq_ID>.gff | assemble, amr_typing, full, default | gff output of prokka |
| contigs.fa | assemble, amr_typing, full, default | the assembly generated by bohra OR provided by the user |
| assembly_statistics.txt | assemble, amr_typing, full, default | summary of assembly statistics |
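
To see quickly which of these files were produced for a given sequence, something like the following sketch may help. It is not part of bohra: `list_outputs` is a hypothetical helper, and it assumes the fixed-name files from the table sit directly in `<run_dir>/<sequence_id>/`. Wildcard and `<seq_ID>`-named files are left out for simplicity.

```python
from pathlib import Path

# File names taken from the per-sequence table; which ones are present
# depends on the pipeline that was run.
EXPECTED_FILES = [
    "read_assessment.txt",
    "kraken2.tab",
    "species.txt",
    "snippy_qc.txt",
    "mlst.txt",
    "typer.txt",
    "resistome.txt",
    "plasmid.txt",
    "contigs.fa",
    "assembly_statistics.txt",
]


def list_outputs(run_dir, sequence_id):
    """Return (present, missing) file names for one sequence folder.

    Hypothetical helper: assumes results sit directly in
    <run_dir>/<sequence_id>/.
    """
    seq_dir = Path(run_dir) / sequence_id
    present = [name for name in EXPECTED_FILES if (seq_dir / name).is_file()]
    missing = [name for name in EXPECTED_FILES if name not in present]
    return present, missing
```

Calling `list_outputs(".", "my_sequence")` from the directory where bohra was run would return the two lists for a folder named `my_sequence` (a made-up ID). A file in the missing list is not necessarily an error, since each file is only produced by the pipelines listed against it.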

Overall dataset files

All dataset-level results are stored in a folder called report. This folder contains collated versions of all the summary files found in the sample directories, the tree (if phylogeny was performed) and the outputs of panaroo (if the full pipeline was run). It also contains a report_<pipeline>.html file, a standalone file that can be viewed in a browser and shared with collaborators or other stakeholders. A copy of one can be found [here](need to add a link). If you rerun bohra repeatedly in the same folder and want to keep a record of past report folders, use the --keep Y flag; the report folder will be renamed and kept for archiving.

The work folder. What is it?

Nextflow is a workflow platform that allows you to run a pipeline: a series of processes or steps tied together by their dependency on each other. For example, a tree can't be generated without a multi-sequence alignment of some sort, which in turn requires alignments, and so on back to reads and a reference. The bohra pipelines are modular and can be run independently or as a whole.

You can also run a pipeline, go away and check the results, add or remove sequences, and then come back and re-run. Only the new sequences will be processed; the pipeline will not be re-run on samples that already have results. Likewise, you can run a preview pipeline and later run the full pipeline, and the early steps will not be repeated.

This 'caching' of results is achieved by the work folder. When you run bohra, a series of folders is created in the location you run it: the sample directories, a report directory and a work directory. The work directory is where Nextflow looks to find what has already been performed. It holds log files for each process that was run, as well as raw results.

You can delete this folder if you want; there will be no explosions! Your sample directories and report folders will all be fine. But if you rerun the same analysis for any reason, there will be no 'memory' of previous runs, so the whole pipeline will run again.
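
The caching idea can be illustrated with a toy sketch. This is not Nextflow's implementation, only a minimal model of the behaviour described here: a dictionary stands in for the work folder, and the names `task_key` and `run_step` are invented for the example.

```python
import hashlib


def task_key(step_name, inputs):
    """Build a stable key from a step name and its input strings."""
    digest = hashlib.sha256()
    digest.update(step_name.encode())
    for item in sorted(inputs):
        digest.update(b"\0" + item.encode())
    return digest.hexdigest()


def run_step(step_name, inputs, cache, func):
    """Run func(inputs) only if this step/input combination is not cached.

    Returns (result, reused), where reused is True when the cached
    result was used and func was not called.
    """
    key = task_key(step_name, inputs)
    if key in cache:
        return cache[key], True
    result = func(inputs)
    cache[key] = result
    return result, False
```

In this model, running the same step on the same inputs a second time returns the stored result without calling `func` again, while a new sequence produces a new key and is processed. Emptying the dictionary corresponds to deleting the work folder: the earlier results still exist wherever they were copied, but every step runs again on the next call.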
