BBTools container

Main tool: BBTools (version 39.25)

Code repository: https://sourceforge.net/projects/bbmap/

Additional tools:

samtools: 1.21
htslib: 1.21
sambamba: 1.0.1

Basic information on how to use this tool:

executable: *.sh
help: Program descriptions and options are shown when running the shell scripts with no parameters.
version: --version
description:

BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools handles common sequencing file formats such as fastq, fasta, sam, scarf and fasta+qual, either compressed or uncompressed, and autodetects quality encoding and interleaving.
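The help and version conventions listed above can be wrapped in a small shell helper. This is a sketch, not something shipped in the container: the function name `bb_script_info` is hypothetical, and it assumes the BBTools `*.sh` scripts are on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): print the version of a BBTools
# script, or its help text when "help" is passed as the second argument.
# Assumes the *.sh scripts are on PATH.
bb_script_info() {
    local script="$1" mode="${2:-version}"
    # Fail early with a clear message if the script is not installed.
    if ! command -v "$script" >/dev/null 2>&1; then
        echo "error: $script not found on PATH" >&2
        return 1
    fi
    if [ "$mode" = "help" ]; then
        # Running a BBTools script with no parameters prints its description and options.
        "$script"
    else
        "$script" --version
    fi
}
```

For example, `bb_script_info bbduk.sh` would print the BBDuk version and `bb_script_info bbduk.sh help` its option list.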

Additional information:

Script	Purpose	Comment
bbcms.sh	Performs error correction using a Count-Min Sketch	Intended for metagenome assembly
bbcountunique.sh	Counts unique kmers in reads
bbduk.sh	Trims, filters or masks reads using kmers
bbmap.sh	Splice-aware aligner for short reads
bbmapskimmer.sh	BBMap version designed for high levels of multimapping
bbmask.sh	Masks references based on various things, such as sequence complexity
bbmerge.sh	Merges overlapping paired reads
bbmerge-auto.sh	Same as bbmerge, but tries to allocate all memory on the node	Use this version for kmer operations like extend
bbnorm.sh	Normalizes reads based on coverage	Mainly for use prior to single-cell assembly
bbsplit.sh	BBMap version that maps to multiple references simultaneously	Intended for decontamination; similar to Seal
bbversion.sh	Prints the version of BBTools
bbwrap.sh	Wraps BBMap to process many files using the same reference	Saves time by loading the index only once
calctruequality.sh	Allows recalibration of quality scores from mapped reads	This generates the correction matrix; BBDuk does the recalibration
callgenes.sh	Fast prokaryotic gene caller	Integrated into BBSketch
callvariants.sh	Fast variant caller
callvariants2.sh	Same as callvariants.sh with the "multisample" flag
clumpify.sh	Shrinks compressed fastq files, and can remove duplicate reads	Also supports error correction
comparesketch.sh	Compares sketches locally, without using a sketch server
crossblock.sh	Alias for decontaminate.sh
cutgff.sh	Cuts out features defined by a gff file	e.g., generates one fasta entry per gene from a gff and an assembly
cutprimers.sh	Cuts out subregions of ribosomal RNA sequences	Mainly for 16S analysis
decontaminate.sh	Pool-level decontamination for single-cell MDA-amplified genomes
dedupe.sh	Removes duplicate and fully-contained sequences	Can also be used to cluster 16S sequences
dedupe2.sh	Version of dedupe that supports more hash keys for greater sensitivity
dedupebymapping.sh	Deduplicates reads based on mapping coordinates
demuxbyname.sh	Demultiplexes based on sequence headers
filterbyname.sh	Filters based on sequence headers
filterbytaxa.sh	Filters sequences based on taxonomic classification	Used with NCBI datasets
filterbytile.sh	Removes reads that are in low-quality areas of the flowcell
filterqc.sh	Part of JGI's fastq filtering pipeline
filtersam.sh	Filters sam files to remove reads with multiple unsupported mismatches	Designed for NovaSeq
gitable.sh	Used to process NCBI taxonomy data
khist.sh	Alias for bbnorm.sh with flags for making a kmer frequency histogram
kmercountexact.sh	Counts kmers and produces a histogram	Uses more memory than BBNorm but allows exact counts
kmercountmulti.sh	Cardinality estimation over multiple kmer lengths	Uses LogLog; does not produce a histogram
mapPacBio.sh	BBMap version designed for PacBio or Nanopore reads	Reads longer than 5kbp get broken into 5kbp shreds
mergesketch.sh	Allows multiple sketches to be combined
msa.sh	Alignment tool	Used with cutprimers.sh to cut subsections out of 16S sequences
mutate.sh	Generates synthetic genomes by randomly mutating the input
muxbyname.sh	Multiplexes multiple files into one, renaming sequences based on the input file name	Opposite of demuxbyname.sh
partition.sh	Splits a sequence file into multiple files
pileup.sh	Calculates coverage from sam files
plotflowcell.sh	Produces statistics about flowcell positions
processhi-c.sh	Custom trimming for hi-C reads	In development
randomreads.sh	Generates synthetic reads from a real genome reference	Highly customizable
readqc.sh	Short read quality report	Alternative to fastqc
reformat.sh	Converts sequence files to another format	Has many additional options, including subsampling
rename.sh	Renames sequences in various ways, such as adding a prefix
repair.sh	Fixes broken pairing in fastq files
representative.sh	Makes a smaller subset of a reference dataset by eliminating redundancy	Designed for use with BBSketch output
rqcfilter2.sh	Filtering pipeline used at JGI	Requires reference data from portal.nersc.gov/dna/microbial/assembly/bushnell/RQCFilterData.tar
seal.sh	Counts kmer matches between query and reference sequences
sendsketch.sh	Fast taxonomic classifier using webservers at JGI
shred.sh	Breaks sequences into shorter, fixed-length pieces
shuffle.sh	Randomly reorders input file	Crashes if input doesn't fit in memory
shuffle2.sh	Randomly reorders input file	Supports larger files, but output might be less random
sketch.sh	Makes reference sketches on a per-TaxID basis
sketchblacklist.sh	Makes sketch blacklists of common kmers
sortbyname.sh	Sorts sequences by name, length, quality, taxa, and other things
summarizequast.sh	Generates box plots for multiple QUAST reports
tadpipe.sh	Preprocessing and assembly pipeline using tadpole
tadpole.sh	Fast short read assembler
tadwrapper.sh	Runs Tadpole with multiple kmer lengths to select the best assembly
taxserver.sh	Starts taxonomy and sketch servers
testformat.sh	Determines whether a file is fasta, fastq, interleaved, etc. by reading the first few lines
testformat2.sh	Generates extensive statistics by reading the full file
translate6frames.sh	Translates nucleotide sequences into amino acid sequences in all six frames
vcf2gff.sh	Converts vcf format to gff format

Full documentation: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/

Example Usage

(adapted from /opt/bbmap/pipelines/covid/processCorona.sh)

Interleave a pair of FASTQ files for downstream processing:

reformat.sh \
    in1=${SAMPLE}_R1.fastq.gz \
    in2=${SAMPLE}_R2.fastq.gz \
    out=${SAMPLE}.fastq.gz
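To interleave many samples, the same `reformat.sh` call can be looped over every R1 file. This is a sketch that assumes the `${SAMPLE}_R1.fastq.gz` / `${SAMPLE}_R2.fastq.gz` naming used above; the function name `interleave_all` and the `REFORMAT` override (useful for a dry run with `REFORMAT=echo`) are hypothetical, not part of BBTools.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: interleave every ${SAMPLE}_R1.fastq.gz /
# ${SAMPLE}_R2.fastq.gz pair found in a directory.
# REFORMAT defaults to reformat.sh; set REFORMAT=echo to print commands instead.
interleave_all() {
    local dir="${1:-.}" r1 r2 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # no matches: the glob stays literal
        sample="${r1%_R1.fastq.gz}"           # strip the R1 suffix to get the sample prefix
        r2="${sample}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no mate file for $r1, skipping" >&2
            continue
        fi
        "${REFORMAT:-reformat.sh}" in1="$r1" in2="$r2" out="${sample}.fastq.gz"
    done
}
```

Running `REFORMAT=echo interleave_all /path/to/reads` first lists the commands that would run, so the file pairing can be checked before any data is written.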

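Split the interleaved reads into SARS-CoV-2 and non-SARS-CoV-2 sets by matching 25-mers (k=25) against the reference. Reads that match go to outm and the rest go to outu; ow allows existing output files to be overwritten, and -Xmx1g sets the Java heap to 1 GB: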

bbduk.sh ow -Xmx1g \
    in=${SAMPLE}.fastq.gz \
    ref=REFERENCE.fasta \
    outm=${SAMPLE}_viral.fq.gz \
    outu=${SAMPLE}_nonviral.fq.gz \
    k=25
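After the split, a quick sanity check is to count the reads in each output. BBDuk prints its own summary when it finishes; the sketch below is an independent check using only `gzip` and `awk`, assuming four-line FASTQ records and the output names used above. For interleaved input, each pair counts as two reads. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count FASTQ records in a gzip-compressed file (assumes 4 lines per record).
count_fastq_gz() {
    gzip -cd "$1" | awk 'END { print int(NR / 4) }'
}

# Hypothetical report: viral vs. non-viral read counts for one sample,
# reading ${sample}_viral.fq.gz and ${sample}_nonviral.fq.gz.
report_split() {
    local sample="$1" viral nonviral
    viral=$(count_fastq_gz "${sample}_viral.fq.gz")
    nonviral=$(count_fastq_gz "${sample}_nonviral.fq.gz")
    awk -v v="$viral" -v n="$nonviral" 'BEGIN {
        total = v + n
        pct = (total > 0) ? 100 * v / total : 0
        printf "viral=%d nonviral=%d viral_pct=%.2f\n", v, n, pct
    }'
}
```

For example, `report_split "$SAMPLE"` prints one summary line per sample, which is easy to collect across a batch.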
