
Decontamination References

Ash O'Farrell edited this page Nov 5, 2025 · 2 revisions

Note: This page does not cover fastq cleaning with fastp, even though that cleaning happens inside the WDL's decontamination task. By default, fastp cleans the fastqs before decontamination runs.

myco (all versions) uses clockwork for decontamination and variant calling. The way clockwork handles decontamination can be roughly summed up as:

  1. Align your fastqs to a decontamination reference containing stuff that is decidedly not TB (human, NTM, etc)
  2. Get rid of anything that aligned too closely to that decontamination reference

The contents of your decontamination reference are worth some consideration. clockwork helpfully provides workflows for generating these references. Rather than regenerating them every time the workflow is run, I maintain premade Docker images containing the decontamination reference in order to save on cloud compute costs. These Docker images are what the decontamination task of myco (all versions) will actually run inside, so you do not need to pass a decontamination reference to these tasks, as long as you are happy with the options I've provided.

What references are available, and what is the difference between them?

Generally speaking, you will want to use clockwork-v0.12.5's decontamination reference. Please note this is intentionally not the same decontamination reference that CDC's varpipe pipeline uses (as of 2024). Please see this documentation in the clockwork-wdl repo for more details, and for my justification for not using the CDC version.

CDPH verbally agreed to use clockwork-v0.12.5 when I explained the issues with the CDC varpipe reference, so that is the default.

Why not pass in the decontamination reference as a WDL file input?

You will see a lot of advice online about how to make Docker images as small as possible. However, this advice is not universally applicable, especially when dealing with Terra. On Terra, to my knowledge, you are not meaningfully charged when downloading an image from Docker Hub, but you are charged for localizing gs:// files via WDL File inputs. Additionally, localization of files into VMs is relatively slow. Since every sample will create a VM that requires the decontamination reference, it appears to be cheaper and faster to bake these reference files into the Docker image.
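To make the trade-off concrete, here is a minimal sketch of the two task shapes being compared. It is not myco's actual code: the task names, the image names, and the `/ref` path are all hypothetical placeholders, and the commands only list files rather than run clockwork.

```wdl
version 1.0

# Option A (hypothetical): the reference is a File input, so the backend
# has to localize it from gs:// into every VM that runs this task.
task decontam_ref_as_file {
  input {
    Array[File] fastqs
    File decontam_ref   # localized once per VM, i.e. once per sample
  }
  command <<<
    ls -lh ~{decontam_ref}
  >>>
  runtime {
    docker: "ubuntu:22.04"   # placeholder image that does not contain the reference
  }
}

# Option B (hypothetical): the reference is already inside the image,
# so the only thing pulled per VM is the Docker image itself.
task decontam_ref_in_image {
  input {
    Array[File] fastqs
    # Assumption: placeholder name for an image with the reference baked in
    String docker_image = "example/decontam-with-ref:placeholder"
  }
  command <<<
    # Assumption: /ref is a made-up location for the baked-in reference
    ls -lh /ref
  >>>
  runtime {
    docker: docker_image
  }
}
```

In Option A the reference is localized again for every sample, while in Option B it arrives as part of the image pull, which is the case this page argues is cheaper and faster on Terra.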

What if I want to roll my own decontamination reference?

You can either roll your own Docker image and make that the Docker image myco's decontamination task runs in, or you can modify the code itself to take in the decontamination reference as a file. I strongly recommend going the Docker route, especially if you are running on Terra, because it is essentially free to download a Docker image with huge files in it as opposed to localizing huge files. My clockwork-wdl repo has multiple Dockerfiles you can use as a starting point for creating your own.
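As a rough illustration of the Docker route, the sketch below shows the general WDL pattern of exposing the image name as a String input so a custom image can be swapped in without touching the command. The task name, workflow name, input names, and image name are hypothetical and are not myco's real inputs; check the actual WDL for the names it uses.

```wdl
version 1.0

# Hypothetical stand-in for a decontamination task; it only echoes its inputs.
task decontaminate_sketch {
  input {
    Array[File] fastqs
    String docker_image   # image that already contains your decontamination reference
  }
  command <<<
    echo "would decontaminate: ~{sep=' ' fastqs}"
  >>>
  runtime {
    docker: docker_image
  }
  output {
    String log = read_string(stdout())
  }
}

workflow custom_decontam_image {
  input {
    Array[File] fastqs
    # Assumption: replace this placeholder with the image you built and pushed yourself
    String decontam_docker = "your-dockerhub-user/your-decontam-image:your-tag"
  }
  call decontaminate_sketch {
    input:
      fastqs = fastqs,
      docker_image = decontam_docker
  }
}
```

With this shape, switching decontamination references means changing one string rather than localizing a new set of large files for every sample.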
