Instructions for pipeline testing

Migrating this from an email thread for easier viewing.

# Scope of Testing

For now, we will be testing the steps in the figure below between "NEON Data Products" and "ASV tables with taxonomy," except that we will not be generating the taxonomy tables because this takes too much processing time.

![Screen Shot 2020-10-26 at 12 42 36 PM](https://user-images.githubusercontent.com/12421420/98305657-265b0280-1f77-11eb-8ecc-e04a71511a6f.png)

Our Technical Working Group has suggested that this testing occur in two phases. In Phase 1, we simply check that the pipeline runs from start to finish on a variety of operating systems. In Phase 2, we will ask volunteers to read through the docs and suggest ways to make the package more flexible and user-friendly. For now, we are only asking you to conduct Phase 1 testing.

# Instructions for Phase 1 Testing

Start by pulling the codebase from [https://github.com/claraqin/NEON_soil_microbe_processing](https://github.com/claraqin/NEON_soil_microbe_processing).

* I recommend using git clone, e.g. `git clone https://github.com/claraqin/NEON_soil_microbe_processing.git`

Install **cutadapt** if you have not previously done so.

* Installation instructions can be found here: [https://cutadapt.readthedocs.io/en/stable/installation.html](https://cutadapt.readthedocs.io/en/stable/installation.html)
* This is where many people run into issues because of Python dependencies. If you cannot install cutadapt, then ignore the ITS pipeline and test the 16S processing pipeline only. (The 16S pipeline does not require cutadapt.)
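Before going further, it can help to confirm whether cutadapt is visible from your shell. The sketch below is one way to check; `find_cutadapt` is a hypothetical helper name (not part of the repository), and it assumes cutadapt was installed somewhere on your `PATH`.

```shell
# Hypothetical helper (not part of the repository): print the absolute path
# of the cutadapt executable if it is on PATH, or return non-zero otherwise.
find_cutadapt() {
    if cutadapt_path=$(command -v cutadapt 2>/dev/null); then
        printf '%s\n' "$cutadapt_path"
    else
        echo "cutadapt not found on PATH; test the 16S pipeline only" >&2
        return 1
    fi
}

# Report the path and version when cutadapt is available.
if found=$(find_cutadapt); then
    echo "cutadapt is at: $found"
    cutadapt --version
fi
```

If a path is printed, it is a reasonable candidate for the CUTADAPT_PATH parameter in "params.R".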

Update the parameters in the "params.R" file, which can be found in the "code" subdirectory. 

* Most of the parameters will not need to be updated because they are either adaptable or will not be referenced in this scope of testing.
* However, you may need to update the CUTADAPT_PATH parameter if you are testing the ITS pipeline.
* If you are using a Mac, you may also wish to update the MULTITHREAD parameter. By default, multithreading is turned off for Windows computers.
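To see where these two parameters are set before editing them, you could search the file from the project directory. This is only a convenience sketch: `show_params` is a hypothetical helper, and it assumes the parameter names appear literally in `code/params.R`.

```shell
# Hypothetical helper: list the lines (with line numbers) of a params file
# that mention CUTADAPT_PATH or MULTITHREAD.
show_params() {
    grep -nE 'CUTADAPT_PATH|MULTITHREAD' "$1"
}

# Run from the project root; skipped quietly if the file is not there.
if [ -f code/params.R ]; then
    show_params code/params.R || echo "parameters not found in code/params.R"
fi
```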

Download the sequence metadata for testing at [this Google Drive link](https://drive.google.com/file/d/1Rd41yHeHYoNl5HsKWUTO2uFOSLrbcIg9/view?usp=sharing), decompress the zipfile, and drop the contents (two files) into the project directory (the base directory of the repository that you just cloned).

* In the future, this step will be replaced by a function made specifically for downloading sequence metadata from NEON. For now, we need a workaround because of compatibility issues on NEON's end, which will be resolved later this year.
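Once the zipfile is downloaded, unpacking it from a terminal might look like the sketch below. The archive name in the usage comment is a placeholder (the actual name of the downloaded file is not given here), `unpack_metadata` is a hypothetical helper, and the sketch assumes the `unzip` utility is installed.

```shell
# Hypothetical helper: extract a downloaded archive into the project directory.
# $1 = path to the zipfile, $2 = project directory (the cloned repository root).
unpack_metadata() {
    zipfile=$1
    project_dir=$2
    if [ ! -f "$zipfile" ]; then
        echo "zipfile not found: $zipfile" >&2
        return 1
    fi
    if [ ! -d "$project_dir" ]; then
        echo "project directory not found: $project_dir" >&2
        return 1
    fi
    # -d selects the destination directory for the extracted files.
    unzip "$zipfile" -d "$project_dir"
}

# Example call (placeholder archive name; substitute the file you downloaded):
# unpack_metadata ~/Downloads/sequence_metadata.zip NEON_soil_microbe_processing
```

If the archive turns out to wrap the two files in a folder, move them up so that they sit directly in the project directory.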

The code for testing can be found in the "testing" subdirectory. This subdirectory contains temporary versions of our vignettes that I made for testing only. Start with the download-neon-data-metadataworkaround.Rmd vignette. 

* You will probably have to update the "root.dir" RMarkdown parameter at the top of the script. It should refer to the absolute filepath of the project root directory (e.g. .../neonSoilMicrobeProcessing).
* Note that the R package dependencies, specified in lines 32-36, must be installed before this vignette will run properly.
* In lines 81-82, you will have the option to download either the metadata for ITS sequences or the metadata for 16S sequences (or both). **Please respond to this Issue thread to let the other testers know which target gene(s) you will test.**
* In lines 89-101, different options of subsetting parameters are provided. You could attempt to download and process the entire dataset if you'd like, but I do not even have an estimate of the full download size because these metadata tables include both published and pre-published NEON data. **If you do subset the data, please respond to this Issue thread to let the other testers know which subset(s) you will test.**
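The "root.dir" value needs to be an absolute path. One way to obtain it is sketched below; `project_root` is a hypothetical helper, and the directory name in the usage comment assumes the default name produced by `git clone`.

```shell
# Hypothetical helper: print the absolute path of a directory.
# The subshell keeps the caller's working directory unchanged.
project_root() {
    (cd "$1" && pwd -P)
}

# Example, run from the directory that contains the clone:
# project_root NEON_soil_microbe_processing
```

The printed path can then be pasted into the "root.dir" parameter at the top of the vignette.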

Then move to either the process-its-sequence-to-seqtabs.Rmd or the process-16s-sequence-to-seqtabs.Rmd vignette, depending on which target gene(s) you chose to download metadata for.

* You will probably have to update the "root.dir" RMarkdown parameter at the top of the script. It should refer to the absolute filepath of the project root directory (e.g. .../neonSoilMicrobeProcessing).
* Note that the R package dependencies, specified in lines 30-34, must be installed before this vignette will run properly.
* Both vignettes contain a header which says "All code below is NOT run in this version of the vignette." Please run only the code above this header.
* Note that each sequencing run (the unit by which we are subsetting) takes anywhere between 1 and 4 hours to process, depending on the size of the run and the speed of your processor. I've found that 8 GB of RAM is usually sufficient for running this pipeline, but occasionally more RAM is needed.
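As a rough planning aid, the 1 to 4 hours per sequencing run quoted above scales with the number of runs in your subset. The sketch below only does that arithmetic; `estimate_hours` is a hypothetical helper, it assumes the runs are processed one after another, and the bounds are the ones stated above rather than new measurements.

```shell
# Hypothetical helper: print a lower and upper bound on processing time,
# using the 1 to 4 hours per sequencing run quoted in these instructions.
# $1 = number of sequencing runs in the chosen subset.
estimate_hours() {
    runs=$1
    echo "$((runs * 1)) to $((runs * 4)) hours"
}

estimate_hours 11   # prints "11 to 44 hours"
```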

# Reporting Back

If any issues or fatal errors arise, please let me know by replying to me individually (unless of course it seems obvious that it would affect all testers).

Whether you run into a fatal error or are able to complete the pipeline error-free, please report back on this thread and include in your post the output of `devtools::session_info()`.
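If you prefer to capture that output in a file for pasting into your post, something like the sketch below could work. `save_session_info` is a hypothetical helper; it assumes `Rscript` is on your `PATH` and that the devtools R package is installed.

```shell
# Hypothetical helper: write the output of devtools::session_info() to a file.
# $1 = output file (defaults to session_info.txt in the current directory).
save_session_info() {
    outfile=${1:-session_info.txt}
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 1
    fi
    Rscript -e 'devtools::session_info()' > "$outfile"
}

# Example: save_session_info session_info.txt
```

Note that this starts a fresh R session, so it will likely report your platform and R version but not the packages that were loaded while the pipeline ran. Calling `devtools::session_info()` in the same R session that ran the vignettes is more informative.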

# Current Volunteer Assignments

* Kabir has tested the ITS pipeline on a Mac for the following subset of data: `c("B69PP", "B69RF", "B69RN", "B9994", "BDR3T", "BF8M2", "BF8W6", "BFDG8", "BMCBD", "BMCC4", "BNBWL")`. 
* Dan is currently testing the 16S pipeline on a Mac for the following subset of data: `c("B69PP", "B69RF", "B69RN", "B9994", "BDNB6", "BF462", "BF8M2", "BFDG8", "BJ8RK", "BMC64", "BMCBJ")`.
* Kai is currently testing the 16S pipeline on a Windows VM and printing the results in this Issue thread: #26 

