Colormap-Project

The goal of this project is to re-implement the methodology presented in the paper "CoLoRMap: Correcting Long Reads by Mapping short reads" by Ehsan Haghshenas, Faraz Hach, S. Cenk Sahinalp and Cedric Chauve.

Dependencies

Entrez Direct: used to download the reference sequence NC_000913
apt install ncbi-entrez-direct
BWA: see https://github.com/lh3/bwa for installation instructions. The code expects the bwa executable to be on your $PATH
zlib (provides zlib.h)
apt-get install zlib1g-dev
samtools
apt install samtools
Boost: C++ library used in the codebase to store graphs, find connected components, run Dijkstra's shortest-path algorithm, etc.
apt-get install libboost-graph-dev
BLASR: used to align long reads to the reference genome
apt install -y blasr
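Boost is listed above for graph work such as Dijkstra's shortest path. As a rough illustration of that kind of computation, the sketch below runs Dijkstra's algorithm using only the C++ standard library (not Boost) and is not taken from colormap.cpp. It assumes a hypothetical model in which vertices are short-read alignments on one long read and edge weights stand in for edit-distance costs; the graph layout and weights are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Weighted directed graph as an adjacency list: adj[u] holds (v, weight) pairs.
// Hypothetical model: vertices are short-read alignments on one long read and
// weights are assumed edit-distance costs (not the layout used in colormap.cpp).
using Graph = std::vector<std::vector<std::pair<std::size_t, int>>>;

// Dijkstra's algorithm with a binary heap. Returns the vertices on a
// minimum-weight path from source to target, or an empty vector if target
// is unreachable. Edge weights must be non-negative.
std::vector<std::size_t> shortest_path(const Graph& adj, std::size_t source, std::size_t target) {
    const int INF = std::numeric_limits<int>::max();
    const std::size_t NONE = adj.size();  // sentinel meaning "no predecessor"
    std::vector<int> dist(adj.size(), INF);
    std::vector<std::size_t> prev(adj.size(), NONE);

    // Min-heap of (tentative distance, vertex).
    using Item = std::pair<int, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.push({0, source});

    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale entry: u was already settled more cheaply
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {  // relax edge u -> v
                dist[v] = dist[u] + w;
                prev[v] = u;
                pq.push({dist[v], v});
            }
        }
    }

    if (dist[target] == INF) return {};
    // Walk predecessor links back from target, then reverse into source-to-target order.
    std::vector<std::size_t> path;
    for (std::size_t v = target; v != NONE; v = prev[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

In a correction setting, the vertices on the returned path would indicate which alignments to stitch together; how the project actually builds and weights its graph may differ.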

Usage

Ecoli Data Initialization

To download the Illumina short reads, PacBio long reads and reference genome for Escherichia coli str. K-12 substr. MG1655, run:
cd ecoli
bash init_ecoli.sh

`Snakemake`

Input:

The four main parameters of the Snakefile are:

folder: The name of the folder containing the input reads (e.g. test_data/)
<short_reads_1>.fastq: The name of one of the two FASTQ files of short reads in folder (e.g. ill_1.fastq)
<short_reads_2>.fastq: The name of the other FASTQ file of short reads in folder (e.g. ill_2.fastq)
<long_reads>.fasta: The name of the FASTA file of long reads in folder (e.g. pac.fasta)

The other parameters are:

test_name: The suffix of the file containing the corrected long reads. More specifically, the corrected long reads are stored in <folder>/lr_corr_<test_name>.fasta. This ID is intended to distinguish the output files produced as colormap.cpp is modified.
correct_singletons: When set to "no", a short read $s$ that has been mapped to a long read $l$ but is not adjacent to any other short read mapped to $l$ is not used to correct $l$. Otherwise, such short reads are used to correct $l$.
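The correct_singletons option depends on which short reads are isolated on a long read. A minimal sketch, assuming each mapped short read is represented by a hypothetical half-open interval [start, end) on $l$ and that "adjacent" means the intervals share at least one base: sort by start, sweep to group overlapping alignments into connected components, and flag components of size one as singletons. This is an illustration, not the rule as implemented in the Snakefile or colormap.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical representation of one short-read alignment on a long read,
// as a half-open interval [start, end) of long-read coordinates.
struct Alignment {
    long start;
    long end;
};

// Returns one flag per alignment (same order as the input) that is true when
// the alignment overlaps no other alignment, i.e. it forms a connected
// component of size one in the overlap graph.
std::vector<bool> find_singletons(const std::vector<Alignment>& alns) {
    // Visit alignments in order of start position without reordering the input.
    std::vector<std::size_t> order(alns.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return alns[a].start < alns[b].start;
    });

    std::vector<bool> singleton(alns.size(), false);
    std::size_t i = 0;
    while (i < order.size()) {
        // Grow one component: absorb alignments that start before the
        // furthest end reached so far (assumed overlap rule).
        long reach = alns[order[i]].end;
        std::size_t j = i + 1;
        while (j < order.size() && alns[order[j]].start < reach) {
            reach = std::max(reach, alns[order[j]].end);
            ++j;
        }
        if (j - i == 1) singleton[order[i]] = true;  // component of size one
        i = j;
    }
    return singleton;
}
```

With correct_singletons set to "no", alignments flagged true by a check like this would be skipped when correcting $l$.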

Output:

This pipeline produces the file <folder>/lr_corr_<test_name>.fasta, which contains the corrected long reads.
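The test_name parameter determines where the corrected reads are written (<folder>/lr_corr_<test_name>.fasta). The helper below rebuilds that path, for example to check that a run finished. Its name is hypothetical. It uses std::filesystem::path so that a folder given with a trailing slash, such as test_data/, does not produce a doubled separator.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: builds <folder>/lr_corr_<test_name>.fasta.
// path::operator/ adds a separator only when `folder` does not already end in one.
std::filesystem::path corrected_reads_path(const std::string& folder, const std::string& test_name) {
    return std::filesystem::path(folder) / ("lr_corr_" + test_name + ".fasta");
}
```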

`colormap.cpp`

This program can be used directly to correct long reads. It takes two command-line arguments:

<long_reads>.fasta: the relative path to the FASTA file of long reads
a "raw alignment file"
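The first argument is a FASTA file of long reads. A minimal, hypothetical reader is sketched below. It assumes each record starts with a '>' header line followed by one or more sequence lines, and it is not the parser used in colormap.cpp.

```cpp
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream, convenient for testing read_fasta
#include <stdexcept>
#include <string>
#include <vector>

// One FASTA record: the header line without '>' and the concatenated sequence.
struct FastaRecord {
    std::string name;
    std::string seq;
};

// Reads every record from a FASTA stream. Sequence lines are concatenated, so
// both single-line and wrapped (multi-line) FASTA are accepted.
std::vector<FastaRecord> read_fasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            records.push_back({line.substr(1), ""});  // start a new record
        } else if (!records.empty()) {
            records.back().seq += line;  // append to the current record
        }
        // Sequence lines before the first header are ignored.
    }
    return records;
}

// Convenience wrapper for a path such as the first command-line argument.
std::vector<FastaRecord> read_fasta_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    return read_fasta(in);
}
```

Inside a main function, read_fasta_file(argv[1]) would load the long reads given as the first argument.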

will-murray/Colormap-Project

Folders and files

Latest commit

History

Repository files navigation

Colormap-Project

Dependencies

Usage

Ecoli Data Initialization

Snakemake

Input:

Output:

colormap.cpp

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`Snakemake`

`colormap.cpp`

Packages