Analysis accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution" http://biorxiv.org/content/early/2016/07/05/062117
This repository provides an analysis pipeline that reproduces the main results in the paper step-by-step.
The following software needs to be installed:
build-essential
libhdf5-dev
libboost-all-dev
cmake-3.2
g++-4.9
gcc-4.9
python
python-pip
Most of these can be installed with apt-get. On Ubuntu, CMake 3.2 can be installed from ppa:george-edison55/cmake-3.x, and gcc/g++-4.9 from ppa:ubuntu-toolchain-r/test.
git clone https://github.com/govinda-kamath/HINGE-analyses.git
cd HINGE-analyses
git submodule foreach --recursive git submodule update --init
git submodule update --init --recursive
./build.sh
source setup.sh
# Optionally, create a Python virtual environment before installing the requirements
pip install -r requirements.txt
The Python packages installed by the last command are:
- numpy
- ujson
- cython
- networkx
- matplotlib
- biopython
- bcbio-gff
- bcbio-nextgen
- colormap
- easydev
- forceatlas2
- jupyter
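After running pip, it can be useful to confirm which of the packages listed above are actually present in the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library.

```python
from importlib import metadata

# Distribution names copied from the list above.
REQUIRED = [
    "numpy", "ujson", "cython", "networkx", "matplotlib", "biopython",
    "bcbio-gff", "bcbio-nextgen", "colormap", "easydev", "forceatlas2",
    "jupyter",
]


def missing_packages(names):
    """Return the names for which no installed distribution is found.

    Hypothetical helper: it only checks installation metadata and does not
    try to import the packages.
    """
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_packages(REQUIRED):
        print("not installed:", name)
```

The check looks up distribution names rather than import names, since several of these packages are imported under a different name than the one used to install them.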
One may need to install matplotlib's system dependencies through the python-matplotlib package. On Ubuntu, sudo apt-get build-dep python-matplotlib installs the dependencies needed to build it.
Alternatively, any of these packages can be installed individually with sudo pip install <package>. When installing forceatlas2, make sure the code is Cython-compiled, which gives a roughly 10x improvement in speed. One explicit way to ensure this is to download the source directly from PyPI and build it with its setup.py.
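One rough way to tell whether a module was picked up as a compiled extension rather than as plain Python is to look at the file the import system resolves it to. The sketch below is an illustration under assumptions: the helper `is_compiled_extension` is hypothetical, and the import name `forceatlas2` is assumed to match the package name, which may not hold for every release.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Report whether a top-level module resolves to a compiled extension.

    Returns None if the module cannot be found or has no file origin,
    True if its origin ends with a platform extension suffix (e.g. .so),
    and False otherwise (e.g. a .py file).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin.endswith(suffixes)


if __name__ == "__main__":
    # "forceatlas2" as an import name is an assumption, not taken from the repo.
    print("forceatlas2 compiled:", is_compiled_extension("forceatlas2"))
```

This is only a proxy: a package whose top-level module is pure Python may still ship compiled submodules, so a False result does not by itself prove the slow path is in use.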
We also need both ascp and Aspera Connect to speed up the downloads.
The results of Figure 2 in the paper can be reproduced using this notebook.
Here is a tutorial on one way to set up an IPython/Jupyter notebook on a remote server.
