
GitHub page to complement Chapter 29, 'Endothelial Cell RNA-Seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching', of Angiogenesis Methods and Protocols 2022. A user-friendly bioinformatics workflow to study RNA-seq data.


vasc-bioinf/rnaseq_exp


Endothelial Cell RNA-seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching

A user-friendly bioinformatics workflow that takes raw RNA sequencing data through to interpretable results. The workflow described here was run on Ubuntu 20.04.2 LTS, a Linux distribution. A 64-bit machine with at least 32 GB of RAM is recommended for most steps in the workflow.
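The hardware recommendation above can be checked from the terminal before starting. This is a minimal sketch, assuming a Linux system that exposes /proc/meminfo (as Ubuntu does); it prints the word size and total RAM. The 30 GiB warning threshold is an assumption, not a value from the protocol: a machine fitted with 32 GB usually reports slightly less than 32 GiB once the kernel has reserved memory.

```shell
#!/usr/bin/env bash
# Sketch: report word size and installed RAM before starting the workflow.
# Assumption: Linux with /proc/meminfo (true on Ubuntu 20.04).

# Print total memory in whole GiB (rounded down) from a meminfo-style file.
mem_total_gib() {
  # MemTotal is reported in kB; 1 GiB = 1048576 kB.
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

bits="$(getconf LONG_BIT)"
ram_gib="$(mem_total_gib)"
echo "Word size: ${bits}-bit"
echo "Total RAM: ${ram_gib} GiB"

if [ "$bits" != 64 ]; then
  echo "Warning: a 64-bit machine is recommended." >&2
fi
# Assumption: warn below 30 GiB, because a machine with 32 GB installed
# usually reports a little less than 32 GiB after kernel reservations.
if [ "${ram_gib:-0}" -lt 30 ]; then
  echo "Warning: at least 32 GB of RAM is recommended." >&2
fi
```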


The published protocol can be found under Chapter 29 of Angiogenesis Methods and Protocols 2022.


Table of Contents

  1. Bioinformatics Workflow
  2. Software and R Packages
  3. Workspace Preparation
  4. Software Installation
  5. Raw Reads Download
  6. Reference Genome Download
  7. Begin!

Bioinformatics Workflow

The steps of the workflow are shown in the flowchart. Tools are shown in yellow boxes, the data required or produced in white boxes, and file formats in purple, blue, dark green, orange and grey boxes. Results are shown in light green boxes.
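To make the order of the command-line stages concrete, here is a dry-run sketch for a single sample. It assumes single-end reads, hypothetical file names (raw_data/sample.fastq, resources/genome.fa, resources/annotation.gtf) and illustrative options, with ADAPTER standing in for the real adapter sequence. The actual commands and parameters are in the repository's pipeline_commands files and the published protocol. By default the sketch only prints each command; setting DRY_RUN=0 would execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the stage order for ONE single-end sample.
# Assumptions: run from ~/rnaseq_exp; hypothetical file names; ADAPTER is a
# placeholder. The real commands are in pipeline_commands/.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

pipeline() {
  local reads=raw_data/sample.fastq
  local gtf=resources/annotation.gtf
  local bam=output/sample_Aligned.sortedByCoord.out.bam  # STAR's output name

  # 1. Quality control of the raw reads
  run fastqc -o output "$reads"
  # 2. Adapter and quality trimming
  run cutadapt -a ADAPTER -q 20 -o output/sample_trimmed.fastq "$reads"
  # 3. Build the STAR genome index, then align the trimmed reads
  run mkdir -p resources/star_index
  run STAR --runMode genomeGenerate --genomeDir resources/star_index \
    --genomeFastaFiles resources/genome.fa --sjdbGTFfile "$gtf"
  run STAR --genomeDir resources/star_index \
    --readFilesIn output/sample_trimmed.fastq \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix output/sample_
  # 4. Quality control of the alignment
  run qualimap rnaseq -bam "$bam" -gtf "$gtf" -outdir output/qualimap
  # 5. Count reads per gene
  run featureCounts -a "$gtf" -o output/counts.txt "$bam"
  # 6. The count table then goes into R (DESeq2, clusterProfiler, ...)
}

pipeline
```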



Software and R Packages

Below is a list of the software, R packages and databases used in the workflow, with their URLs.


Software                  URL
Ubuntu                    https://ubuntu.com/
FastQC                    https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Cutadapt                  https://github.com/marcelm/cutadapt
STAR                      https://github.com/alexdobin/STAR
Qualimap                  http://qualimap.conesalab.org/
Subread (featureCounts)   http://subread.sourceforge.net/
R                         https://www.r-project.org/
RStudio                   https://www.rstudio.com/
DESeq2                    https://bioconductor.org/packages/release/bioc/html/DESeq2.html
clusterProfiler           https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
pathview                  http://www.bioconductor.org/packages/release/bioc/html/pathview.html
ReactomePA                https://bioconductor.org/packages/release/bioc/html/ReactomePA.html
enrichplot                https://bioconductor.org/packages/release/bioc/html/enrichplot.html
biomaRt                   https://bioconductor.org/packages/release/bioc/html/biomaRt.html
ggplot2                   https://ggplot2.tidyverse.org/
GO                        http://geneontology.org/
KEGG                      https://www.genome.jp/kegg/
Reactome                  https://reactome.org/
GSEA                      https://www.gsea-msigdb.org/gsea/index.jsp

Workspace Preparation

The commands used in the workflow, found in the software_downloads and pipeline_commands directories, use file paths that assume the directory layout described below. Throughout the workflow, wherever a path contains "user" (e.g., /home/user/rnaseq_exp), replace "user" with your own username.
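Replacing "user" by hand in every command is error-prone. A minimal sketch, assuming the command files refer to the workspace as /home/user, is a sed filter that substitutes your own home directory. The function name localize_paths is hypothetical, not part of the repository. It deliberately leaves longer names such as /home/username untouched.

```shell
#!/usr/bin/env bash
# Sketch: replace the placeholder home directory /home/user with a real one.
# localize_paths is a hypothetical helper, not part of the repository.
# Usage: localize_paths [new_home] < input > output   (default: $HOME)
localize_paths() {
  local new_home="${1:-$HOME}"
  # First expression: /home/user followed by a non-name character (/, space...).
  # Second expression: /home/user at the very end of a line.
  # Assumption: new_home contains no '#' or '&' characters.
  sed -e "s#/home/user\([^A-Za-z0-9_.-]\)#${new_home}\1#g" \
      -e "s#/home/user\$#${new_home}#"
}
```

For example, `localize_paths < pipeline_commands/1_raw_data_decompression.txt` prints that file with your home directory in place of /home/user, ready to copy into the terminal.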

Create the following key directories before installing the software and downloading the raw data:

  1. Change directory to 'user'

    cd /home/user
  2. Make a new directory called 'rnaseq_exp'

    mkdir rnaseq_exp
  3. Change directory to 'rnaseq_exp'

    cd rnaseq_exp
  4. Make new directories called 'output', 'raw_data', 'resources' and 'programs'

    mkdir output raw_data resources programs
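The four steps above can also be run as one short script. This sketch assumes the project lives in ~/rnaseq_exp (that is, /home/user/rnaseq_exp); the PROJECT_DIR override is an addition for convenience, not part of the protocol. Because it uses mkdir -p, re-running it is harmless.

```shell
#!/usr/bin/env bash
# Sketch: create the workspace from the steps above in one go.
# Assumption: PROJECT_DIR defaults to ~/rnaseq_exp (/home/user/rnaseq_exp).
set -eu

PROJECT_DIR="${PROJECT_DIR:-$HOME/rnaseq_exp}"

# mkdir -p creates missing parents and does not fail if a directory exists,
# so the script can be re-run safely.
for dir in output raw_data resources programs; do
  mkdir -p "$PROJECT_DIR/$dir"
done

cd "$PROJECT_DIR"
ls -d output raw_data resources programs
```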

Software Installation

The required software and R packages can be installed by following the commands in the files within the software_downloads directory.

Refer to section 3.2 of the published protocol for more information.
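Once installation is finished, a quick way to confirm that the command-line tools are reachable is to look each one up on the PATH. The executable names below (fastqc, cutadapt, STAR, qualimap, featureCounts and Rscript) are assumed from how these tools are usually installed; R packages such as DESeq2 still need to be checked from within R.

```shell
#!/usr/bin/env bash
# Sketch: confirm the command-line tools from the software table are on PATH.
# Assumption: usual executable names (STAR is case-sensitive; featureCounts
# ships with Subread; Rscript ships with R).

# Print each missing tool and return the number of missing tools.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

if check_tools fastqc cutadapt STAR qualimap featureCounts Rscript; then
  echo "All command-line tools found."
else
  echo "Install the missing tools before continuing." >&2
fi
```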


Raw Reads Download

A publicly available HUVEC dataset from a published study was used: Andrade J et al. (2021) Control of endothelial quiescence by FOXO-regulated metabolites. Nat Cell Biol 23(4):413–423.

The raw FASTQ files were obtained from the European Nucleotide Archive (ENA), project PRJNA679567. On the project page, select the 'Download All' button above the 'FASTQ FTP' column and save the files in the raw_data directory created above.
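As a command-line alternative to the 'Download All' button, the list of FASTQ files can be requested from the ENA Portal API. This sketch assumes the API's filereport endpoint with result=read_run and fields=fastq_ftp, whose fastq_ftp column holds semicolon-separated paths without a URL scheme. The helper names fastq_urls_from_tsv and download_project are hypothetical, and files are downloaded with wget into ~/rnaseq_exp/raw_data.

```shell
#!/usr/bin/env bash
# Sketch: list and download the FASTQ files of an ENA project.
# Assumptions: ENA Portal API filereport endpoint and its fastq_ftp column;
# fastq_urls_from_tsv and download_project are hypothetical helper names.

# Read a filereport TSV on stdin; print one ftp:// URL per FASTQ file.
fastq_urls_from_tsv() {
  awk -F '\t' '
    { sub(/\r$/, "") }   # tolerate CRLF line endings
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, paths, ";")
      for (j = 1; j <= n; j++) print "ftp://" paths[j]
    }'
}

# Fetch the report for a project accession and download every file listed.
download_project() {
  local accession="${1:-PRJNA679567}"
  local dest="${2:-$HOME/rnaseq_exp/raw_data}"
  local api="https://www.ebi.ac.uk/ena/portal/api/filereport"
  mkdir -p "$dest"
  curl -fsSL "${api}?accession=${accession}&result=read_run&fields=fastq_ftp&format=tsv" \
    | fastq_urls_from_tsv \
    | while read -r url; do
        wget -c -P "$dest" "$url"   # -c resumes interrupted downloads
      done
}
```

Calling `download_project PRJNA679567` would fetch every run in the project; the parsing step is kept separate so it can be inspected before anything is downloaded.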

Follow the commands in pipeline_commands/1_raw_data_decompression.txt to decompress the raw read files.
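For orientation, decompressing the downloaded reads amounts to running gunzip on each file. The sketch below assumes the files end in .fastq.gz and sit in ~/rnaseq_exp/raw_data; the repository's 1_raw_data_decompression.txt remains the reference version, and decompress_fastq_dir is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: decompress every gzipped FASTQ file in a directory.
# Assumption: files end in .fastq.gz; the repository's
# 1_raw_data_decompression.txt may differ.
# decompress_fastq_dir is a hypothetical helper name.
decompress_fastq_dir() {
  local dir="$1" gz
  for gz in "$dir"/*.fastq.gz; do
    [ -e "$gz" ] || continue   # the glob matched nothing
    # -k keeps the original .gz; drop it to delete the archives and save space.
    gunzip -k "$gz"
  done
}

decompress_fastq_dir "${RAW_DIR:-$HOME/rnaseq_exp/raw_data}"
```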

Refer to section 3.2.9 of the published protocol for more information.


Reference Genome Download


The reference genome in FASTA format and the annotation of the reference genome in GTF or GFF format are required.

Both can be obtained from Ensembl FTP Download via an FTP client.

The figure below shows which files to download. Save them in the resources directory created above.

Follow the commands in pipeline_commands/2_ref_genome_anno_decompression.txt to decompress the genome files.
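The download shown in the figure can also be scripted against the Ensembl FTP site over HTTPS. This sketch assumes human data (HUVECs), the GRCh38 primary-assembly FASTA and Ensembl's usual file naming. The release number is deliberately left as an argument, so use the release given in the figure and the protocol; ensembl_urls and fetch_reference are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: fetch the human genome FASTA and GTF from the Ensembl FTP site.
# Assumptions: human (HUVEC) data, GRCh38 primary assembly, standard Ensembl
# directory layout. Pass the release number used in the protocol.

# Print the FASTA URL and the GTF URL for a given Ensembl release number.
ensembl_urls() {
  local release="$1"
  local base="https://ftp.ensembl.org/pub/release-${release}"
  echo "${base}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
  echo "${base}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${release}.gtf.gz"
}

# Download both files into the resources directory and decompress them.
fetch_reference() {
  local release="$1"
  local dest="${2:-$HOME/rnaseq_exp/resources}"
  local url
  mkdir -p "$dest"
  for url in $(ensembl_urls "$release"); do
    wget -c -P "$dest" "$url"
    gunzip -f "$dest/$(basename "$url")"
  done
}
```

With RELEASE set to the protocol's release, `fetch_reference "$RELEASE"` leaves the decompressed .fa and .gtf files in the resources directory.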

Refer to section 3.2.10 of the published protocol for more information.



Begin!

Once the required software and R packages have been installed, the workspace created, and the raw reads and genome files downloaded and decompressed, the analysis can begin.
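The preconditions above can be checked in one go before running the pipeline commands. This sketch assumes the workspace layout from Workspace Preparation and decompressed files ending in .fastq, .fa and .gtf (a GFF annotation would need its pattern changed); has_file and ready_to_begin are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check that the workspace, reads and reference files are in place.
# Assumptions: layout from Workspace Preparation; .fastq, .fa and .gtf files.

# Succeed if at least one file in directory $1 matches glob pattern $2.
has_file() {
  local f
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Print every unmet precondition; return 0 only if all are met.
ready_to_begin() {
  local project="${1:-$HOME/rnaseq_exp}" status=0 d
  for d in output raw_data resources programs; do
    [ -d "$project/$d" ] || { echo "missing directory: $d"; status=1; }
  done
  has_file "$project/raw_data" '*.fastq' || { echo "no decompressed FASTQ files in raw_data"; status=1; }
  has_file "$project/resources" '*.fa' || { echo "no FASTA file in resources"; status=1; }
  has_file "$project/resources" '*.gtf' || { echo "no GTF file in resources"; status=1; }
  return "$status"
}

ready_to_begin "${PROJECT_DIR:-$HOME/rnaseq_exp}" && echo "Ready to begin."
```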

Follow the command files in pipeline_commands in conjunction with the published protocol to successfully complete the analysis.

The Notes section of the published protocol, as well as comments in the main text, describe errors that may arise during the workflow and may help with troubleshooting.
