Endothelial Cell RNA-seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching
A user-friendly bioinformatics workflow that takes raw data produced by RNA sequencing through to interpretable results. The workflow described here was run on Ubuntu 20.04.2 LTS, a Linux distribution. A 64-bit machine with at least 32 GB of RAM is recommended for most steps in the workflow.
The published protocol is Chapter 29 of Angiogenesis: Methods and Protocols (2022).
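As a quick check against the hardware recommendation above, the following sketch prints the machine architecture and total memory. It assumes a Linux system that exposes /proc/meminfo (as Ubuntu does), and the variable names are illustrative. The kernel reports slightly less than the installed RAM, so a machine with 32 GB installed may show 31 here.

```shell
# Report the CPU architecture (x86_64 or aarch64 indicates a 64-bit system)
arch="$(uname -m)"

# Read total memory in kB from /proc/meminfo (Linux-specific assumption)
mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)"
mem_kb="${mem_kb:-0}"   # fall back to 0 if the file is unavailable
mem_gb=$((mem_kb / 1024 / 1024))

echo "Architecture: $arch"
echo "Total RAM: ${mem_gb} GiB"
```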
- Bioinformatics Workflow
- Software and R Packages
- Workspace Preparation
- Software Installation
- Raw Reads Download
- Reference Genome Download
- Begin!
The steps of the workflow are shown in the flowchart. The tools used are in yellow boxes, the data required/produced in white boxes and file formats in purple, blue, dark green, orange and grey boxes. Results obtained are in light green boxes.
Below is a list of the software and R packages used in the workflow with the corresponding URL.
The commands used in the workflow, found in software_downloads and pipeline_commands, use file paths that match the workspace described here. Wherever a path containing "user" is shown (e.g., /home/user/rnaseq_exp), "user" stands for your username and should be replaced with it.
Key directories to be made prior to software installation and raw data download:
- Change directory to 'user':
  cd /home/user
- Make a new directory called 'rnaseq_exp':
  mkdir rnaseq_exp
- Change directory to 'rnaseq_exp':
  cd rnaseq_exp
- Make new directories called 'output', 'raw_data', 'resources' and 'programs':
  mkdir output raw_data resources programs
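The same workspace can also be created in one step. This is a minimal sketch, not taken from the repository's command files: it assumes the home directory is available as $HOME (on Ubuntu this is normally /home/ followed by your username), and WORKDIR is an illustrative variable name.

```shell
# Assumption: $HOME points to /home/<your username>
WORKDIR="$HOME/rnaseq_exp"

# -p creates parent directories as needed and does not fail if a directory already exists
mkdir -p "$WORKDIR/output" "$WORKDIR/raw_data" "$WORKDIR/resources" "$WORKDIR/programs"

# List the workspace to confirm the four subdirectories are present
ls "$WORKDIR"
```

Because of -p, the commands can be re-run safely if the workspace is only partly set up.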
The required software and R packages can be installed by following the commands in the files within the software_downloads directory.
Refer to section 3.2 of the published protocol for more information.
A publicly available HUVEC dataset from a published study was used: Andrade J et al. (2021) Control of endothelial quiescence by FOXO-regulated metabolites. Nat Cell Biol 23(4):413–423.
The raw data in FASTQ format were obtained from the European Nucleotide Archive project PRJNA679567. Select the 'Download All' button above the 'FASTQ FTP' column and save the files in the raw_data directory created above.
Follow the commands in pipeline_commands/1_raw_data_decompression.txt to decompress the raw read files.
Refer to section 3.2.9 of the published protocol for more information.
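For orientation, decompressing gzip-compressed reads can look like the sketch below. It is not a copy of pipeline_commands/1_raw_data_decompression.txt. It assumes the downloaded files end in .fastq.gz, and decompress_fastq is a hypothetical helper name.

```shell
# Hypothetical helper: decompress every .fastq.gz file in the given directory,
# keeping the compressed originals alongside the output.
decompress_fastq() {
    dir="$1"
    for f in "$dir"/*.fastq.gz; do
        # If nothing matches, the pattern is left unexpanded; skip it
        [ -e "$f" ] || continue
        # Write the decompressed stream to the same name without the .gz suffix
        gzip -dc "$f" > "${f%.gz}"
    done
}

# Example call, assuming the workspace lives under $HOME
decompress_fastq "$HOME/rnaseq_exp/raw_data"
```

Keeping the .gz originals roughly doubles the disk space needed; they can be deleted once the decompressed files have been checked.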
The reference genome in FASTA format and the annotation of the reference genome in GTF or GFF format are required.
Both can be obtained from Ensembl FTP Download via an FTP client.
The figure below shows which files to download. Save them in the resources directory created above.
Follow the commands in pipeline_commands/2_ref_genome_anno_decompression.txt to decompress the genome files.
Refer to section 3.2.10 of the published protocol for more information.
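After decompression, a quick sanity check of the two files can catch a truncated or mislabelled download. The sketch below relies only on the formats themselves: FASTA sequence headers begin with ">" and GTF records have nine tab-separated fields. check_reference is a hypothetical helper, and the file names in the commented example call are placeholders, not the actual Ensembl file names.

```shell
# Hypothetical helper: basic format checks on a FASTA genome and a GTF annotation.
check_reference() {
    fasta="$1"
    gtf="$2"

    # Count sequence records: each FASTA header line starts with ">"
    n_seq="$(grep -c '^>' "$fasta" || true)"
    echo "FASTA sequences: $n_seq"

    # Count non-comment GTF lines that do not have exactly 9 tab-separated fields
    n_bad="$(awk -F'\t' '!/^#/ && NF != 9 {n++} END {print n + 0}' "$gtf")"
    echo "Malformed GTF lines: $n_bad"

    # Succeed only if at least one sequence was found and no line is malformed
    [ "$n_seq" -gt 0 ] && [ "$n_bad" -eq 0 ]
}

# Example call (placeholder file names):
# check_reference genome.fa annotation.gtf
```

This only checks the overall shape of the files. It does not confirm that the genome and annotation come from the same Ensembl release.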
Once the required software and R packages have been installed, the workspace created, and the raw reads and genome files downloaded and decompressed, the analysis can begin.
Follow the command files in pipeline_commands in conjunction with the published protocol to successfully complete the analysis.
Both the Notes section and the main text of the published protocol comment on errors that may arise during the workflow, which may help with troubleshooting.


