DeeProMic (DEEp learning based therapeutic PROtein classifier against MICroorganisms)

A classifier to predict potential therapeutic targets against any microorganism.

Introduction

DeeProMic is a therapeutic protein classifier trained on potential therapeutic targets/proteins (for human only) from UniProt Reference Clusters 90 (UniRef90) and UniRef50. These proteins were rigorously characterized with subtractive proteomics, a computational approach that identifies potential drug and vaccine targets by comparing a pathogen's proteome against a host's proteome.

It has been tested on various bacterial and eukaryotic proteins. DeeProMic identifies therapeutic targets in two steps. First, it predicts therapeutic proteins using an LSTM model. Second, it characterizes those proteins by running the Basic Local Alignment Search Tool (BLAST) against the human proteome and against essential proteins from the Database of Essential Genes (DEG).

How to install?

Requirements:

  1. Miniconda

Installation example:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
  2. Operating system (OS): Built on Ubuntu 22.04 LTS and tested on Ubuntu 22.04 LTS/24.04 LTS. You can also try it on other Linux distributions or macOS.

  3. You may need ProFeatX (if the provided profeatx does not work on your system)

Download the files

cd /path/to/your/desired/directory
git clone https://github.com/Arittra95/DeeProMic.git

After downloading the files, create a conda environment called "deepromic" from environment.yml. To do so, use these commands:

cd DeeProMic/
conda env create -f environment.yml -n deepromic
conda activate deepromic
unzip essential.zip
chmod +x profeatx
chmod +x deepromic.py

How to use?

Method 0- Use the Online version:

Go here: https://huggingface.co/spaces/Arittra/deepromic

Note: You may need to refresh/restart several times before the program runs.

Method 1- Easy way:

cd /path/to/your/deepromic/directory
conda activate deepromic

For the graphical user interface (GUI), run this:

streamlit run app.py

DeeProMic should open in your browser automatically. If it does not, open your browser and go to:

http://localhost:8501

For the command-line interface (CLI), run this:

python deepromic.py -i <path_to_input_fasta> -o <path_to_output_directory>
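
To process several FASTA files in one go, the CLI call above can be wrapped in a small Python script. This is a minimal sketch, assuming it is run from the DeeProMic directory with the deepromic environment active. The helper names `build_command` and `run_batch` and the one-folder-per-input layout are illustrative and not part of DeeProMic; only the `-i`, `-o`, and `-t` flags listed under Options are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(fasta, out_dir, threshold=None):
    """Build the argument list for one deepromic.py run (hypothetical helper)."""
    cmd = [sys.executable, "deepromic.py", "-i", str(fasta), "-o", str(out_dir)]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    return cmd


def run_batch(fasta_dir, results_dir, threshold=None):
    """Run deepromic.py once per *.fasta file, each with its own output directory."""
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        out_dir = Path(results_dir) / fasta.stem
        # Assumption: the tool accepts an output directory that already exists.
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(fasta, out_dir, threshold), check=True)
```

For example, `run_batch("proteomes", "results", threshold=0.7)` would write the results for `proteomes/ecoli.fasta` to `results/ecoli/`. With `check=True`, the batch stops at the first failed run instead of continuing silently.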

Method 2- Professional way:

cd /path/to/your/deepromic/directory
nano ~/.bashrc   # or ~/.zshrc
export PATH="$PATH:/path/to/deeproMic/deepromic"
source ~/.bashrc   # or ~/.zshrc
conda activate deepromic
python deepromic.py -i <path_to_input_fasta> -o <path_to_output_directory>

or for GUI:

streamlit run app.py

Method 3- If you have Docker:

docker pull arittrabioinfo/deepromic-app
docker run -it --rm -p 8501:8501 arittrabioinfo/deepromic-app:latest

Then go to http://localhost:8501

Options:

Wellcome to DeeproMic! Please provide a protein fasta file as input.
usage: deepromic.py [-h] -i INPUT [-t THRESHOLD] [-o OUTPUT]

Description of your script.

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input protein FASTA file (sequences with short headers
                        are preferable)
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold for Probability score to filter druggable
                        proteins (Probability_Class_1)
  -o OUTPUT, --output OUTPUT
                        Output directory for saving generated files
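
In the usage line, square brackets mark optional arguments, so only `-i` is required. The sketch below shows an `argparse` setup that would produce a similar interface. It is a reconstruction for illustration and not the author's code; the `float` type for `-t` is an assumption, and the default of 0.5 is taken from the output file description in this README.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented deepromic.py options."""
    parser = argparse.ArgumentParser(prog="deepromic.py")
    parser.add_argument("-i", "--input", required=True,
                        help="Input protein FASTA file")
    # Assumption: the threshold is parsed as a float; 0.5 is the documented default.
    parser.add_argument("-t", "--threshold", type=float, default=0.5,
                        help="Probability_Class_1 cutoff for filtering proteins")
    parser.add_argument("-o", "--output",
                        help="Output directory for saving generated files")
    return parser
```

Calling `build_parser().parse_args(["-i", "proteins.fasta"])` leaves `threshold` at 0.5, while omitting `-i` makes `argparse` exit with a usage error.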

Explanation of the Output files:

probability_score.csv:

Probability score of each protein being therapeutic (Probability_Class_1) or non-therapeutic (Probability_Class_0).

filtered_sequences.csv:

Proteins whose Probability_Class_1 score is more than the -t THRESHOLD value. If you do not provide a threshold, the default value of 0.5 is used.
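
To try a different cutoff without rerunning the model, the same filter can be applied to probability_score.csv directly. This is a minimal sketch using only the standard library. The only column name it relies on is `Probability_Class_1`; all other columns are passed through unchanged. The strict `>` comparison follows the "more than" wording above and may differ from what DeeProMic does at exactly the threshold value.

```python
import csv


def filter_by_threshold(rows, threshold=0.5):
    """Keep rows whose Probability_Class_1 is strictly above the threshold."""
    return [row for row in rows if float(row["Probability_Class_1"]) > threshold]


def filter_csv(in_path, out_path, threshold=0.5):
    """Read a probability table, filter it, and write the surviving rows."""
    with open(in_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        kept = filter_by_threshold(reader, threshold)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

For example, `filter_csv("out/probability_score.csv", "out/strict.csv", threshold=0.9)` keeps only high-confidence predictions; the paths here are placeholders.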

potential_targets.fasta:

FASTA file containing the protein sequences listed in filtered_sequences.csv.

blast_against_essential_genes.tsv:

BLAST outputs of potential_targets.fasta that were aligned against essential.fasta. Suggestion: Select targets that are homologues of essential proteins. Please refer to the Diamond documentation for further analysis.

blast_against_host.tsv:

BLAST outputs of potential_targets.fasta that were aligned against human.fasta. Suggestion: Avoid targets that are homologues of human proteins. Please refer to the Diamond documentation for further analysis.
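
The two suggestions can be combined: keep the proteins that have a hit among the essential proteins and no hit among the human proteins. This is a minimal sketch, assuming both tables are tab-separated with the query ID in the first column (the layout of BLAST tabular output format 6). The function names are illustrative, and any reported hit counts as a homologue here, with no filtering on identity or e-value.

```python
import csv


def query_ids(tsv_path):
    """Collect the query IDs (first column) from a tabular alignment file."""
    with open(tsv_path, newline="") as handle:
        return {
            row[0]
            for row in csv.reader(handle, delimiter="\t")
            if row and not row[0].startswith("#")
        }


def prioritize(essential_tsv, host_tsv):
    """Return queries with a hit among essential proteins and none in the host."""
    return sorted(query_ids(essential_tsv) - query_ids(host_tsv))
```

Pass the table of alignments against essential.fasta as `essential_tsv` and the table of alignments against human.fasta as `host_tsv`. In practice you would likely add cutoffs on identity, coverage, or e-value before treating a hit as meaningful.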

dde.csv:

Dipeptide Deviation from Expected Mean features (DDE) of the -i INPUT fasta file.
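
DDE compares how often each dipeptide occurs in a sequence with how often it would be expected from the number of codons encoding its two residues. The sketch below follows the commonly used definition: for a sequence of length N, Dc = count / (N - 1), Tm = (C1 / 61) * (C2 / 61), Tv = Tm * (1 - Tm) / (N - 1), and DDE = (Dc - Tm) / sqrt(Tv). It is an illustration only; ProFeatX may differ in details such as column order or the handling of non-standard residues, which are assumptions here.

```python
import math
from itertools import product

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
# Assumption: dipeptides are ordered alphabetically by one-letter code.
DIPEPTIDES = ["".join(pair) for pair in product(sorted(CODONS), repeat=2)]


def dde(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value."""
    seq = sequence.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    features = {}
    for pair in DIPEPTIDES:
        dc = counts[pair] / n_pairs                            # observed frequency
        tm = (CODONS[pair[0]] / 61) * (CODONS[pair[1]] / 61)   # theoretical mean
        tv = tm * (1 - tm) / n_pairs                           # theoretical variance
        features[pair] = (dc - tm) / math.sqrt(tv)
    return features
```

Each protein yields 400 values. Dipeptides that are absent from the sequence get a negative score, because their observed frequency of zero lies below the expected mean.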

Explanation of other Output files outside the output directory:

Essential_gene_DB.dmnd:

Diamond database file built from essential.fasta.

host_protein_DB.dmnd:

Diamond database file built from human.fasta.

${input}_modified.fasta:

Input FASTA with modified/short sequence headers.
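
If you prefer to shorten headers yourself before running DeeProMic, one common convention is to keep only the text up to the first whitespace. The sketch below illustrates that convention. It is an assumption for illustration and not necessarily the rule DeeProMic applies when it writes the modified file; the function name is hypothetical.

```python
def shorten_headers(lines):
    """Yield FASTA lines with each header cut at the first whitespace."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # ">sp|P12345|NAME_ECOLI Some description" -> ">sp|P12345|NAME_ECOLI"
            parts = line[1:].split()
            yield ">" + (parts[0] if parts else "")
        else:
            yield line
```

To rewrite a file, open the input and output, then write `"\n".join(shorten_headers(source)) + "\n"` to the output. Check afterwards that the shortened IDs are still unique.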

📦 DeeProMic Dataset Release

We've published the datasets used to train and test DeeProMic in our v1.0 release.

🔗 Download Links

🧠 About the Data

These datasets were used to train and evaluate the DeeProMic model, which focuses on therapeutic protein/target classification.

For more details, see the full release notes here.