A classifier to predict potential therapeutic targets against any microorganism.
DeeProMic is a therapeutic protein classifier trained on potential therapeutic targets/proteins (for human only) from UniProt Reference Clusters 90 (UniRef90) and UniRef50. These proteins were characterized rigorously with subtractive proteomics, a computational method that identifies potential drug and vaccine targets by comparing a pathogen's proteome against a host's proteome.
It has been tested on various bacterial and eukaryotic proteins. DeeProMic identifies therapeutic targets in two steps. First, it predicts the therapeutic proteins using an LSTM model. Second, it characterizes those proteins using the Basic Local Alignment Search Tool (BLAST) against the human proteome and against essential proteins from the Database of Essential Genes (DEG).
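The second step can be pictured as a simple set operation. The sketch below is illustrative only and is not DeeProMic's own code: the function name `triage`, the label strings, and the protein IDs are hypothetical, and it assumes you have already collected the IDs of candidates with a hit against the human proteome and against the essential proteins.

```python
def triage(candidates, human_hits, essential_hits):
    """Label each candidate ID by its homology profile (illustrative labels)."""
    labels = {}
    for protein_id in candidates:
        in_human = protein_id in human_hits
        in_essential = protein_id in essential_hits
        if in_essential and not in_human:
            # Homologous to an essential protein, no human homolog found.
            labels[protein_id] = "preferred"
        elif in_human:
            # A human homolog exists, so the target is usually avoided.
            labels[protein_id] = "human_homolog"
        else:
            # No human homolog, but no essential-protein hit either.
            labels[protein_id] = "no_essential_hit"
    return labels


# Hypothetical IDs, for illustration only.
example = triage(
    ["p1", "p2", "p3"],
    human_hits={"p2"},
    essential_hits={"p1", "p2"},
)
print(example)
```

Keeping the two hit sets separate makes it easy to change the rule later, for example to apply an identity or e-value cutoff before an ID counts as a hit.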
Requirements:
Installation example:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
- Operating system (OS): Built on Ubuntu 22.04 LTS and tested on Ubuntu 22.04 LTS/24.04 LTS. You can try it on other Linux or macOS based systems.
- You may need ProFeatX (if the provided profeatx does not work on your system).
Download the files
cd /path/to/your/desired/directory
git clone https://github.com/Arittra95/DeeProMic.git
After downloading the files, create a conda environment called "deepromic" from environment.yml. To do so, use these commands:
cd deepromic/
conda env create -f environment.yml -n deepromic
conda activate deepromic
unzip essential.zip
chmod +x profeatx
chmod +x deepromic.py
Go here: https://huggingface.co/spaces/Arittra/deepromic
cd /path/to/your/deepromic/directory
conda activate deepromic
streamlit run app.py
DeeProMic should open in your browser by default. If it does not, open your browser and go to:
http://localhost:8501
python deepromic.py -i <path_to_input_fasta> -o <path_to_output_directory>
cd /path/to/your/deepromic/directory
nano ~/.bashrc # or ~/.zshrc
export PATH="$PATH:/path/to/deeproMic/deepromic"
source ~/.bashrc # or ~/.zshrc
conda activate deepromic
python deepromic.py -i <path_to_input_fasta> -o <path_to_output_directory>
or for GUI:
streamlit run app.py
docker pull arittrabioinfo/deepromic-app
docker run -it --rm -p 8501:8501 arittrabioinfo/deepromic-app:latest
Then go to http://localhost:8501
Wellcome to DeeproMic! Please provide a protein fasta file as input.
usage: deepromic.py [-h] -i INPUT [-t THRESHOLD] [-o OUTPUT]
Description of your script.
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input protein FASTA file (sequences with short headers
are preferable)
-t THRESHOLD, --threshold THRESHOLD
Threshold for Probability score to filter druggable
proteins (Probability_Class_1)
-o OUTPUT, --output OUTPUT
Output directory for saving generated files
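To make the effect of `-t THRESHOLD` concrete, here is a minimal sketch of threshold filtering on the `Probability_Class_1` column. It is not the tool's own code: the `ID` column name and the toy rows are hypothetical, and it assumes the comparison is strict (greater than) with a default of 0.5.

```python
import csv
import io


def filter_by_threshold(rows, threshold=0.5):
    """Keep rows whose Probability_Class_1 is strictly above the threshold."""
    return [row for row in rows if float(row["Probability_Class_1"]) > threshold]


# Toy prediction table; the "ID" column name is an assumption for illustration.
sample = io.StringIO(
    "ID,Probability_Class_0,Probability_Class_1\n"
    "protA,0.10,0.90\n"
    "protB,0.70,0.30\n"
    "protC,0.50,0.50\n"
)
rows = list(csv.DictReader(sample))
kept = filter_by_threshold(rows)
print([row["ID"] for row in kept])  # ['protA']
```

Raising the threshold trades recall for precision: fewer proteins pass, but those that do carry a higher predicted probability of being therapeutic.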
Probability score of each protein being therapeutic (Probability_Class_1) or non_therapeutic (Probability_Class_0).
Proteins whose Probability_Class_1 score is greater than the -t THRESHOLD value. If you do not provide a threshold, the default value of 0.5 is used.
FASTA sequences of the proteins listed in filtered_sequences.csv.
BLAST outputs of potential_targets.fasta aligned against human.fasta. Suggestion: Avoid targets that are homologous to human proteins. Please consult Diamond for further analysis.
BLAST outputs of potential_targets.fasta aligned against essential.fasta. Suggestion: Select targets that are homologous to essential proteins. Please consult Diamond for further analysis.
Dipeptide Deviation from Expected Mean (DDE) features of the -i INPUT FASTA file.
Diamond database file of essential.fasta.
Diamond database file of human.fasta.
Input FASTA with modified/short sequence headers.
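For readers unfamiliar with the DDE features listed above, the following sketch computes them from a single sequence using the commonly cited definition: the observed dipeptide frequency is compared with a theoretical mean derived from codon counts (61 sense codons) and scaled by the theoretical standard deviation. This is an assumption about the definition, not a copy of the feature extractor shipped with DeeProMic, whose implementation may differ in details such as the handling of non-standard residues.

```python
import math
from itertools import product

# Number of sense codons per amino acid in the standard genetic code (sum = 61).
CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
TOTAL_CODONS = 61


def dde(sequence):
    """Return a dict of 400 DDE values, one per standard dipeptide."""
    seq = sequence.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")

    # Count overlapping dipeptides.
    counts = {}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1

    features = {}
    for a, b in product(CODONS, repeat=2):
        # Theoretical mean and variance of the dipeptide frequency.
        t_mean = (CODONS[a] / TOTAL_CODONS) * (CODONS[b] / TOTAL_CODONS)
        t_var = t_mean * (1 - t_mean) / n_pairs
        # Observed frequency; pairs with non-standard residues are ignored here.
        observed = counts.get(a + b, 0) / n_pairs
        features[a + b] = (observed - t_mean) / math.sqrt(t_var)
    return features


features = dde("MKTAYIAKQR")  # toy sequence, for illustration only
print(len(features))  # 400
```

Each sequence yields a fixed-length vector of 400 values regardless of its length, which is what makes this kind of encoding convenient as model input.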
We've published the datasets used to train and test DeeProMic in our v1.0 release.
- Test_dataset.csv: 1.64 MB
- Train_and_test_datast.csv: 583 MB
These datasets were used to train and evaluate the DeeProMic model, which focuses on therapeutic protein/target classification.
For more details, see the full release notes here.
