AI-RET: Reinforcement Learning-Based RET-Specific Molecular Design

This repository hosts the code, data, and documentation for the project:

Generative Model-Guided Discovery of Amino Quinoxaline Scaffold as a Novel RET Alteration Inhibitor with Exploration of the Cryptic Pocket

Authors

Surendra Kumar, Vinay Pogaku, Mi-Hyun Kim*

Affiliation:
Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoe-ro, Yeonsu-gu, Incheon, Republic of Korea

Corresponding author: kmh0515@gachon.ac.kr

Overview

This project implements a comprehensive workflow for RET-specific molecular design using:

Machine Learning Models: Classification and regression models for predicting kinase selectivity, phenotypic responses, and RET mutant activity
Reinforcement Learning: REINVENT v3.2-based molecular generation with single and multi-objective optimization
Cheminformatics Pipeline: KNIME workflows for data curation and post-processing

The workflow integrates predictive models with generative AI to design novel RET inhibitors with improved selectivity and potency profiles.

Requirements and Installation

This project requires several external tools and software modules. Please ensure the following are installed on your system:

Required Software

REINVENT v3.2
- Reinforcement learning framework for molecular generation
- Installation: Follow instructions at REINVENT GitHub
KNIME Analytics Platform
- For data curation and post-processing workflows
- Download from KNIME website
Schrödinger Software Suite (2020-4 or later)
- Requires a valid license
- Used for molecular modeling and docking studies
OpenEye Scientific Software
- Requires a valid license
- Used for molecular descriptor generation

Python Dependencies

The machine learning models require:

Python 3.7+
RDKit
scikit-learn
pandas, numpy
scipy
UMAP (optional, for clustering-based splits)

See individual model directories for specific requirements.
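
Before running any model script, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch and not part of the repository; it assumes the usual import names (`sklearn` for scikit-learn, `umap` for the umap-learn package) and only checks that each module can be located, not its version.

```python
from importlib.util import find_spec

# Import names for the packages listed above (assumed, not taken from the repo).
# "umap" is provided by the umap-learn distribution and is optional.
REQUIRED = ["rdkit", "sklearn", "pandas", "numpy", "scipy"]
OPTIONAL = ["umap"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```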

Repository Structure

AI_RET2
├── README.md                                                         # This file
├── CLASSIFICATION_README.md                                          # Classification models documentation
├── REGRESSION_README.md                                              # Regression models documentation
│
├── MULTI_KINASE_CLASSIFICATION_CODES_DATA/                           # Multi-kinase selectivity classification
│   ├── multitask_classifier_with_default_parameters_SINGLE_MODEL.py  # code for training/validation/testing
│   ├── kinase_ml_output/                                             # Multi-Kinase data for training/validation/testing
│   └── saved_model_19Nov.zip/                                        # Trained models
│
├── PHENOTYPES_CLASSIFICATION_CODES_DATA/                             # Phenotypic classification
│   ├── multitask_classifier_zscore_INDIVIDUAL_FILES.py               # code for training/validation/testing
│   ├── GDSC1_GDSC2_Combined_Dataset_With_SMILES.csv                  # Phenotype data for training/validation/testing
│   └── results_zscore_individual.zip/                                # Model outputs
│
├── RET_MUT_REGRESSION_CODES_DATA/                                    # RET G810R mutant regression
│   ├── train_model_randomized_MUT.py                                 # code for training/validation/testing
│   ├── RET_Mutant_Selected_Dataset_With_Additional_Information.csv   # RET_Mut data for training/validation/testing
│   └── RET_MUTANT_final_model_Hyper.pkl                              # Best trained model
│
├── SINGLE_OBJECTIVE_REINVENT_JSON/                                   # Single-objective RL configs
│   ├── JOB01/ through JOB05/
│   └── INCEPTIONS_USED_IN_SINGLE_OBJECTIVE.xlsx
│
├── MULTI_OBJECTIVE_REINVENT_JSON/                                   # Multi-objective RL configs
│   ├── JOB01/ through JOB05/
│   └── INCEPTIONS_USED_IN_SINGLE_OBJECTIVE.xlsx
│
└── KNIME_WORKFLOW/                                                   # KNIME workflows
    └── README.md

Dataset

To support development of the RET mutant (RET-MUT) regression model, we provide the following dataset:

File: RET_MUT_REGRESSION_CODES_DATA/RET_Mutant_Selected_Dataset_With_Additional_Information.csv
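
The column layout of this file is not described here, so a quick way to get oriented is to read the header and count the rows. The helper below is a hedged sketch using only the standard library; `summarize_csv` is a hypothetical name, and the sketch assumes the file is UTF-8 encoded with a single header row.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

For example, `summarize_csv("RET_MUT_REGRESSION_CODES_DATA/RET_Mutant_Selected_Dataset_With_Additional_Information.csv")` would return the column names and the number of data rows when called from the repository root.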

Machine Learning Models

This project includes three types of machine learning models:

1. Multi-Kinase Selectivity Classification

Predicts selectivity across multiple kinase targets using multi-task learning.

Algorithms: RandomForest, SVM, KNeighbors, ExtraTrees, GradientBoosting, LogisticRegression, GaussianNB
Features: ECFP_Counts (Extended Connectivity Fingerprints with count vectors), ECFP, MACCS, Avalon fingerprints
Split Types: Random, cross_validation, scaffold, butina, umap_clustering split

📖 See CLASSIFICATION_README.md for detailed documentation
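
The scaffold, butina, and umap_clustering splits share one idea: molecules that belong to the same group (scaffold or cluster) are kept on the same side of the train/test boundary. The sketch below illustrates that idea only; it is not the repository's implementation. It assumes the group keys have already been computed elsewhere, and `group_split`, `test_fraction`, and `seed` are hypothetical names.

```python
import random
from collections import defaultdict


def group_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group spans both train and test.

    `groups` holds one precomputed group key per sample (for example a
    scaffold SMILES or a cluster ID); computing those keys is out of scope.
    """
    by_group = defaultdict(list)
    for index, key in enumerate(groups):
        by_group[key].append(index)

    # Shuffle the groups reproducibly, then assign whole groups at a time.
    keys = sorted(by_group, key=str)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(groups)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        bucket = test if len(test) < target else train
        bucket.extend(by_group[key])
    return sorted(train), sorted(test)
```

Because whole groups are assigned at once, the realized test fraction can overshoot the requested one when groups are large.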

2. Phenotypic Response Classification

Predicts drug sensitivity across multiple cell lines from GDSC1/GDSC2 datasets.

Algorithms: RandomForest, SVM, LogisticRegression, XGBoost
Features: ECFP, MACCS, Avalon fingerprints
Split Types: Random, scaffold, butina, umap_clustering split

📖 See CLASSIFICATION_README.md for detailed documentation

3. RET G810R Mutant Activity Regression

Predicts pIC50 values for the RET G810R mutant, with models compared across the algorithms, feature sets, and split types listed below.

Algorithms: RandomForest, ExtraTrees, ElasticNet
Features: ECFP, ECFP_Counts, Avalon, MACCS
Split Types: Random (multiple seeds), Scaffold, Butina, UMAP_Clustering

📖 See REGRESSION_README.md for detailed documentation
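
For reference, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so an IC50 given in nanomolar converts as pIC50 = 9 - log10(IC50 in nM). The helper below is a small illustration of that conversion; `pic50_from_ic50_nm` is a hypothetical name, and the units used in the repository's dataset should be checked before applying it.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 (= -log10 of IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M, hence -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)
```

Under this definition a 1 nM compound has a pIC50 of about 9 and a 1 µM compound a pIC50 of about 6.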

Molecular Generation using Reinforcement Learning

This repository includes multiple JSON configuration files for executing reinforcement learning jobs using REINVENT v3.2.

Single-Objective Optimization

Located in SINGLE_OBJECTIVE_REINVENT_JSON/

JOB01-JOB05: Different optimization strategies
Each job focuses on a single objective (e.g., RET potency, selectivity)

Multi-Objective Optimization

Located in MULTI_OBJECTIVE_REINVENT_JSON/

JOB01-JOB05: Simultaneous optimization of multiple objectives
Balances potency, selectivity, and drug-like properties
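
To illustrate how several objectives can be balanced in one score: REINVENT 3.x combines per-component scores in the range 0 to 1 using either a weighted product (`custom_product`) or a weighted sum (`custom_sum`). The sketch below reproduces the general form of both aggregations in plain Python. It is an illustration rather than REINVENT's own code, the function names are hypothetical, and the aggregation and weights actually used by each job should be read from its JSON file.

```python
def weighted_geometric_mean(scores, weights):
    """Weighted product form: prod(score_i ** (w_i / sum(w)))."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    result = 1.0
    for score, weight in zip(scores, weights):
        result *= score ** (weight / total)
    return result


def weighted_arithmetic_mean(scores, weights):
    """Weighted sum form: sum(w_i * score_i) / sum(w)."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total
```

The product form penalizes a molecule heavily when any single component scores near zero, whereas the sum form lets a strong component compensate for a weak one.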

Running REINVENT Jobs

Install REINVENT v3.2 following the official instructions
Navigate to a job directory:
```
cd SINGLE_OBJECTIVE_REINVENT_JSON/JOB01
```
Run the job:
```
python input.py RL_config.json
```

⚠️ Important: Before running any job, update the file paths in the JSON configuration files for reading input and writing output.
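
Updating the paths by hand across several jobs is error-prone, so one option is to rewrite a common path prefix programmatically. The sketch below is not part of the repository: it assumes the paths in a configuration share a prefix that can be swapped for a local one, and the function names and the example prefixes are hypothetical. It writes to a separate output file so the original configuration is left untouched.

```python
import json


def replace_path_prefix(node, old_prefix, new_prefix):
    """Recursively replace old_prefix at the start of any string value."""
    if isinstance(node, dict):
        return {key: replace_path_prefix(value, old_prefix, new_prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [replace_path_prefix(value, old_prefix, new_prefix)
                for value in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node  # numbers, booleans, None, and non-matching strings


def rewrite_config(src, dst, old_prefix, new_prefix):
    """Load a JSON config, swap the path prefix, and save it to dst."""
    with open(src, encoding="utf-8") as handle:
        config = json.load(handle)
    updated = replace_path_prefix(config, old_prefix, new_prefix)
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(updated, handle, indent=2)
    return updated
```

A call such as `rewrite_config("RL_config.json", "RL_config_local.json", "/original/prefix", "/your/local/prefix")` would then produce a local copy; both prefixes here are placeholders, and the result should be reviewed before launching a job.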

Inception Files

Inception files containing known RET-specific ligands are included to guide the reinforcement learning model during training. The SMILES of these ligands are already included in the REINVENT configuration files.

Post-Processing and Analysis

Two KNIME workflows are provided for post-processing the generated molecules, one for single-objective and one for multi-objective optimization:

Files:

KNIME_WORKFLOW/SINGLE_OBJECTIVE_MULTI_STAGE_FILTER.knwf
KNIME_WORKFLOW/MULTI_OBJECTIVE_MULTI_STAGE_FILTER.knwf

These workflows implement:

Molecular filtering and quality checks
Property calculations
Diversity analysis
Visualization of results

Note: Due to confidentiality, the SMILES structures of all de novo generated molecules are not publicly disclosed or included in this repository. The workflows demonstrate the exact procedure used in the study.

Contact Information

For any queries, please contact:

Mi-Hyun Kim (Principal Investigator): kmh0515@gachon.ac.kr
Surendra Kumar: surendramph@gmail.com

Acknowledgments

This work was supported by [funding information to be added].
