This repository contains the datasets, machine learning code, and visualization tools used in the study:
"Data-driven insight into the universal structure-property relationship of catalysts in lithium-sulfur batteries"
Published in Journal of the American Chemical Society (2025).
Corresponding authors: Guangmin Zhou, Xuan Zhang, Tianshuai Wang
The sulfur reduction reaction (SRR) is a critical step in lithium–sulfur batteries, yet its catalytic mechanisms remain poorly understood. Existing DFT methods are limited by cost and specificity. To address this, we develop a data-driven framework that extracts universal structure–property relationships (UQSPRs) from a large-scale, heterogeneous dataset and enables rapid prediction and discovery of effective catalysts using machine learning.
Our dataset spans 20 years (2004–2024) and was constructed by screening more than 2,900 peer-reviewed studies. It contains 481 data points, covering diverse transition metal compounds and their interactions with five representative polysulfide species. This diversity enables robust and generalizable model learning.
- Built the first high-quality adsorption energy dataset for SRR catalysts based on literature mining.
- Proposed a geometric descriptor (dispersion factor) that predicts catalytic activity, in contrast to traditional electronic state analysis frameworks.
- Trained a collaborative machine learning model using random forests and feature screening.
- Screened 374,833 materials and experimentally validated CrB₂ as a high-performance catalyst.
This repository provides the complete codebase and data files to reproduce the machine learning framework for predicting the catalytic activity of Li–S battery catalysts.
├── dataset.xlsx # Final dataset with 14 features and adsorption energy
├── candidates_from_expert.xlsx # Expert-selected feature candidates
├── Training_Testing_Data.xlsx # Pre-defined train/test split for model training
│
├── Main.m # Main script to run model training and prediction
├── Candidate_Features.m # Feature construction from raw materials data
├── Rules_candidates.m # Apply selection rules to reduce feature space
├── SplitData.m # Data splitting into training and validation sets
├── RF.m # Random Forest training with bootstrap strategy
├── SelectModels.m # Aggregates top expert models for final prediction
├── PredictValidation.m # Predicts on validation data using trained ensemble
├── calculate_r2.m # R² score calculation
├── VisualPrediction.m # MATLAB-based result visualization
│
├── Python scripts (optional) # For plotting and result analysis
│ ├── Feature_Correlation_Heatmap.py
│ ├── Ead_Distributions.py
│ ├── Prediction_Errors.py
│ └── Prediction_Results.py
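When substituting your own data for dataset.xlsx, a quick layout check can catch malformed rows before training. The sketch below assumes the sheet has already been read into a header row plus data rows (for example with pandas) and that adsorption energy is the last column after the 14 descriptors. The column order and the function name `check_layout` are assumptions for illustration and are not taken from the repository.

```python
import math

N_DESCRIPTORS = 14  # from the README: 14 features plus one target column


def check_layout(header, rows, n_descriptors=N_DESCRIPTORS):
    """Return a list of readable problems; an empty list means the table looks fine.

    Assumption (hypothetical): the target value is the last column of each row.
    """
    expected = n_descriptors + 1
    problems = []
    if len(header) != expected:
        problems.append(f"header has {len(header)} columns, expected {expected}")
    for i, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(f"row {i} has {len(row)} values, expected {expected}")
            continue
        for name, value in zip(header, row):
            # bool is a subclass of int, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or math.isnan(value):
                problems.append(f"row {i}, column {name!r}: non-numeric or missing value")
    return problems
```

Calling `check_layout(header, rows)` on a well-formed table returns an empty list; any other result names the offending row and column.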
- Prepare Data
  Ensure dataset.xlsx is formatted with 14 descriptors and adsorption energy as target. You may use your own dataset following the same format.
- Run the Main Pipeline
  Launch Main.m in MATLAB. This script:
  - Loads the dataset
  - Splits it using SplitData.m
  - Trains multiple random forest models via RF.m
  - Aggregates top models in SelectModels.m
  - Predicts adsorption energy using PredictValidation.m
- Evaluate Model Performance
  Use calculate_r2.m to compute accuracy metrics. Visualization is supported via:
  - VisualPrediction.m (MATLAB)
  - or Python scripts for advanced plotting (Prediction_Results.py, etc.)
- Screen New Materials
  Replace the input dataset with new candidate features and repeat the above steps. Predictions will guide high-throughput catalyst selection.
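The train, select, and predict steps above can be illustrated with a small dependency-free Python sketch. It is not the repository's MATLAB implementation: the base learner here is a one-feature least-squares line standing in for a random forest, ranking models by validation R² is an assumed selection criterion, and all function names are hypothetical.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (undefined if y_true is constant)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in base learner (NOT a random forest): least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def train_bootstrap_models(xs, ys, n_models, fit=fit_line, seed=0):
    """Fit one model per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def select_top_models(models, x_val, y_val, k):
    """Keep the k models with the highest validation R^2 (assumed criterion)."""
    def score(model):
        return r2_score(y_val, [model(x) for x in x_val])
    return sorted(models, key=score, reverse=True)[:k]


def ensemble_predict(models, xs):
    """Average the predictions of the selected models."""
    return [mean(m(x) for m in models) for x in xs]
```

A typical call sequence is `train_bootstrap_models` on the training split, `select_top_models` on the validation split, then `ensemble_predict` on new candidates. To get closer to the workflow described here, `fit` could be replaced with a function that trains a random forest regressor and returns its prediction callable.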
Python scripts provide additional tools for performance analysis:
- Feature_Correlation_Heatmap.py: Correlation between structural and electronic features.
- Prediction_Errors.py: Parity and error plots.
- Prediction_Results.py: Visualizes top-performing catalyst predictions.
- Ead_Distributions.py: Adsorption energy distributions across LiPS species.
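Parity and error plots are usually read alongside a few summary numbers. As a minimal sketch, the function below computes residuals, mean absolute error, and root-mean-square error from paired true and predicted adsorption energies. The choice of metrics and the name `error_summary` are assumptions; the repository's scripts may report different quantities.

```python
import math


def error_summary(y_true, y_pred):
    """Return residuals (predicted minus true) plus MAE and RMSE."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return {"residuals": residuals, "mae": mae, "rmse": rmse}
```

The residuals can be passed directly to a histogram or scatter plot, while MAE and RMSE are natural annotations for a parity plot.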
- MATLAB version: R2023a or later
- Toolboxes: Statistics and Machine Learning Toolbox
- Python ≥ 3.8
- Required packages: pandas, matplotlib, seaborn, scikit-learn, numpy
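To see which of these requirements are already met in the current environment, a short check along the following lines may help. The helper names are hypothetical; note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED = ("pandas", "matplotlib", "seaborn", "sklearn", "numpy")


def missing_packages(names=REQUIRED):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

An empty list from `missing_packages()` together with `python_ok()` returning `True` indicates the Python side is ready for the plotting scripts.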
Install via pip:
    pip install pandas matplotlib seaborn scikit-learn numpy

If you use this repository, please cite:
@article{Han2025,
title = {Data-driven insight into the universal structure–property relationship of catalysts in lithium–sulfur batteries},
author = {Han, Zhiyuan and Tao, Shengyu and Jia, Yeyang and Zhang, Mengtian and Ma, Ruifei and Xiao, Xiao and Zhou, Jiaqi and Gao, Runhua and Cui, Kai and Wang, Tianshuai and Zhang, Xuan and Zhou, Guangmin},
journal = {Journal of the American Chemical Society},
year = {2025},
note = {Accepted, in press}
}

For questions or collaborations, please contact:
- Shengyu Tao or the corresponding authors.
This repository is released under the MIT License.