GitHub - Sreeya003/Multiclass-Classification-of-Liver-Disease: Multiclass clinical classification pipeline in R to segment liver disease stages using PCA-enhanced LDA, achieving 94.91% accuracy across 4 diagnostic categories.

🧪 Multiclass Classification of Liver Disease

🩺 Project Overview

This repository contains a clinical analytics pipeline built in R that classifies patients into one of four diagnostic categories based on biochemical and demographic markers. The project combines traditional statistical modeling with dimensionality reduction to turn raw clinical data into a diagnostic classifier.

The final pipeline applies Linear Discriminant Analysis (LDA) to PCA-transformed features and reaches 94.91% accuracy across the four diagnostic categories.

🚀 Key Features & Methodology

🧹 Clinical Data Engineering

Preprocessing: Cleaned and encoded mixed-type demographic and biochemical data.

Outlier Management: Capped values at the 99th percentile to limit the influence of extreme clinical measurements without dropping patient records.
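
As a rough illustration of percentile capping, the sketch below clips a single marker at its 99th percentile. The helper name `cap_upper` and the toy values are assumptions made for this example; only the column name AST comes from this README, and the project's actual script may differ.

```r
# Hypothetical helper: replace values above the p-th percentile with that percentile.
cap_upper <- function(x, p = 0.99) {
  limit <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, limit)
}

# Toy data with one extreme reading (values are invented).
labs <- data.frame(AST = c(22, 25, 31, 28, 40, 35, 27, 30, 26, 900))
labs$AST_capped <- cap_upper(labs$AST)

# The row count is unchanged; only the extreme value is pulled down.
print(labs)
```

Capping keeps every patient in the dataset, which matters when some diagnostic categories have few samples.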

Feature Scaling: Applied Z-score normalization to standardize features for distance-based and variance-based algorithms.
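
A minimal sketch of Z-score scaling with base R's `scale()`, assuming simulated marker values (the data frame `markers` and the `new_patient` row are invented for illustration). It also shows how the stored mean and standard deviation can be reused on new observations.

```r
set.seed(1)

# Simulated markers; the column names follow the README, the values do not.
markers <- data.frame(
  AST = rnorm(50, mean = 40, sd = 15),
  BIL = rnorm(50, mean = 10, sd = 4),
  GGT = rnorm(50, mean = 35, sd = 20)
)

# scale() subtracts each column mean and divides by the column standard deviation.
scaled <- scale(markers)

# Keep the fitted parameters so unseen data is scaled the same way.
center <- attr(scaled, "scaled:center")
spread <- attr(scaled, "scaled:scale")

new_patient <- data.frame(AST = 55, BIL = 12, GGT = 60)
new_scaled <- scale(new_patient, center = center, scale = spread)
```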

📉 Feature Selection & Noise Reduction

Multicollinearity Control: Applied Principal Component Analysis (PCA) to handle highly correlated biochemical markers.
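
To show how PCA removes correlation between markers, here is a small sketch using `prcomp()` from base R rather than FactoMineR, so it runs without extra packages. The simulated data and the 90% cumulative-variance cutoff are assumptions for this example, not settings taken from the project.

```r
set.seed(42)
n <- 200
shared <- rnorm(n)  # common signal that makes two markers correlated

# Simulated markers: AST and GGT share a signal, BIL is independent.
markers <- data.frame(
  AST = shared + rnorm(n, sd = 0.3),
  GGT = shared + rnorm(n, sd = 0.3),
  BIL = rnorm(n)
)

# Center and scale inside prcomp so each marker contributes equally.
pca <- prcomp(markers, center = TRUE, scale. = TRUE)

# Proportion of variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the smallest number of components reaching 90% cumulative variance (assumed cutoff).
keep <- which(cumsum(var_explained) >= 0.90)[1]
scores <- pca$x[, seq_len(keep), drop = FALSE]
```

The component scores in `pca$x` are uncorrelated by construction, which is why they suit models such as LDA that estimate a covariance matrix.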

Biomarker Identification: Used ANOVA and effect-size analysis to validate high-signal biomarkers, identifying AST, BIL, and GGT as the most significant drivers of patient segmentation.
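
One common way to pair ANOVA with an effect size is eta squared, the share of total variance explained by the grouping. The sketch below assumes that measure, four made-up class labels, and simulated data; the README does not state which effect-size statistic the project used.

```r
set.seed(7)

# Hypothetical class labels, 30 patients each.
stage <- factor(rep(c("A", "B", "C", "D"), each = 30))

# AST shifts with the class; noise_marker does not (both simulated).
AST <- rnorm(120, mean = c(30, 45, 60, 90)[as.integer(stage)], sd = 10)
noise_marker <- rnorm(120)

# Eta squared = between-group sum of squares / total sum of squares.
eta_squared <- function(values, groups) {
  tab <- summary(aov(values ~ groups))[[1]]
  ss <- tab[["Sum Sq"]]
  ss[1] / sum(ss)
}

eta <- c(AST = eta_squared(AST, stage),
         noise_marker = eta_squared(noise_marker, stage))

# p-value of the one-way ANOVA for AST.
p_ast <- summary(aov(AST ~ stage))[[1]][["Pr(>F)"]][1]
print(round(eta, 3))
```

Ranking markers by eta squared rather than by p-value alone avoids favoring markers that are merely significant because the sample is large.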

🤖 Multiclass Modeling & Performance

Models Evaluated: Compared Multinomial Logistic Regression (MLR), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA).

Champion Model: The LDA model (using PCA features) outperformed the others with:

✅ Accuracy: 94.91%

🎯 Macro F1-Score: 0.79
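
The sketch below fits LDA on PCA scores and computes both metrics on a held-out split. The four class labels, the simulated markers, the 70/30 split, and the choice of two components are all assumptions for illustration, so its numbers will not match the figures reported above.

```r
library(MASS)  # lda()
set.seed(123)

# Simulated four-class data; class names are hypothetical.
y <- factor(rep(c("class_1", "class_2", "class_3", "class_4"), each = 60))
shift <- c(0, 3, 6, 9)[as.integer(y)]
X <- data.frame(
  AST = shift + rnorm(240),
  BIL = 0.5 * shift + rnorm(240),
  GGT = 0.8 * shift + rnorm(240)
)

# Hold out 30% of rows; fit PCA on the training rows only to avoid leakage.
train_id <- sample(nrow(X), size = 0.7 * nrow(X))
pca <- prcomp(X[train_id, ], center = TRUE, scale. = TRUE)

train_df <- data.frame(stage = y[train_id], pca$x[, 1:2])
test_df  <- data.frame(stage = y[-train_id],
                       predict(pca, newdata = X[-train_id, ])[, 1:2])

# LDA on the first two principal components.
fit  <- lda(stage ~ ., data = train_df)
pred <- predict(fit, newdata = test_df)$class

# Confusion matrix: rows are predictions, columns are true classes.
cm <- table(predicted = pred, actual = test_df$stage)
tp <- diag(cm)
precision <- tp / rowSums(cm)
recall    <- tp / colSums(cm)
f1 <- 2 * precision * recall / (precision + recall)
f1[is.na(f1)] <- 0  # a class with no correct predictions scores 0

accuracy <- sum(tp) / sum(cm)
macro_f1 <- mean(f1)  # every class weighted equally
```

Because macro F1 weights every class equally, it can sit well below overall accuracy when a model does poorly on a small class, so the two metrics are best read together.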

🛠️ Technical Stack

Language: R

Libraries: tidyverse, caret, MASS (LDA/QDA), FactoMineR (PCA)

Techniques: PCA, ANOVA, Outlier Capping, Z-score Scaling, Multiclass Classification

📂 Repository Structure

scripts/: R scripts for preprocessing, PCA, and model training.

analysis/: ANOVA results and biomarker effect-size reports.

results/: Confusion matrices and accuracy benchmarks.

🏁 How to Use

📥 Load your liver disease dataset into the R environment.

⚙️ Run the preprocessing script to handle outliers and scaling.

📉 Execute the PCA script to generate principal components.

🏆 Run the modeling script to train the LDA classifier and view performance metrics.

