- Repository Name
- Title of the Project
- Short Description of the Project
- Objectives of the Project
- Name of the Dataset
- Description of the Dataset
- Goal of the Project using this Dataset
- Why did we choose this dataset
- Size of dataset
- Algorithms which can be used as part of our investigation
- Expected Behaviors and Problem Handling
- Issues to focus on
- Project Requirements
- Usage Instructions in Local System
- Usage Instructions in Google Colab
- Authors
pneumonia-detection-in-chest-X-rays
Comparative Analysis of Image-Based and Feature-Based Approaches for Pneumonia Detection in Chest X-rays
This project focuses on detecting pneumonia from chest X-ray images using advanced machine learning and deep learning techniques (Rajpurkar et al., 2017; Wang et al., 2017). Using a comprehensive dataset of annotated pneumonia and normal chest X-rays, we develop and compare image-based and feature-based approaches. Our goal is to identify the most effective method for accurate and interpretable pneumonia detection, contributing to improved patient outcomes through early diagnosis and treatment. The models classify each patient's chest X-ray as either pneumonia (1) or no pneumonia (0).
1. Image Analysis: Develop and evaluate deep learning models, particularly Convolutional Neural Networks (CNNs), that classify chest X-rays end to end. The models process the raw chest X-ray images directly and classify them as normal or pneumonia.
2. Feature Analysis: Extract meaningful features from the images and use them to train and evaluate traditional machine learning models. The process includes feature extraction, selection, and transformation, followed by machine learning techniques such as Support Vector Machines (SVM) and Random Forests.
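A minimal sketch of the feature-based approach, assuming images are already loaded as 2-D 8-bit grayscale NumPy arrays. The first-order statistics used here (mean, standard deviation, skewness, excess kurtosis, entropy) and the helper names `first_order_features` and `train_and_compare` are illustrative assumptions, not the project's notebooks. Synthetic arrays stand in for real X-rays so the snippet runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def first_order_features(img):
    """Hypothetical helper: [mean, std, skewness, excess kurtosis, entropy] of a uint8 image."""
    x = img.astype(np.float64).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4) - 3.0  # excess kurtosis (0 for a normal distribution)
    # Shannon entropy (bits) of the 256-bin intensity histogram.
    hist = np.bincount(img.ravel().astype(np.uint8), minlength=256) / x.size
    nz = hist[hist > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([mean, std, skewness, kurtosis, entropy])


def train_and_compare(images, labels, seed=0):
    """Fit an SVM and a Random Forest on first-order features; return held-out accuracies."""
    X = np.stack([first_order_features(im) for im in images])
    y = np.asarray(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # SVMs are scale-sensitive, so features are standardized first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: label 1 images are brighter and noisier than label 0 images.
    normal = [rng.normal(90, 10, (64, 64)).clip(0, 255).astype(np.uint8) for _ in range(40)]
    pneumonia = [rng.normal(140, 30, (64, 64)).clip(0, 255).astype(np.uint8) for _ in range(40)]
    print(train_and_compare(normal + pneumonia, [0] * 40 + [1] * 40))
```

On real X-rays the features would come from the extraction notebooks instead, and the split would follow the dataset's own train/test folders.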
The dataset used in this project is the chest X-ray dataset from the research publication Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification.
Data Source: https://data.mendeley.com/datasets/rscbjbr9sj/2
Type of the Dataset: X-ray Images
Key characteristics of the dataset:
- Separate folders for training and for validating/testing the model.
- A sufficient number of chest X-ray images to train a model to detect and diagnose pneumonia.
- The target variable for classification is whether the patient has pneumonia or not.
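A sketch of indexing the train/test folders into (path, label) pairs. It assumes the extracted archive follows a `chest_xray/<split>/<CLASS>/` layout with class folders named `NORMAL` and `PNEUMONIA`; adjust the names if your copy differs. `index_split` is a hypothetical helper, and the label encoding follows the project's convention (pneumonia = 1, normal = 0).

```python
from pathlib import Path

# Assumed class-folder names mapped to the project's target encoding.
CLASS_TO_LABEL = {"NORMAL": 0, "PNEUMONIA": 1}
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def index_split(root, split):
    """Return (image_path, label) pairs for one split folder such as "train" or "test"."""
    pairs = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(root) / split / class_name
        if not class_dir.is_dir():
            continue  # missing class folder: contribute nothing
        for path in sorted(class_dir.iterdir()):
            # Skip non-image files such as .DS_Store.
            if path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((path, label))
    return pairs


if __name__ == "__main__":
    train = index_split("chest_xray", "train")
    n_pneumonia = sum(label for _, label in train)
    print(f"{len(train)} training images, {n_pneumonia} labeled pneumonia")
```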
The goal of this project is to conduct a comprehensive comparative analysis of image-based and feature-based approaches for pneumonia detection using chest X-ray images. By evaluating the performance, robustness, and interpretability of deep learning and traditional machine learning models, we aim to identify the most effective method for accurately classifying chest X-rays as normal or pneumonia. This comparison will provide valuable insights into the strengths and limitations of each approach, ultimately contributing to improved detection and diagnosis of pneumonia, which can enhance patient outcomes and survival rates.
We selected this dataset for the following reasons:
- The dataset is extensive, providing a large number of images suitable for evaluating and training deep learning models.
- It aligns well with the project's objectives by offering a challenging and realistic scenario for developing an image classification model using deep learning, specifically for Chest X-ray images.
- The images are labeled into two classes (normal and pneumonia), enabling the development of a binary classification model.
- It is publicly available, facilitating easy access for research and development purposes.
- Total size of the images: 1.27 GB
- The dataset has 2 folders:
  - Train:
    - Normal (without pneumonia): 1349 images
    - Pneumonia: 3883 images
  - Test:
    - Normal (without pneumonia): 234 images
    - Pneumonia: 390 images
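The counts above are imbalanced, which matters for both training and evaluation. The arithmetic below computes the pneumonia share of each split, the test accuracy of a trivial model that always predicts pneumonia, and "balanced" class weights using n_samples / (n_classes * n_samples_in_class), the formula scikit-learn uses for `class_weight="balanced"`. Whether the project's notebooks apply such weights is not stated here; this is only an illustration.

```python
# Image counts from the dataset description above.
counts = {
    "train": {"normal": 1349, "pneumonia": 3883},
    "test": {"normal": 234, "pneumonia": 390},
}

for split, c in counts.items():
    total = c["normal"] + c["pneumonia"]
    print(f"{split}: {total} images, {c['pneumonia'] / total:.1%} pneumonia")

# A model that always predicts "pneumonia" already reaches this test accuracy.
test = counts["test"]
baseline_acc = test["pneumonia"] / (test["normal"] + test["pneumonia"])
print(f"always-pneumonia baseline accuracy: {baseline_acc:.3f}")

# Balanced class weights: n_samples / (n_classes * n_samples_in_class).
train = counts["train"]
n_train = sum(train.values())
class_weight = {name: n_train / (len(train) * n) for name, n in train.items()}
print({k: round(v, 3) for k, v in class_weight.items()})
```

With these counts the always-pneumonia baseline scores 0.625 on the test set, so accuracy should be read alongside per-class metrics; the balanced weights come out to roughly 1.939 for normal and 0.674 for pneumonia.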
- Deep Learning Algorithms
  - Convolutional Neural Networks (CNNs)
- Traditional Machine Learning Algorithms
  - Support Vector Machines (SVM)
  - Random Forests
  - Logistic Regression
  - Decision Trees, etc.
- Optimization Techniques
  - Local Search, Search Strategies, and Heuristics
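One way to apply "Local Search" here is hill climbing over hyperparameters. The sketch below assumes a feature matrix like the one produced by the feature-based approach and walks a log-spaced grid of SVM `C` and `gamma` values, moving to the best-scoring neighbour until no neighbour improves the cross-validated accuracy. The grid, the starting point, and `hill_climb_svm` are illustrative assumptions, not necessarily how the project tunes its models; synthetic data from `make_classification` stands in for real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

C_GRID = np.logspace(-2, 3, 6)      # 0.01 ... 1000
GAMMA_GRID = np.logspace(-4, 1, 6)  # 0.0001 ... 10


def cv_score(X, y, i, j):
    """Mean 5-fold accuracy of an RBF SVM at grid position (i, j)."""
    model = make_pipeline(StandardScaler(), SVC(C=C_GRID[i], gamma=GAMMA_GRID[j]))
    return cross_val_score(model, X, y, cv=5).mean()


def hill_climb_svm(X, y, start=(2, 2)):
    """Greedy local search on the (C, gamma) grid; returns (C, gamma, score, evaluations)."""
    cache = {}  # avoid re-evaluating grid points already visited

    def score(pos):
        if pos not in cache:
            cache[pos] = cv_score(X, y, *pos)
        return cache[pos]

    current = start
    while True:
        i, j = current
        neighbours = [
            (i + di, j + dj)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= i + di < len(C_GRID) and 0 <= j + dj < len(GAMMA_GRID)
        ]
        best = max(neighbours, key=score)
        if score(best) <= score(current):
            # Local optimum: no neighbour strictly improves the CV accuracy.
            return C_GRID[i], GAMMA_GRID[j], score(current), len(cache)
        current = best


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    C, gamma, acc, n_evals = hill_climb_svm(X, y)
    print(f"C={C:g}, gamma={gamma:g}, CV accuracy={acc:.3f} after {n_evals} evaluations")
```

Compared with an exhaustive grid search, hill climbing usually evaluates fewer points but can stop at a local optimum; random restarts from different `start` positions are a common remedy.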
- Classify Chest X-ray images with high accuracy.
- Handle variations in image quality, resolution, and orientation.
- Be robust to noise and artifacts in the images.
- Provide interpretable results.
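Handling variations in resolution and noise usually starts in preprocessing. A sketch, assuming 2-D grayscale NumPy arrays, that uses SciPy's `ndimage.median_filter` to suppress isolated noisy pixels, `ndimage.zoom` to resample to a common size, and min-max scaling to [0, 1]. The 224x224 target size, the filter size, and the helper name `standardize_xray` are assumptions; orientation handling (for example flips or small rotations used as augmentation) is not shown.

```python
import numpy as np
from scipy import ndimage


def standardize_xray(img, size=(224, 224), median_size=3):
    """Hypothetical helper: denoise, resample, and min-max scale a 2-D grayscale image."""
    img = np.asarray(img, dtype=np.float64)
    # A small median filter removes salt-and-pepper noise while keeping edges.
    img = ndimage.median_filter(img, size=median_size)
    # Resample so every model input has the same shape regardless of source resolution.
    factors = (size[0] / img.shape[0], size[1] / img.shape[1])
    img = ndimage.zoom(img, factors, order=1, mode="nearest")
    # Min-max scaling makes intensities comparable across scanners and exposures.
    lo, hi = img.min(), img.max()
    span = hi - lo
    return (img - lo) / span if span > 1e-9 else np.zeros_like(img)
```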
- Improving model interpretability and explainability.
- Optimizing model performance on a held-out test set.
- Following AI Ethics and Data Safety practices.
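On the held-out test set, accuracy alone can hide poor performance on the minority normal class. A sketch computing sensitivity (recall on pneumonia), specificity (recall on normal), and balanced accuracy from confusion-matrix counts, with pneumonia as the positive class (1). The function name is illustrative, and it assumes both classes are present in `y_true`.

```python
import numpy as np


def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and balanced accuracy for binary labels (1 = pneumonia)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)  # share of pneumonia cases detected
    specificity = tn / (tn + fp)  # share of normal cases correctly cleared
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    # The always-pneumonia baseline on a test split with 234 normal and 390 pneumonia images.
    y_true = np.array([0] * 234 + [1] * 390)
    y_pred = np.ones_like(y_true)
    print(screening_metrics(y_true, y_pred))
```

For that baseline, sensitivity is 1.0 but specificity is 0.0, so balanced accuracy (0.5) exposes what the 0.625 accuracy hides.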
- pillow
- opencv-python
- tensorflow
- torch
- torchvision
- pandas
- numpy
- jupyter
- notebook
- tqdm
- joblib
- scipy
- scikit-image
- scikit-learn
- pycaret
- starlette
- seaborn
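Several packages in this list are imported under a different name than the one passed to pip (for example, `pillow` is imported as `PIL` and `opencv-python` as `cv2`). A small sketch that reports which of them are importable in the current environment, using only the standard library's `importlib.util.find_spec`. The `jupyter` metapackage is left out of the mapping, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> top-level module name used in `import` statements.
PIP_TO_MODULE = {
    "pillow": "PIL",
    "opencv-python": "cv2",
    "tensorflow": "tensorflow",
    "torch": "torch",
    "torchvision": "torchvision",
    "pandas": "pandas",
    "numpy": "numpy",
    "notebook": "notebook",
    "tqdm": "tqdm",
    "joblib": "joblib",
    "scipy": "scipy",
    "scikit-image": "skimage",
    "scikit-learn": "sklearn",
    "pycaret": "pycaret",
    "starlette": "starlette",
    "seaborn": "seaborn",
}


def missing_packages(mapping=PIP_TO_MODULE):
    """Return the pip names whose modules cannot be found (find_spec does not import them)."""
    return sorted(pip for pip, module in mapping.items() if importlib.util.find_spec(module) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("All requirements importable." if not missing else "Missing: " + ", ".join(missing))
```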
- Clone using HTTPS
git clone https://github.com/kraviteja95usd/pneumonia-detection-in-chest-X-rays.git
OR
- Clone using SSH
git clone git@github.com:kraviteja95usd/pneumonia-detection-in-chest-X-rays.git
OR
- Clone using GitHub CLI
gh repo clone kraviteja95usd/pneumonia-detection-in-chest-X-rays
- Change into the project directory
cd pneumonia-detection-in-chest-X-rays
- Install the requirements
pip3 install -r requirements.txt
- Change into the code directory
cd bin
- Open your terminal (Command Prompt on Windows or Terminal on macOS).
- Run one of the following commands, depending on which interface is installed on your system. The Jupyter interface opens in your browser.
jupyter notebook
OR
jupyter lab
- Step-1:
  - Click (single or double click, whichever works) the Pneumonia_Detection_Preprocessing.ipynb file to open it.
  - Click the Run button in the menu bar and select the option of your interest (Run Cell or Run All).
  - Review the execution results within the notebook and interpret them accordingly.
  - !!! IMPORTANT NOTE AND DO NOT MISS THIS !!!
    After executing the "Load the Excel file and fetch the maximum height and maximum width of all the images" section of Pneumonia_Detection_Preprocessing.ipynb, go to the dataset path, copy the entire chest_xray_nrm folder, and paste it again. Rename the copy to chest_xray_nrm_padded. Then go inside it and append _padded to the name of every folder inside it.
  - Now come back to Pneumonia_Detection_Preprocessing.ipynb and proceed with the image padding section, which is the last part of this notebook.
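The padding code itself lives in the notebook. A sketch of the general idea, assuming zero (black) padding that centres each grayscale image on a canvas of the maximum height and width found across the dataset; the centring choice, the pad value, and the helper names `max_shape` and `pad_to` are assumptions and may differ from the notebook.

```python
import numpy as np


def max_shape(images):
    """Maximum height and width over a collection of 2-D arrays."""
    return max(im.shape[0] for im in images), max(im.shape[1] for im in images)


def pad_to(img, height, width, value=0):
    """Centre `img` on a `height` x `width` canvas filled with `value`."""
    h, w = img.shape
    if h > height or w > width:
        raise ValueError(f"image {img.shape} is larger than target ({height}, {width})")
    top, left = (height - h) // 2, (width - w) // 2
    # Any odd leftover pixel goes to the bottom/right edge.
    return np.pad(
        img,
        ((top, height - h - top), (left, width - w - left)),
        mode="constant",
        constant_values=value,
    )


if __name__ == "__main__":
    imgs = [np.ones((3, 5), np.uint8), np.ones((4, 2), np.uint8)]
    H, W = max_shape(imgs)
    padded = [pad_to(im, H, W) for im in imgs]
    print([p.shape for p in padded])  # [(4, 5), (4, 5)]
```

Padding (rather than resizing) keeps the original pixel spacing, at the cost of larger inputs when image sizes vary widely.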
- Step-2:
  - Repeat Step-1 (points 1 to 4; you can ignore the IMPORTANT NOTE in this step) for each of the following notebooks, one after the other. Each notebook generates its corresponding Excel files in the image_information folder.
    - Pneumonia_Detection_Feature_Extraction.ipynb
    - Pneumonia_Detection_Feature_Extraction_First_Order_GLCM_and_GLDM.ipynb
    - Pneumonia_Detection_Feature_Extraction_GLRLM.ipynb
    - Pneumonia_Detection_Feature_Extraction_NGTDM.ipynb
    - First_Order_Features_Classification.ipynb
    - Second_Order_GLCM_Features_Classification.ipynb
    - Second_Order_GLRLM_Features_Classification.ipynb
    - Second_Order_GLDM_Features_Classification.ipynb
    - Second_Order_NGTDM_Features_Classification.ipynb
    - All_Features_Classification.ipynb
  - Review the execution results within each notebook and interpret them accordingly.
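The second-order notebooks above compute texture features such as the GLCM (gray-level co-occurrence matrix). As a reference for what those features measure, a NumPy-only sketch of a symmetric, normalized GLCM for a single pixel offset plus four common properties. The quantization level count, the single horizontal offset, and the helper names are assumptions; scikit-image's `skimage.feature.graycomatrix` and `graycoprops` cover the general case with multiple distances and angles.

```python
import numpy as np


def glcm(img, levels=8, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix of a uint8 image for one (row, col) offset."""
    q = (img.astype(np.int64) * levels) // 256  # quantize 0..255 into 0..levels-1
    dr, dc = offset  # assumes dr >= 0 and dc >= 0
    a = q[: q.shape[0] - dr, : q.shape[1] - dc]  # reference pixels
    b = q[dr:, dc:]  # neighbours at the given offset
    P = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(P, (a.ravel(), b.ravel()), 1)  # count each (reference, neighbour) pair
    P = P + P.T  # symmetric: count each pair in both directions
    return P / P.sum()


def glcm_properties(P):
    """Contrast, homogeneity, energy and correlation of a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    denom = sd_i * sd_j
    return {
        "contrast": np.sum((i - j) ** 2 * P),
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),
        "energy": np.sqrt(np.sum(P ** 2)),
        # Correlation is undefined for a constant image; NaN is returned in that case.
        "correlation": np.sum((i - mu_i) * (j - mu_j) * P) / denom if denom > 0 else float("nan"),
    }


if __name__ == "__main__":
    img = np.array([[0, 0, 255, 255]] * 4, dtype=np.uint8)
    print(glcm_properties(glcm(img, levels=2)))
```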
- Upload your chest_xray dataset folder to the Google Drive of the account you will use to open Google Colab.
- Follow the same steps as above up to switching into the bin directory.
- Go to Google Colab and use the Upload Notebook option.
- Step-1:
  - Upload the notebooks Pneumonia_Detection_Preprocessing.ipynb and Pneumonia_Detection_Feature_Extraction.ipynb from your laptop to Google Colab.
  - Open Pneumonia_Detection_Preprocessing.ipynb. If required, add a few lines of code to load the dataset from your Google Drive.
  - Click the Run option and select Run All, Run Cell, or any option of your interest to run the code.
  - Review the execution results within the notebook and interpret them accordingly.
  - !!! IMPORTANT NOTE AND DO NOT MISS THIS !!!
    After executing the "Load the Excel file and fetch the maximum height and maximum width of all the images" section of Pneumonia_Detection_Preprocessing.ipynb, go to the dataset path, copy the entire chest_xray_nrm folder, and paste it again. Rename the copy to chest_xray_nrm_padded. Then go inside it and append _padded to the name of every folder inside it.
  - Now come back to Pneumonia_Detection_Preprocessing.ipynb and proceed with the image padding section, which is the last part of this notebook.
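In Colab, the copy-and-rename step from the IMPORTANT NOTE can also be done from a notebook cell. A sketch, assuming the dataset sits in a Drive folder whose path you fill in (the `DATASET_DIR` value below is a placeholder, not the project's path) and assuming that every folder at every level inside the copy should receive the `_padded` suffix. `google.colab.drive.mount` is Colab's standard way to mount Google Drive; outside Colab that step is skipped. `make_padded_copy` is a hypothetical helper.

```python
import os
import shutil
from pathlib import Path


def make_padded_copy(dataset_dir, src_name="chest_xray_nrm", suffix="_padded"):
    """Copy <dataset_dir>/<src_name> to <src_name><suffix> and append `suffix` to every folder inside it."""
    src = Path(dataset_dir) / src_name
    dst = Path(dataset_dir) / f"{src_name}{suffix}"
    shutil.copytree(src, dst)  # raises if dst already exists, so earlier work is never overwritten
    # Walk bottom-up so renaming a folder never invalidates paths still to be visited.
    for root, dirs, _files in os.walk(dst, topdown=False):
        for name in dirs:
            if not name.endswith(suffix):
                os.rename(os.path.join(root, name), os.path.join(root, name + suffix))
    return dst


if __name__ == "__main__":
    try:
        from google.colab import drive  # only available inside Google Colab

        drive.mount("/content/drive")
    except ImportError:
        pass
    DATASET_DIR = "/content/drive/MyDrive/dataset"  # placeholder: point this at your dataset folder
    if Path(DATASET_DIR, "chest_xray_nrm").is_dir():
        print(make_padded_copy(DATASET_DIR))
    else:
        print(f"No chest_xray_nrm folder under {DATASET_DIR}; update DATASET_DIR first.")
```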
- Step-2:
  - Repeat Step-1 (points 1 to 4; you can ignore the IMPORTANT NOTE in this step) for each of the following notebooks, one after the other. Each notebook generates its corresponding Excel files in the image_information folder.
    - Pneumonia_Detection_Feature_Extraction.ipynb
    - Pneumonia_Detection_Feature_Extraction_First_Order_GLCM_and_GLDM.ipynb
    - Pneumonia_Detection_Feature_Extraction_GLRLM.ipynb
    - Pneumonia_Detection_Feature_Extraction_NGTDM.ipynb
    - First_Order_Features_Classification.ipynb
    - Second_Order_GLCM_Features_Classification.ipynb
    - Second_Order_GLRLM_Features_Classification.ipynb
    - Second_Order_GLDM_Features_Classification.ipynb
    - Second_Order_NGTDM_Features_Classification.ipynb
    - All_Features_Classification.ipynb
  - Review the execution results within each notebook and interpret them accordingly.
| Author | Contact Details |
|---|---|
| Prof. Dr. Soumi Ray | soumiray@sandiego.edu |
| Ravi Teja Kothuru | rkothuru@sandiego.edu |
| Abhay Srivastav | asrivastav@sandiego.edu |