Naive Bayes Classifier | 10-Fold Cross-Validation | ROC Curve Analysis | Machine Learning | Python
This repo contains a Jupyter Notebook that implements the Gaussian Naïve Bayes algorithm from scratch and uses it for binary classification on the famous Iris dataset 🌸. The dataset covers three species of iris flowers (Setosa, Versicolor, and Virginica), so the classifier is evaluated on each pair of species in turn.
- File Structure 📂
- Requirements
- Installation Guide 🛠
- Dataset Information
- Naïve Bayes Algorithm
- Display 📷
- Key Findings 📈
- Contributing 🚀
📦 Naive-Bayes repo
│-- 📜 Img
│   │-- 📜 1.png
│   │-- 📜 2.png
│   │-- 📜 3.png
│-- 📜 Naive_Bayes.ipynb # Jupyter Notebook with implementation
│-- 📜 requirements.txt # List of dependencies
│-- 📜 iris.csv # Dataset (Iris Flower Dataset)
│-- 📜 README.md # Project documentation
- Python Version: 3.10 or higher
- External Dependencies: Managed through requirements.txt
- Jupyter Notebook (web-based interactive environment)
- NumPy
- pandas
Follow the steps below to set up and run the project:
git clone https://github.com/adexoxo13/Naive-Bayes.git
cd Naive-Bayes
conda create --name <my-env>
# When conda asks you to proceed, type y:
proceed ([y]/n)?
# Verify that the new environment was installed correctly:
conda env list
# Activate the new environment:
conda activate <my-env>
pip install -r requirements.txt
jupyter notebook
Open Naive_Bayes.ipynb in Jupyter and run the cells to see the model in action.
The Iris Dataset consists of 150 samples, with the following attributes:
| Feature | Description |
|---|---|
| Sepal Length | Length of the sepal (cm) |
| Sepal Width | Width of the sepal (cm) |
| Petal Length | Length of the petal (cm) |
| Petal Width | Width of the petal (cm) |
| Species | Type of Iris Flower (Target) |
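Because the classifier is binary while the dataset has three species, each experiment needs a two-species subset. The following is a minimal sketch of that step using only the standard library. The column names, the lowercase label strings, and the `binary_subset` helper are assumptions for illustration and are not taken from the notebook or from `iris.csv`.

```python
from itertools import combinations

# Three sample rows in the shape csv.DictReader would yield.
# Column names and label spelling are assumptions about iris.csv.
rows = [
    {"sepal_length": "5.1", "sepal_width": "3.5", "petal_length": "1.4",
     "petal_width": "0.2", "species": "setosa"},
    {"sepal_length": "7.0", "sepal_width": "3.2", "petal_length": "4.7",
     "petal_width": "1.4", "species": "versicolor"},
    {"sepal_length": "6.3", "sepal_width": "3.3", "petal_length": "6.0",
     "petal_width": "2.5", "species": "virginica"},
]

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def binary_subset(rows, negative, positive, label_col="species"):
    """Keep only two species; return (X, y) with y = 1 for `positive`, else 0."""
    X, y = [], []
    for row in rows:
        if row[label_col] not in (negative, positive):
            continue  # drop the third species
        X.append([float(row[name]) for name in FEATURES])
        y.append(1 if row[label_col] == positive else 0)
    return X, y


# The three one-vs-one problems that a three-class dataset yields.
species = sorted({row["species"] for row in rows})
pairs = list(combinations(species, 2))

X, y = binary_subset(rows, "setosa", "versicolor")
```

With the real file, `rows` could come from `csv.DictReader(open("iris.csv"))`, provided the header matches the assumed column names.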
Naïve Bayes is a probabilistic classifier based on Bayes' Theorem. It is widely used for text classification, spam filtering, and medical diagnosis. Given an input feature set, it calculates the probability of each class and selects the one with the highest probability.
- P(A|B) = (P(B|A) * P(A)) / P(B)
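To make the rule concrete for the Gaussian case, here is a minimal from-scratch sketch, not the notebook's actual code. It assumes each feature is normally distributed within a class, works in log space to avoid underflow, and drops P(B) because it is the same for every class. The function names, the `var_floor` constant, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        means = [sum(col) / n for col in zip(*samples)]
        # Population variance; var_floor (an assumed smoothing constant)
        # avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*samples), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_posteriors(model, x):
    """Unnormalized log P(class | x): log prior + sum of log Gaussian densities."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Pick the class with the highest posterior score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Toy two-feature data (made-up values, not taken from iris.csv).
X = [[1.0, 0.2], [1.2, 0.3], [1.1, 0.1], [4.5, 1.4], [4.7, 1.5], [4.4, 1.3]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
```

Calling `predict(model, [1.1, 0.2])` assigns the point to class 0, since it sits at that class's feature means. The difference between the two log-posterior scores can also serve as a continuous score for ROC analysis.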
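📌 Data Visualization:
# Example plot
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset; the "species" column name follows the hue argument below
data = pd.read_csv("iris.csv")
sns.pairplot(data, hue="species")
plt.show()
This generates pairwise scatter plots of the Iris features, colored by species.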
- Best Performance:
  - Setosa vs Versicolor classification shows near-perfect separation
  - Setosa vs Virginica classification shows near-perfect separation
- Most Challenging:
  - Versicolor vs Virginica classification shows overlap between the two classes
- Model Accuracy:
  - Average AUC of 1 for Setosa vs Versicolor and Setosa vs Virginica
  - Consistent performance across cross-validation folds
  - Average AUC of 0.97 ± 0.03 for Versicolor vs Virginica
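The cross-validated AUC figures above combine two steps: splitting the samples into folds, and scoring each held-out fold by AUC. The sketch below shows one way to do both with the standard library. It is not the notebook's implementation: the helper names are hypothetical, the folds are not stratified, and AUC is computed through the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and yield (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def mean_and_std(values):
    """Mean and population standard deviation, e.g. of per-fold AUC values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


# 100 samples corresponds to one species pair (50 flowers per species).
folds = list(k_fold_indices(100, k=10))
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])   # fully separated scores
overlap = roc_auc([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])   # one misordered pair
```

Here `perfect` is 1.0 and `overlap` is 0.75, since three of the four positive/negative pairs are ranked correctly. Feeding the per-fold AUC values to `mean_and_std` yields a summary in the same "mean ± std" form as the figures above.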
Limitations:
- Currently implements Gaussian Naive Bayes only
- Assumes feature independence (naive assumption)
- Limited to binary classification scenarios

Future Improvements:
- Add multiclass classification support
- Implement different probability distributions
- Include feature correlation handling
- Add hyperparameter tuning capabilities
- Expand to other datasets
Contributions are welcome! Feel free to fork the repository and submit a pull request.
Feel free to reach out or connect with me:
- 📧 Email: adenabrehama@gmail.com
- 💼 LinkedIn: linkedin.com/in/aden
- 🎨 CodePen: codepen.io/adexoxo
📌 Star this repository if you found it useful! ⭐


