An unsupervised machine learning project that detects anomalies and outliers using the Isolation Forest algorithm, with visual comparison between original data and anomaly-marked data.
This project demonstrates anomaly detection using Isolation Forest, an ensemble-based algorithm designed to isolate anomalies instead of profiling normal data points. Anomalies are detected based on how easily they are separated from the rest of the data.
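As a minimal illustration of this idea (a sketch on synthetic data, not taken from the project notebook), scikit-learn's `IsolationForest` flags points that are easy to separate from the main cluster:

```python
# Minimal sketch: Isolation Forest on a small synthetic dataset.
# Illustrative only; it does not reproduce the notebook's exact setup.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # clustered "normal" points
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])           # points far from the cluster
X = np.vstack([normal, outliers])

model = IsolationForest(n_estimators=100, contamination="auto", random_state=42)
labels = model.fit_predict(X)   # 1 = inlier, -1 = anomaly

print("Detected anomalies:", X[labels == -1])
```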
The project includes:
- Original dataset visualization
- Dataset used for training
- Final output with anomalies highlighted
- isolation_tree.ipynb — Main project notebook implementing Isolation Forest
- healthcase.csv — Dataset used for anomaly detection
- main_data.png — Visualization of the original dataset
- anomalies_marked.png — Data with detected anomalies highlighted
- README.md — Project documentation
- Python
- NumPy
- Pandas
- Matplotlib
- scikit-learn
- Jupyter Notebook
- Algorithm: Isolation Forest
- Learning Type: Unsupervised Learning
- Model Type: Tree-based ensemble
- Use Case: Anomaly and Outlier Detection
Visualization of the dataset before applying anomaly detection.
Detected anomalies are marked distinctly to show their deviation from normal patterns.
- Clone the repository
git clone https://github.com/btboilerplate/Anomaly-detection-using-IsolationForest.git
- Install required libraries
pip install numpy pandas matplotlib scikit-learn
- Open isolation_tree.ipynb and run all cells sequentially
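The notebook roughly follows the workflow sketched below. Column selection, the contamination value, and the plotting details are assumptions for illustration; the actual notebook may differ.

```python
# Sketch of the overall workflow (feature selection and parameters are assumptions).
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

df = pd.read_csv("healthcase.csv")      # dataset shipped with the repo
X = df.select_dtypes("number")          # assume the numeric columns are the features (at least two)

model = IsolationForest(contamination=0.05, random_state=42)  # contamination value is an assumption
labels = model.fit_predict(X)           # 1 = normal, -1 = anomaly
anomaly = labels == -1

# Plot the first two numeric features, highlighting detected anomalies in red.
plt.scatter(X.iloc[~anomaly, 0], X.iloc[~anomaly, 1], c="steelblue", s=10, label="normal")
plt.scatter(X.iloc[anomaly, 0], X.iloc[anomaly, 1], c="red", s=20, label="anomaly")
plt.legend()
plt.savefig("anomalies_marked.png")
plt.show()
```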
- Isolation Forest efficiently detects outliers in data with both linear and non-linear structure
- Anomalies are isolated early due to random partitioning
- Works well without labeled data
- Scales efficiently to larger datasets
- Tune the contamination parameter for finer control over how many points are flagged (see the sketch after this list)
- Compare with LOF and DBSCAN
- Apply to real-world anomaly detection datasets
- Visualize anomaly scores
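As a starting point for the contamination tuning and score visualization mentioned above, something along these lines could be used; the dataset and parameter values here are arbitrary examples, not part of the project:

```python
# Sketch: inspect anomaly scores and the effect of the contamination parameter.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(200, 2)), rng.uniform(-6, 6, size=(10, 2))])

# score_samples gives per-point anomaly scores (lower = more anomalous),
# independent of the contamination setting.
model = IsolationForest(random_state=0).fit(X)
scores = model.score_samples(X)

plt.hist(scores, bins=30)
plt.xlabel("anomaly score (lower = more anomalous)")
plt.ylabel("count")
plt.show()

# The contamination parameter only shifts the decision threshold on these scores.
for c in (0.01, 0.05, 0.10):
    n_flagged = (IsolationForest(contamination=c, random_state=0).fit_predict(X) == -1).sum()
    print(f"contamination={c}: {n_flagged} points flagged as anomalies")
```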

