Anomaly Detection Modelling - Statistical (IQR) and Machine Learning methods (OCSVM and Isolation Forest) implemented and evaluated to build an anomaly detection model for ship engine data. Completed as part of the Cambridge Data Science Programme.

sian-davies/anomaly_detection_svm_if

Anomaly Detection for Ship Engine Performance

Detecting Anomalous Activity in Ship Engines Using Machine Learning

Overview

This project develops a systematic approach to anomaly detection for ship engine performance monitoring. In the shipping industry, abnormal behavior in engine parameters can lead to increased fuel consumption, safety risks, and operational downtime. Detecting such anomalies early enables preventative maintenance, reducing both costs and risks.

The analysis uses unsupervised machine learning techniques to identify early signs of potential engine malfunctions.

Dataset

This analysis uses six critical engine performance metrics from ship operations:

  • Engine RPM: Rotational speed variations indicating potential performance issues.
  • Lubrication Oil Pressure: Abnormal values may indicate lubrication deficiencies or blockages.
  • Fuel Pressure: Variations can imply issues related to combustion efficiency and fuel delivery.
  • Coolant Pressure: Deviations may point to leaks or cooling system faults.
  • Lubrication Oil Temperature: Abnormal temperatures affect the oil’s lubricating efficacy.
  • Coolant Temperature: Elevated or reduced temperatures can indicate cooling system failures.

Data Source: Devabrat, M., 2022. Predictive Maintenance on Ship's Main Engine using AI. https://dx.doi.org/10.21227/g3za-v415.

Dataset Characteristics: Anomalies constitute approximately 1-5% of data points, presenting a realistic class-imbalanced scenario typical of real-world anomaly detection challenges.

Analysis Workflow & Key Findings

Exploratory Data Analysis (EDA)

Assessing data quality, distribution, missing values and anomalies in the input features.

EDA revealed that the feature distributions were generally non-normal, with many extreme outliers and possible structure within the dataset. Methods that assume a normal distribution (such as standard deviation or Z-score thresholds) were therefore not suitable for this analysis, but IQR could be considered as an alternative.
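A normality check of the kind described above could be sketched as follows. This is an illustrative example, not the notebook's code: the column names and the synthetic data are assumptions, and `summarise_normality` is a hypothetical helper built on SciPy's `skew`, `kurtosis` and `normaltest`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarise_normality(df: pd.DataFrame) -> pd.DataFrame:
    """Skewness, excess kurtosis and a D'Agostino-Pearson p-value per numeric column."""
    rows = {}
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna().to_numpy()
        _, p_value = stats.normaltest(values)  # null hypothesis: data are normal
        rows[col] = {
            "skew": stats.skew(values),
            "excess_kurtosis": stats.kurtosis(values),  # 0 for a normal distribution
            "normaltest_p": p_value,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic stand-in data (assumption): one skewed feature, one roughly normal feature.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "engine_rpm": rng.lognormal(mean=6.5, sigma=0.4, size=2000),
    "coolant_temp": rng.normal(loc=78.0, scale=5.0, size=2000),
})

summary = summarise_normality(demo)
print(summary.round(3))
```

A strongly non-zero skew or a very small p-value for a feature is one signal that Z-score thresholds would be misleading for it; histograms and box plots remain the more direct check.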

Statistical Methods (IQR)

Implementing statistical methods for anomaly detection using the interquartile range (IQR).

Statistical analysis with IQR flagged a plausible proportion of the dataset as anomalous (2.16%), within the expected 1-5% range. However, further investigation revealed potential relationships between features that influence which data points are anomalous. Because IQR is applied to each feature independently, it cannot capture non-linear or multivariate structure, which suggested it may not be the most appropriate method here. Unsupervised machine learning was therefore considered as an alternative that can account for multidimensional relationships.
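For illustration, a per-feature IQR rule can be sketched as below. The 1.5 multiplier is the conventional Tukey fence, while the `min_features` threshold, the helper names and the toy data are assumptions made for this example; the rule that produced the 2.16% figure is defined in the notebook.

```python
import pandas as pd


def iqr_outlier_flags(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Flag each value lying outside [Q1 - k*IQR, Q3 + k*IQR] for its own column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return df.lt(lower, axis="columns") | df.gt(upper, axis="columns")


def iqr_anomalies(df: pd.DataFrame, k: float = 1.5, min_features: int = 2) -> pd.Series:
    """Mark a row as anomalous when at least `min_features` columns are flagged."""
    return iqr_outlier_flags(df, k).sum(axis=1) >= min_features


# Toy data (assumption): row 0 is extreme in one feature, row 9 in both.
demo = pd.DataFrame({
    "engine_rpm": [100, 710, 720, 730, 740, 750, 760, 770, 780, 2000],
    "lub_oil_pressure": [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 9.0],
})

is_anomaly = iqr_anomalies(demo, min_features=2)
print(demo[is_anomaly])
```

Each column is thresholded on its own, so a row whose values are individually unremarkable but jointly unusual is never flagged. This is the limitation that motivates the multivariate methods in the next section.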

Machine Learning Model Development

Implementing unsupervised machine learning techniques (One-Class SVM, Isolation Forest) to detect anomalous behavior.

  • One-Class SVM: Assessed first as a highly tunable method for detecting outliers. The model was optimised to minimise the chance of missed anomalies (false negatives). However, it is a distance-based method, so the dataset had to be scaled first; improper scaling can mask outliers by compressing the data range.

  • Isolation Forest: Assessed second, as a method that does not require the data to be scaled, and its results were compared with those of One-Class SVM. Isolation Forest is a tree-based unsupervised ML method that is effective at identifying anomalies that are only detectable in a multidimensional (multi-feature) context.
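The two models above can be set up in scikit-learn roughly as follows. This is a minimal sketch on synthetic data: the scaler choice (`StandardScaler`), the hyperparameter values (`nu`, `gamma`, `contamination`, `n_estimators`) and the three-feature layout are all assumptions for illustration, not the tuned settings from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for three engine features (assumption): mostly typical
# readings plus a small group of extreme ones.
rng = np.random.default_rng(42)
typical = rng.normal(loc=[750.0, 3.5, 78.0], scale=[50.0, 0.4, 4.0], size=(980, 3))
extreme = rng.normal(loc=[1500.0, 7.0, 110.0], scale=[50.0, 0.4, 4.0], size=(20, 3))
X = np.vstack([typical, extreme])

# One-Class SVM is sensitive to feature scale, so scaling is part of the pipeline.
# nu is an upper bound on the fraction of training points treated as outliers.
ocsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.03),
)
svm_labels = ocsvm.fit_predict(X)  # -1 = anomaly, 1 = normal

# Isolation Forest splits on raw feature values, so no scaling step is needed.
iforest = IsolationForest(n_estimators=200, contamination=0.03, random_state=42)
if_labels = iforest.fit_predict(X)  # -1 = anomaly, 1 = normal

svm_anomaly = svm_labels == -1
if_anomaly = if_labels == -1
print(f"One-Class SVM flagged {svm_anomaly.mean():.2%} of rows")
print(f"Isolation Forest flagged {if_anomaly.mean():.2%} of rows")
```

Keeping the scaler inside the pipeline means the same transformation is applied at fit and predict time. Because standard scaling is itself influenced by extreme values, a robust alternative such as `RobustScaler` may be worth comparing when outliers are heavy.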

Model Evaluation & Comparison

Assessing and comparing model performance.

Isolation Forest corroborated some of the One-Class SVM results, such as the potential protective effects of certain features against anomalies, but failed to identify an obvious visual outlier that One-Class SVM detected. This suggested that while there are multidimensional effects in this dataset, individual feature effects are still important. One-Class SVM was therefore proposed as the most suitable anomaly detection model for this dataset.

The two ML methods differed considerably in which samples they flagged: anomalies identified by both methods comprised only 53.7% of Isolation Forest anomalies and 53.8% of One-Class SVM anomalies, so nearly half of each method's anomalies were not flagged by the other.
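Overlap percentages of this kind can be computed from two boolean anomaly masks, as in the sketch below. The helper name `overlap_summary` and the toy masks are hypothetical and are not taken from the notebook.

```python
import numpy as np


def overlap_summary(flags_a, flags_b) -> dict:
    """Count shared anomalies and express them as a percentage of each method's total."""
    a = np.asarray(flags_a, dtype=bool)
    b = np.asarray(flags_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("both masks must cover the same observations")
    n_a, n_b = int(a.sum()), int(b.sum())
    n_shared = int((a & b).sum())
    return {
        "n_a": n_a,
        "n_b": n_b,
        "n_shared": n_shared,
        "shared_pct_of_a": 100.0 * n_shared / n_a if n_a else float("nan"),
        "shared_pct_of_b": 100.0 * n_shared / n_b if n_b else float("nan"),
    }


# Toy masks (assumption): True marks an observation flagged as anomalous.
svm_flags = [True, True, True, False, False, False, True, False]
if_flags = [True, True, False, True, False, False, False, False]

result = overlap_summary(svm_flags, if_flags)
print(result)
```

Reporting the shared count against each method's own total, rather than a single ratio, makes it clear when one method flags many more observations than the other.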

Key Insights

  • Feature Interactions: Evidence of protective effects when certain parameters remain within optimal ranges
  • Method Complementarity: Different ML approaches identify distinct anomaly patterns
  • Multidimensional Effects: Anomalies are often only detectable when considering multiple features simultaneously
  • Individual Feature Importance: Single-feature effects remain significant despite multidimensional relationships

Future Work

  • Investigate nature and relative importance of shared vs. non-shared anomalies identified across different ML methods
  • Analyse feature relationships and protective effects in optimal parameter ranges
  • Develop ensemble approaches combining multiple detection methods
  • Validate findings with domain expert knowledge and operational data

Project Structure

anomaly_detection_svm_if/
├── data/
│   └── ship_engine_data.csv      # Raw engine performance dataset
├── notebooks/
│   └── anomaly_detection_with_machine_learning.ipynb   # Notebook with full analysis & workflow
└── docs/
    └── anomaly_detection_report.pdf      # Detailed technical report

Getting Started

  1. Clone the repository
  2. Install required dependencies (see notebook for package requirements)
  3. Run the Jupyter notebook for step-by-step analysis
  4. Refer to the technical report for detailed methodology and results

Technologies Used

  • Python 3
  • Scikit-learn (PCA, One-Class SVM, Isolation Forest)
  • Pandas, NumPy, SciPy (Data manipulation)
  • Matplotlib, Seaborn, NetworkX (Visualisation)
  • Jupyter Notebook / Google Colab

Acknowledgments

This project was completed as part of the Data Science Career Accelerator at the University of Cambridge (2024).


This project demonstrates practical application of unsupervised machine learning for industrial anomaly detection, with potential applications across maritime and other heavy industry sectors.
