Detecting Anomalous Activity in Ship Engines Using Machine Learning
This project develops a systematic approach to anomaly detection for ship engine performance monitoring. In the shipping industry, abnormal behavior in engine parameters can lead to increased fuel consumption, safety risks, and operational downtime. Detecting such anomalies early enables proactive preventative maintenance, reducing both costs and risks.
This analysis applies unsupervised machine learning techniques to identify early signs of potential engine malfunctions.
This analysis uses six critical engine performance metrics from ship operations:
- Engine RPM: Rotational speed variations indicating potential performance issues.
- Lubrication Oil Pressure: Abnormal values may indicate lubrication deficiencies or blockages.
- Fuel Pressure: Variations can imply issues related to combustion efficiency and fuel delivery.
- Coolant Pressure: Deviations may point to leaks or cooling system faults.
- Lubrication Oil Temperature: Abnormal temperatures affect the oil’s lubricating efficacy.
- Coolant Temperature: Elevated or reduced temperatures can indicate cooling system failures.
Data Source: Devabrat, M., 2022. Predictive Maintenance on Ship's Main Engine using AI. https://dx.doi.org/10.21227/g3za-v415.
Dataset Characteristics: Anomalies constitute approximately 1-5% of data points, presenting a realistic class-imbalanced scenario typical of real-world anomaly detection challenges.
Assessing data quality, distribution, missing values and anomalies in the input features.
EDA revealed that the feature distributions were generally non-normal, with many extreme outliers and possible underlying structure in the dataset. Methods that assume a normal distribution (such as standard-deviation or Z-score thresholds) were therefore NOT suitable for this analysis, whereas the interquartile range (IQR) could be considered as a distribution-free alternative.
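A minimal sketch of the kind of distribution check used at this stage (the file path and the choice of normality test are illustrative assumptions, not taken from the notebook):

```python
import pandas as pd
from scipy import stats

# Illustrative sketch only: the file path is an assumption based on the
# repository layout described below.
df = pd.read_csv("data/ship_engine_data.csv")

for col in df.select_dtypes("number").columns:
    skew = df[col].skew()
    # D'Agostino-Pearson test: a small p-value suggests non-normality
    _, p_value = stats.normaltest(df[col].dropna())
    print(f"{col}: skew = {skew:.2f}, normaltest p = {p_value:.3g}")
```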
Implementing statistical methods for anomaly detection using Interquartile Range
Statistical analysis with the IQR gave a plausible percentage of anomalies within the dataset (2.16%). However, further investigation revealed potential relationships between features that influence which data points are anomalous. This suggested that IQR may not be the most appropriate method, as it is a univariate technique generally unsuited to anomaly detection in non-linear, multivariate data. Unsupervised machine learning was therefore considered as an alternative capable of capturing multidimensional relationships.
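A minimal sketch of per-feature IQR flagging with 1.5 × IQR fences; the multiplier and the rule for combining per-feature flags into a per-row label are assumptions and may differ from the notebook:

```python
import pandas as pd

# Sketch of IQR-based flagging with 1.5 * IQR fences (multiplier assumed).
df = pd.read_csv("data/ship_engine_data.csv")
numeric = df.select_dtypes("number")

q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
out_of_range = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

# Here a row is flagged if any single feature falls outside its fences;
# the notebook may combine per-feature flags differently.
flagged = out_of_range.any(axis=1)
print(f"IQR-flagged rows: {flagged.mean():.2%}")
```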
Implementing unsupervised machine learning techniques (One-Class SVM, Isolation Forest) to detect anomalous behavior.
- One-Class SVM: Assessed first as a highly tunable method for detecting outliers. The model was optimised to minimise the chance of missed anomalies (false negatives). However, it is a distance-based method and therefore required scaling of the dataset; if done improperly, scaling can mask outliers by inappropriately compressing the data range.
- Isolation Forest: A second method, which does not require scaling, was applied and its results compared. Isolation Forest is a tree-based unsupervised ML method effective at identifying anomalies that are only detectable in a multidimensional, multi-feature context. A minimal sketch of both models is shown after this list.
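The sketch below shows how the two models might be fitted with scikit-learn; the hyperparameters (`nu`, `contamination`) and the choice of `RobustScaler` are illustrative assumptions, not the tuned values from the notebook:

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

df = pd.read_csv("data/ship_engine_data.csv")
X = df.select_dtypes("number")

# One-Class SVM is distance-based, so scale first; RobustScaler limits
# the influence of the extreme outliers seen in the EDA.
X_scaled = RobustScaler().fit_transform(X)
svm_labels = OneClassSVM(kernel="rbf", nu=0.05).fit_predict(X_scaled)

# Isolation Forest works on the raw feature values (no scaling needed).
iso_labels = IsolationForest(contamination=0.05, random_state=42).fit_predict(X)

# Both models return -1 for anomalies and 1 for normal points.
print("SVM anomalies:", (svm_labels == -1).sum())
print("Isolation Forest anomalies:", (iso_labels == -1).sum())
```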
Assessing and comparing model performance.
Isolation Forest corroborated some results of One-Class SVM, i.e. the potential protective effects of certain features against anomalies, but it failed to identify an obvious visual outlier that the SVM detected. This suggested that, while multidimensional effects exist in this dataset, individual feature effects remain important. One-Class SVM was therefore proposed as the most suitable anomaly detection model for this dataset.
The two methods differed considerably in which samples they flagged as anomalous: points identified by both comprised only 53.7% of Isolation Forest anomalies and 53.8% of One-Class SVM anomalies, indicating that each method flagged a substantially different set of observations.
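The overlap between the two anomaly sets can be quantified along these lines (a sketch, assuming the -1/1 label convention returned by both scikit-learn models in the sketch above):

```python
import numpy as np

def anomaly_overlap(svm_labels, iso_labels):
    """Compare two scikit-learn label arrays where -1 marks an anomaly."""
    svm_anoms = set(np.where(np.asarray(svm_labels) == -1)[0])
    iso_anoms = set(np.where(np.asarray(iso_labels) == -1)[0])
    shared = svm_anoms & iso_anoms
    return {
        "shared": len(shared),
        "share_of_svm_anomalies": len(shared) / len(svm_anoms),
        "share_of_if_anomalies": len(shared) / len(iso_anoms),
    }

# Example, using the labels from the model sketch above:
# print(anomaly_overlap(svm_labels, iso_labels))
```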
- Feature Interactions: Evidence of protective effects when certain parameters remain within optimal ranges
- Method Complementarity: Different ML approaches identify distinct anomaly patterns
- Multidimensional Effects: Anomalies are often only detectable when considering multiple features simultaneously
- Individual Feature Importance: Single-feature effects remain significant despite multidimensional relationships
- Investigate nature and relative importance of shared vs. non-shared anomalies identified across different ML methods
- Analyse feature relationships and protective effects in optimal parameter ranges
- Develop ensemble approaches combining multiple detection methods
- Validate findings with domain expert knowledge and operational data
anomaly_detection_svm_if/
├── data/
│ └── ship_engine_data.csv # Raw engine performance dataset
├── notebooks/
│ └── anomaly_detection_with_machine_learning.ipynb # Notebook with full analysis & workflow
└── docs/
└── anomaly_detection_report.pdf # Detailed technical report
- Clone the repository
- Install required dependencies (see notebook for package requirements)
- Run the Jupyter notebook for step-by-step analysis
- Refer to the technical report for detailed methodology and results
- Python 3
- Scikit-learn (PCA, One-Class SVM, Isolation Forest)
- Pandas, NumPy, SciPy (Data manipulation)
- Matplotlib, Seaborn, NetworkX (Visualisation)
- Jupyter Notebook / Google Colab
This project was completed as part of the Data Science Career Accelerator at the University of Cambridge (2024).
This project demonstrates practical application of unsupervised machine learning for industrial anomaly detection, with potential applications across maritime and other heavy industry sectors.