Predictive Maintenance System

🎯 Project Overview

A production-ready predictive maintenance system for manufacturing equipment that predicts failures before they occur, targeting a 35-45% reduction in unplanned downtime and 25-30% lower maintenance costs.

🏗️ System Architecture

[Factory Floor]
   ├─ IoT Sensors (vibration, temp, pressure, power)
   └─ Edge Device (data aggregation, initial filtering)
          ↓
   [Kafka Topic: raw_sensor_data]
          ↓
   [Stream Processor - Apache Flink / Python Consumer]
   ├─ Data validation
   ├─ Feature engineering (rolling stats, FFT)
   └─ Writes to → TimescaleDB (time-series) + S3 (raw backup)
          ↓
   [ML Pipeline]
   ├─ Offline Training (daily/weekly): LSTM + Random Forest
   ├─ Model Registry (MLflow)
   └─ Batch feature computation
          ↓
   [Real-time Inference Service - FastAPI]
   ├─ Loads latest model from MLflow
   ├─ Consumes from Kafka: processed_features
   └─ Predicts: RUL (Remaining Useful Life), Anomaly Score
          ↓
   [Alert System]
   ├─ Threshold-based triggers (RUL < 48hrs)
   └─ Sends → Email / Slack / SMS
          ↓
   [Dashboard - Streamlit/Grafana]
   └─ Real-time monitoring, historical trends, equipment health
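
As a concrete reference for the ingestion path above, below is a minimal sketch of publishing one reading to the raw_sensor_data topic with kafka-python. The broker address and message fields (equipment_id, sensor values) are illustrative assumptions, not the project's exact schema.

# Minimal sketch: publish one sensor reading to the raw_sensor_data topic.
# Broker address and message fields are illustrative assumptions.
import json
import time

from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

reading = {
    "equipment_id": "engine_001",   # hypothetical identifier
    "timestamp": time.time(),
    "temperature": 642.3,
    "pressure": 553.9,
    "vibration": 2.1,
}

producer.send("raw_sensor_data", value=reading)
producer.flush()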

🛠️ Tech Stack

| Component | Technology | Justification |
| --- | --- | --- |
| Message Queue | Apache Kafka | Industry standard for IoT streaming; 100k+ events/sec; fault-tolerant |
| Stream Processing | Python (kafka-python) | Simpler than Flink for a demo; production-ready; easier ML integration |
| Time-series DB | TimescaleDB | SQL plus time-series optimizations; better than InfluxDB for relational context |
| Object Storage | MinIO (local S3) | Production S3 patterns without AWS costs |
| ML Framework | PyTorch (LSTM) + scikit-learn | LSTM for sequences; Random Forest for baseline; interview-friendly |
| Model Registry | MLflow | Tracks experiments, versions models, serves via REST API |
| Inference API | FastAPI | Fast, async, auto-generated docs; production standard |
| Monitoring | Grafana + Streamlit | Grafana for ops teams; Streamlit for custom dashboards |
| Orchestration | Airflow | Schedules retraining, feature backfills |
| Containerization | Docker + Docker Compose | Local dev; Kubernetes for production scale |

📦 Modules

Phase 1: Data Foundation ✅

  • Module 1: Data Generator (data_generator/) - ✅ COMPLETED

    • Simulates sensor data with realistic degradation patterns
    • Publishes to Kafka raw_sensor_data topic
    • Supports turbofan engines, pumps, compressors
    • Injects failures: linear, exponential, step, oscillating patterns
  • Module 2: Kafka Infrastructure (infra/kafka/) - ✅ COMPLETED

    • Docker Compose with Kafka, Zookeeper, TimescaleDB, MinIO, Redis
    • 7 Kafka topics with retention/compression policies
    • Consumer group configurations
    • Health check and management scripts
    • TimescaleDB schema with hypertables and continuous aggregates
  • Module 3: Stream Processor (stream_processor/) - ✅ COMPLETED

    • Real-time Kafka consumer with batch processing
    • Feature engineering pipeline (time-domain, frequency-domain; see the sketch after this list)
    • TimescaleDB writer with connection pooling
    • Configurable buffer sizes and write intervals
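
A minimal sketch of the time-domain and frequency-domain feature computation referenced in Module 3, using pandas and NumPy; the window length, column name, and sample rate are illustrative assumptions.

# Sketch: rolling statistics and FFT-based features for one sensor window.
# Window length, column name, and sample rate are illustrative assumptions.
import numpy as np
import pandas as pd

def extract_features(window: pd.Series, sample_rate_hz: float = 1.0) -> dict:
    """Compute simple time- and frequency-domain features for one window."""
    values = window.to_numpy(dtype=float)
    features = {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "rms": float(np.sqrt(np.mean(values ** 2))),
        "peak_to_peak": float(values.max() - values.min()),
    }
    # Frequency-domain features via FFT (mean removed to drop the DC component)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(len(values), d=1.0 / sample_rate_hz)
    features["dominant_freq_hz"] = float(freqs[spectrum.argmax()])
    features["spectral_energy"] = float(np.sum(spectrum ** 2))
    return features

# Example: features over the last 64 readings of a hypothetical vibration column
# feats = extract_features(df["vibration"].iloc[-64:])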

Phase 2: ML Pipeline ✅

  • Module 4: Feature Store (feature_store/) - ✅ COMPLETED

    • Time-series features (rolling statistics, lag features)
    • Frequency-domain features (FFT, spectral analysis)
    • Label generation for RUL prediction
    • Feature versioning and metadata tracking
  • Module 5: Training Pipeline (ml_pipeline/train/) - ✅ COMPLETED

    • LSTM model for sequence prediction (see the sketch after this list)
    • Random Forest baseline model
    • Hyperparameter tuning with Optuna
    • Cross-validation framework
    • MLflow experiment tracking
  • Module 6: Model Evaluation (ml_pipeline/evaluate/) - ✅ COMPLETED

    • Comprehensive metrics (MAE, RMSE, R2, MAPE)
    • Model card generation
    • Evaluation visualizations (8 types)
    • Backtesting framework
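
The LSTM referenced in Module 5 could look roughly like the sketch below: a window of per-cycle sensor features in, a single RUL estimate out. Hidden size, depth, and sequence length are illustrative assumptions, not the project's tuned hyperparameters.

# Sketch of an LSTM-based RUL regressor in PyTorch.
# Hidden size, depth, and sequence length are illustrative assumptions.
import torch
import torch.nn as nn

class RULLSTM(nn.Module):
    def __init__(self, n_features: int = 21, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        self.head = nn.Linear(hidden_size, 1)  # regress a single RUL value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)  # prediction from the last time step

model = RULLSTM()
dummy = torch.randn(8, 50, 21)   # 8 windows, 50 cycles, 21 sensors
rul_pred = model(dummy)          # shape: (8,)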

Phase 3: Inference & Monitoring ✅

  • Module 7: Inference API (inference_service/) - ✅ COMPLETED

    • FastAPI service with async endpoints (see the sketch after this list)
    • Model manager with versioning
    • Real-time predictions (<50ms latency)
    • Batch prediction support
    • Health monitoring endpoints
  • Module 8: Alert Engine (alerting/) - ✅ COMPLETED

    • Rule-based alerting system (11+ built-in rules)
    • Multi-channel notifications (email, Slack, webhook, database)
    • Alert suppression and aggregation
    • Configurable thresholds and priorities
  • Module 9: Dashboard (dashboard/) - ✅ COMPLETED

    • Streamlit interactive dashboard
    • Grafana monitoring dashboards
    • Real-time equipment health visualization
    • Historical trend analysis
    • Alert management UI
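
A minimal sketch of the async prediction endpoint described in Module 7; the route, request schema, and predict_rul helper are hypothetical, not the service's actual API.

# Sketch of an async FastAPI prediction endpoint.
# The route, request schema, and predict_rul() helper are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Predictive Maintenance Inference (sketch)")

class SensorWindow(BaseModel):
    equipment_id: str
    features: list[list[float]]  # seq_len x n_features window

def predict_rul(features: list[list[float]]) -> float:
    # Placeholder for a model loaded from the MLflow registry.
    return 123.0

@app.post("/predict")
async def predict(window: SensorWindow):
    rul = predict_rul(window.features)
    return {
        "equipment_id": window.equipment_id,
        "rul_cycles": rul,
        "alert": rul < 48,  # mirrors the RUL < 48 alert threshold in the architecture
    }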

Phase 4: MLOps & Automation ✅

  • Module 10: Retraining Pipeline (ml_pipeline/retrain/) - ✅ COMPLETED
    • Automated drift detection (data & concept drift; see the sketch after this list)
    • Model comparison framework
    • Safe deployment with rollback
    • Scheduled retraining workflow
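
Data drift of the kind Module 10 detects is commonly checked per feature with a two-sample test; below is a minimal sketch using SciPy's Kolmogorov-Smirnov test. The significance threshold is an illustrative assumption.

# Sketch: per-feature data-drift check with a two-sample KS test.
# The significance threshold (alpha) is an illustrative assumption.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> dict:
    """Flag feature columns whose recent distribution differs from the training reference."""
    drifted = {}
    for i in range(reference.shape[1]):
        stat, p_value = ks_2samp(reference[:, i], current[:, i])
        if p_value < alpha:
            drifted[i] = {"ks_stat": float(stat), "p_value": float(p_value)}
    return drifted

# Example: compare training-window features against the most recent window;
# a non-empty result could trigger the retraining workflow.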

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Docker & Docker Compose
  • 8GB RAM minimum (16GB recommended)

Installation

  1. Clone the repository:
git clone <repository-url>
cd predictive-maintenance
  2. Start infrastructure services:
cd infra/kafka
./scripts/start-infra.sh  # Linux/Mac
# or
.\scripts\start-infra.bat  # Windows

# This starts: Kafka, Zookeeper, TimescaleDB, MinIO, Redis
  3. Run Data Streamer (C-MAPSS Real Data):
cd data_loader
pip install -r requirements.txt

# Stream NASA C-MAPSS turbofan engine data
python kafka_streamer.py --dataset FD001 --train --rate 1.0

# Or stream single engine for testing
python kafka_streamer.py --dataset FD001 --train --engine 1 --rate 10.0
  4. Or Run Synthetic Data Generator:
cd data_generator
pip install -r requirements.txt
python main.py --num-equipment 10 --equipment-type turbofan_engine
  5. Run Stream Processor:
cd stream_processor
pip install -r requirements.txt
python main.py
  6. Train Models (on C-MAPSS data):
cd ml_pipeline/train
pip install -r requirements.txt

# Train on NASA C-MAPSS dataset
python train_pipeline.py --config config/training_config.yaml --dataset cmapss

# Evaluate with NASA PHM08 scoring function
cd ../evaluate
python evaluator.py --model-path ../../models/latest_model.pkl --dataset cmapss
  7. Start Inference API:
cd inference_service
pip install -r requirements.txt
uvicorn api.main:app --host 0.0.0.0 --port 8000
  8. Start Dashboard:
cd dashboard/streamlit_app
pip install -r requirements.txt
streamlit run app.py

Quick Test (Mock Mode - No Dependencies)

Run without Kafka/TimescaleDB:

cd data_generator
python main.py --mock --num-equipment 5 --equipment-type turbofan_engine

Access Points

📊 Data

Real Dataset: NASA C-MAPSS

The project now supports the NASA C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) Turbofan Engine Degradation Dataset:

  • 100 training engines (run-to-failure trajectories)
  • 100 test engines (with true RUL labels)
  • 21 sensor measurements per cycle
  • 3 operational settings (altitude, throttle, Mach number)

Dataset Location: archive/CMaps/ (included in repository)

Files:

  • train_FD001.txt - Training data (complete degradation)
  • test_FD001.txt - Test data (stops before failure)
  • RUL_FD001.txt - True remaining useful life for test engines

Usage:

from data_loader import CMAPSSLoader

loader = CMAPSSLoader(dataset_path="archive/CMaps", dataset_id="FD001")
train_df = loader.load_train_data()  # With RUL labels
test_df = loader.load_test_data()    # With true RUL

See data_loader/README.md for complete documentation.
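
For run-to-failure training trajectories, RUL labels are typically computed as the number of cycles remaining until each engine's final observed cycle. A minimal pandas sketch of that convention is below; the column names unit_number and time_cycles are assumptions and may differ from the loader's output.

# Sketch: derive RUL labels for run-to-failure data (standard C-MAPSS convention).
# Column names "unit_number" and "time_cycles" are assumptions.
import pandas as pd

def add_rul_labels(df: pd.DataFrame) -> pd.DataFrame:
    max_cycle = df.groupby("unit_number")["time_cycles"].transform("max")
    out = df.copy()
    out["RUL"] = max_cycle - out["time_cycles"]
    return out

# labeled = add_rul_labels(train_df)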

Synthetic Data Generator (Optional)

The system also includes a synthetic data generator for additional testing:

  • Configurable equipment types: Turbofan engines, pumps, compressors
  • Failure patterns: Linear, exponential, step, oscillating degradation
  • Realistic sensor noise and operating conditions

Datasets Used

  • NASA C-MAPSS: Turbofan engine degradation dataset (included in /archive/CMaps/)
    • FD001: 100 train + 100 test engines, 1 operating condition, 1 fault mode
    • 21 sensor measurements per operational cycle
    • Run-to-failure trajectories with true RUL labels
  • Synthetic Data: Custom generated sensor data with realistic failure patterns

Sensor Types

  • Temperature: Inlet, outlet, various stages
  • Pressure: Fan inlet, HPC outlet, bypass duct
  • Vibration: Bearing, shaft imbalance
  • Flow: Air flow, fuel flow
  • Power: Motor current, consumption

🎓 Key Features for Resume

Technical Depth

  • Real-time streaming with Kafka (100k+ events/sec)
  • Time-series forecasting with LSTM networks
  • Anomaly detection using Isolation Forest & Autoencoders
  • Survival analysis for RUL prediction
  • Feature engineering (FFT, rolling statistics, frequency domain)
  • Model versioning with MLflow
  • REST API with FastAPI for real-time inference
  • Production monitoring with Grafana dashboards

Business Impact

  • Cost Reduction: 25-30% maintenance cost savings
  • Downtime Prevention: 35-45% reduction in unplanned downtime
  • Equipment Lifespan: 20% extension through optimized maintenance
  • Industry Scale: addresses an estimated $50B annual industry problem

📈 Model Performance Metrics

Anomaly Detection

  • Precision: Target >90% (minimize false alarms)
  • Recall: Target >85% (catch real failures)
  • F1-Score: Balanced metric

RUL Prediction

  • MAE: Mean Absolute Error (cycles)
  • RMSE: Root Mean Square Error
  • Early/Late Predictions: Tracking lead time accuracy (PHM08 score sketched below)
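
The NASA PHM08 scoring function used in step 6 of the Quick Start (from the Saxena et al. reference) captures this early/late asymmetry: late predictions are penalized more heavily than early ones. A minimal sketch:

# Sketch of the NASA PHM08 asymmetric scoring function (Saxena et al., PHM08).
# d > 0 means the predicted RUL is too optimistic (late warning) and is penalized harder.
import numpy as np

def phm08_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    d = y_pred - y_true
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(penalties))

# Lower is better; a single badly late prediction dominates the total.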

🔧 Development Roadmap

  • Phase 1: Data Foundation
    • Data Generator with failure injection
    • Kafka infrastructure setup (Kafka, TimescaleDB, MinIO, Redis)
    • Stream processor with feature engineering
  • Phase 2: ML Pipeline
    • Feature store implementation
    • LSTM + Random Forest training pipeline
    • Model evaluation & backtesting framework
  • Phase 3: Inference & Monitoring
    • FastAPI inference service
    • Alert engine with multi-channel notifications
    • Streamlit + Grafana dashboards
  • Phase 4: MLOps & Automation
    • Automated retraining with drift detection
    • Model comparison and safe deployment
    • Rollback capabilities

All 10 modules completed! 🎉

📚 Documentation

Module Documentation

Infrastructure

Getting Started Guides

  • Architecture Overview (this file)
  • API Documentation: http://localhost:8000/docs (when running)
  • Deployment Guide: See individual module READMEs

🤝 Contributing

This is a portfolio project demonstrating production ML systems. Each module is designed to be:

  • Interview-friendly: Clear code, well-documented
  • Production-ready: Error handling, logging, monitoring
  • Scalable: Designed for distributed deployment

📝 License

Educational/Portfolio Project - 2026

🔗 References

  1. Saxena, A., et al. "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation" (PHM08)
  2. Kafka Streams Documentation
  3. LSTM for Time Series Forecasting
  4. Survival Analysis in Predictive Maintenance

Built for demonstrating end-to-end ML systems engineering skills
