tk-yasuno/health-ebm-classification


๐Ÿ—๏ธ Health EBM Classification

Bridge Health Level Classification using Explainable Boosting Machine

Python License Status

🎯 Overview

An AI-powered bridge health classification system that automatically categorizes bridge inspection reports into health levels using machine learning. The system leverages Explainable Boosting Machine (EBM) to achieve high accuracy while maintaining interpretability.

๐Ÿ† Key Achievements (v0.3)

  • ๐Ÿš€ EBM 25ๅ€้ซ˜้€ŸๅŒ–: 25ๅˆ† โ†’ 63.8็ง’๏ผˆ16ไธฆๅˆ—ๅ‡ฆ็†๏ผ‰
  • 91.88% Test Accuracy - ๅฎŸ็”จใƒฌใƒ™ใƒซ้”ๆˆ๏ผ
  • 86.20% F1-macro score (+19.8pt improvement from v0.2)
  • ใƒ•ใƒซใƒ‡ใƒผใ‚ฟๅญฆ็ฟ’: 8,615ไปถๅ‡ฆ็†๏ผˆ31ๅ€ใƒ‡ใƒผใ‚ฟๆดป็”จ๏ผ‰
  • Repair-requirement F1: 80% - ๅฎŸ็”จๆ€ง็ขบไฟ

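📊 Version History

| Version | Data Size | Test Accuracy | F1-Macro | Key Innovation |
|---------|-----------|---------------|----------|----------------|
| v0.3 | 8,615 records | 91.88% | 86.20% | 16-way parallel speedup + full data |
| v0.2 | 276 records | 84.34% | 66.40% | MVP on aggregated data |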

🚀 Quick Start

# Clone the repository
git clone https://github.com/YOUR_USERNAME/health-ebm-classification.git
cd health-ebm-classification

# Set up virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1  # Windows
# source .venv/bin/activate    # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the complete pipeline
cd src
python main_pipeline.py

📊 Performance Results (v0.3)

| Model | Training Time | Val F1-Macro | Test Accuracy | Notes |
|-------|---------------|--------------|---------------|-------|
| 🥇 EBM | 63.80 s | 85.34% | 91.88% | Highest accuracy, after speedup |
| 🥈 XGBoost Enhanced | 5.42 s | 82.91% | 89.12% | Fast and accurate |
| 🥉 CatBoost | 28.12 s | 79.44% | 87.56% | Balanced |
| LightGBM | 2.23 s | 76.38% | 85.23% | Very fast |
| Random Forest | 0.42 s | 71.64% | 82.34% | Fastest |

🔥 v0.3 Major Improvements

  • EBM speedup: 25 min → 63.8 s (roughly 25x faster)
  • Accuracy: Test Accuracy 84.34% → 91.88% (+7.54pt)
  • F1-macro: 66.40% → 86.20% (+19.80pt)
  • Practical usability: Repair-requirement F1 25% → 80% (+55pt)
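Test accuracy and F1-macro are reported separately because they can diverge on imbalanced classes. The sketch below shows how the two metrics are computed in plain Python. The toy labels are made up for illustration and are not taken from the project's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): 8 majority samples and 2 minority samples.
y_true = ["II"] * 8 + ["III+"] * 2
y_pred = ["II"] * 8 + ["II", "III+"]  # one minority sample is missed

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro-F1={f1:.3f}")
```

A single missed minority sample barely moves accuracy but pulls macro-F1 down noticeably, which is why the repair-requirement F1 is tracked as its own figure.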

๐Ÿ—๏ธ Architecture

Data Pipeline (v0.3)

Raw CSV Data → Preprocessing → Feature Engineering → Model Training → Evaluation
     ↓              ↓               ↓                  ↓              ↓
  9,753 records  → 8,615 samples → 1,019 features  → 7 models    → Best: EBM (91.88%)
                   (31x data)      (fully used)      (16 parallel)  (practical level)

Classification System (v0.3)

  • Level Ⅰ (Healthy): 1,404 samples (16.3%)
  • Level Ⅱ (Preventive): 6,332 samples (73.5%)
  • Repair-required (Ⅲ+): 879 samples (10.2%) - a 44x increase

🔧 Technical Features

🚀 v0.3 New Features and Speedups

  • ⚡ 16-way parallel processing: EBM training roughly 25x faster (25 min → 63.8 s)
  • 📊 Full-data training: processes 8,615 individual records
  • 🎯 Execution-time tracking: performance monitoring for all models
  • 🔧 Tuned parameters: interactions=10, max_bins=64
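As a rough illustration of the settings above, the sketch below passes `interactions=10` and `max_bins=64` to InterpretML's `ExplainableBoostingClassifier` and records wall-clock time per model. It assumes the 16-way parallelism maps to the classifier's `n_jobs` parameter, which this README does not confirm. The helper names `ebm_params`, `build_ebm`, and `timed` are hypothetical and not taken from `model_trainer.py`.

```python
import os
import time
from contextlib import contextmanager


def ebm_params(max_workers=16):
    """Settings quoted in this README; n_jobs is capped at the available cores."""
    return {
        "interactions": 10,
        "max_bins": 64,
        # Assumption: "16 parallel" corresponds to n_jobs on the EBM.
        "n_jobs": min(max_workers, os.cpu_count() or 1),
    }


def build_ebm(**overrides):
    """Create the classifier; the import is deferred so it is only needed here."""
    from interpret.glassbox import ExplainableBoostingClassifier

    return ExplainableBoostingClassifier(**{**ebm_params(), **overrides})


@contextmanager
def timed(name, results):
    """Record the wall-clock seconds spent inside the block under results[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = time.perf_counter() - start


timings = {}
with timed("demo", timings):
    sum(range(1000))  # stand-in for model.fit(X_train, y_train)
print(ebm_params(), timings)
```

Wrapping each `fit` call in `timed(...)` is one way to obtain a training-time column like the one in the results table.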

🧠 Machine Learning

  • 7 Advanced Models: including EBM, LightGBM, CatBoost, XGBoost, and Random Forest
  • Class Imbalance Handling: Strategic class consolidation and weighting
  • Cross-validation: 5-fold CV for robust evaluation
  • Interpretable AI: EBM provides feature importance and decision explanations
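A minimal sketch of the class consolidation and weighting idea, using the v0.3 class counts reported in this README. The weighting formula is the common "balanced" heuristic and is an assumption here, as are the helper names. The project's actual weighting scheme may differ.

```python
from collections import Counter

REPAIR = "Repair-required"


def consolidate(level):
    """Keep levels Ⅰ and Ⅱ as they are; merge every other level into one class."""
    return level if level in ("Ⅰ", "Ⅱ") else REPAIR


def balanced_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


# Class counts from the v0.3 classification system in this README.
labels = ["Ⅰ"] * 1404 + ["Ⅱ"] * 6332 + [REPAIR] * 879
weights = balanced_weights(labels)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With these counts the repair-required class receives the largest weight and Level Ⅱ the smallest, which offsets the 73.5% majority share during training.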

๐Ÿ“ Text Processing

  • Japanese NLP: Janome morphological analysis
  • TF-IDF Vectorization: 1,000-dimensional text features
  • Domain Keywords: Bridge-specific terminology extraction
  • Multi-modal Features: Text + numerical + categorical data
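The text features listed above could be built along these lines. This is a sketch, not the code in `feature_engineering.py`: the keyword list and the function names are hypothetical, and `build_tfidf` assumes Janome's `Tokenizer.tokenize(text, wakati=True)` feeding scikit-learn's `TfidfVectorizer` capped at 1,000 features.

```python
# Hypothetical keyword list; the project's real terminology list is not shown here.
DOMAIN_KEYWORDS = ("ひび割れ", "腐食", "剥離")  # crack, corrosion, spalling


def keyword_flags(text, keywords=DOMAIN_KEYWORDS):
    """Return one 0/1 feature per keyword, marking whether it occurs in the text."""
    return [int(keyword in text) for keyword in keywords]


def build_tfidf(max_features=1000):
    """TF-IDF over Janome tokens (imports deferred: needs janome and scikit-learn)."""
    from janome.tokenizer import Tokenizer
    from sklearn.feature_extraction.text import TfidfVectorizer

    tokenizer = Tokenizer()

    def tokenize(text):
        # wakati=True yields plain surface strings instead of Token objects.
        return list(tokenizer.tokenize(text, wakati=True))

    return TfidfVectorizer(
        tokenizer=tokenize, token_pattern=None, max_features=max_features
    )


flags = keyword_flags("主桁に腐食とひび割れが見られる")
print(flags)
```

The keyword flags can then be concatenated with the TF-IDF matrix and the numerical and categorical columns to form the multi-modal feature set.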

๐Ÿ› ๏ธ Engineering

  • Automated Pipeline: End-to-end ML workflow
  • Error Handling: Robust processing with fallback strategies
  • Modular Design: Easily extensible components
  • Performance Monitoring: Detailed metrics and reporting

๐Ÿ“ Project Structure

health-ebm-classification/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ main_pipeline.py          # Main execution pipeline (v0.3ๅฏพๅฟœ)
โ”‚   โ”œโ”€โ”€ data_loader.py             # Data loading + ใƒ•ใƒซใƒ‡ใƒผใ‚ฟใƒขใƒผใƒ‰
โ”‚   โ”œโ”€โ”€ feature_engineering.py    # Feature extraction and engineering
โ”‚   โ””โ”€โ”€ model_trainer.py           # Model training + 16ไธฆๅˆ—ๅ‡ฆ็†
โ”œโ”€โ”€ 1_inspection-dataset/          # Bridge inspection data
โ”œโ”€โ”€ docs/
โ”‚   โ”œโ”€โ”€ README_v0-3.md            # ๐Ÿ†• v0.3 ๆˆๆžœใพใจใ‚
โ”‚   โ”œโ”€โ”€ README_v0-2.md            # v0.2 technical documentation  
โ”‚   โ””โ”€โ”€ QUICK_GUIDE.md            # 5-minute start guide
โ”œโ”€โ”€ requirements.txt               # Python dependencies
โ”œโ”€โ”€ .gitignore                    # Git ignore rules
โ””โ”€โ”€ README.md                     # This file

📈 Use Cases

🏢 Infrastructure Management

  • Automated Inspection: Reduce manual assessment time
  • Risk Prioritization: Identify bridges requiring immediate attention
  • Maintenance Planning: Data-driven repair scheduling
  • Quality Assurance: Consistent evaluation standards

๐Ÿ” Decision Support

  • Explainable Predictions: Understand why a bridge needs repair
  • Confidence Scoring: Reliability indicators for each prediction
  • Comparative Analysis: Benchmark against historical data
  • Expert Validation: AI recommendations with human oversight

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • 8GB+ RAM (for full-data processing)
  • Multi-core CPU recommended (for 16-way parallel processing)
  • Bridge inspection CSV data

Installation

pip install pandas numpy scikit-learn
pip install lightgbm xgboost catboost
pip install interpret janome  # For EBM and Japanese text

Data Format

Your CSV files should contain:

  • BridgeID: Unique bridge identifier
  • HealthLevel: Current health assessment (Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ)
  • Diagnosis: Inspection text description
  • DamageComment: Detailed damage observations
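Before running the pipeline, it can help to check that a CSV matches this layout. The following is a sketch using only the standard library. The validator and the sample rows are illustrative assumptions, not part of `data_loader.py`.

```python
import csv
import io

REQUIRED_COLUMNS = ("BridgeID", "HealthLevel", "Diagnosis", "DamageComment")
VALID_LEVELS = {"Ⅰ", "Ⅱ", "Ⅲ", "Ⅳ", "Ⅴ"}


def validate_rows(handle):
    """Yield rows with a known HealthLevel; raise if a required column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        if row["HealthLevel"] in VALID_LEVELS:
            yield row  # rows with an unrecognized level are skipped


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "BridgeID,HealthLevel,Diagnosis,DamageComment\n"
    "B001,Ⅱ,予防保全段階,床版にひび割れ\n"
    "B002,?,不明,記載なし\n"
)
rows = list(validate_rows(sample))
print(len(rows), rows[0]["BridgeID"])
```

For real files, open them with an explicit encoding (for example `open(path, encoding="utf-8", newline="")`) so the Roman-numeral levels and Japanese text are read correctly.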

🔮 Roadmap

✅ v0.3 (Completed) - EBM Speedup & Full-Data Training

  • 25x speedup: EBM training time 25 min → 63.8 s
  • 16-way parallel processing: implemented for better CPU utilization
  • Full-data training: supports processing 8,615 records
  • Practical-level performance: Test Accuracy 91.88%
  • Comprehensive documentation: README_v0-3.md added

v1.0 (Production Ready) - Next Release

  • REST API implementation
  • Real-time prediction endpoint
  • Enhanced interpretability dashboard
  • Web application interface
  • Automated model retraining
  • Integration with inspection databases
  • Multi-language support

v2.0 (Advanced Features)

  • Image analysis integration
  • Geographic factor modeling
  • Predictive maintenance forecasting
  • Mobile app development

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone with development dependencies
git clone https://github.com/YOUR_USERNAME/health-ebm-classification.git
cd health-ebm-classification

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Run linting
flake8 src/
black src/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Contact & Support

  • Issues: GitHub Issues
  • Documentation: See docs/ folder for detailed guides
  • Questions: Create a discussion in the repository

๐Ÿ… Acknowledgments

  • Microsoft InterpretML: For the Explainable Boosting Machine implementation
  • Japanese NLP Community: For Janome morphological analyzer
  • Bridge Engineering Domain Experts: For validation and insights

Built with ❤️ for safer infrastructure

Last Updated: October 3, 2025 - v0.3 Release
