Implemented a U-Net model for multi-class semantic segmentation on the Oxford-IIIT Pet Dataset, including preprocessing, training, and rigorous evaluation using Mean IoU, Dice, and Pixel Accuracy with qualitative visualizations.

🐶 Oxford-IIIT Pet Dataset – Semantic Segmentation using U-Net

This project implements a U-Net based deep learning model for pixel-level semantic segmentation on the Oxford-IIIT Pet Dataset. The goal is to segment pets from the background by classifying each pixel into three categories: background, pet, and pet boundary.

The project demonstrates the complete segmentation pipeline including data preprocessing, model training, quantitative evaluation, and qualitative visualization of predictions.


📂 Dataset

The Oxford-IIIT Pet Dataset contains 7,349 images of cats and dogs with corresponding pixel-level segmentation masks (trimaps). Each mask is annotated with three classes:

  • Background
  • Pet
  • Pet boundary

Images and masks were resized to 128×128 for efficient training and evaluation.
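The loading and resizing step can be sketched as follows, assuming the dataset is pulled from TensorFlow Datasets (`oxford_iiit_pet`); the exact pipeline in the notebook may differ:

```python
import tensorflow as tf

IMG_SIZE = 128

def load_and_resize(sample):
    # Resize images with bilinear interpolation; masks must use
    # nearest-neighbour so class labels are not blended at boundaries.
    image = tf.image.resize(sample["image"], (IMG_SIZE, IMG_SIZE))
    mask = tf.image.resize(sample["segmentation_mask"], (IMG_SIZE, IMG_SIZE),
                           method="nearest")
    return image, mask

# import tensorflow_datasets as tfds
# dataset, info = tfds.load("oxford_iiit_pet", with_info=True)
# train = dataset["train"].map(load_and_resize)
```

Using nearest-neighbour interpolation for the masks matters: bilinear resizing would produce fractional label values at class boundaries.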


🧠 Model Architecture

A U-Net convolutional neural network was used for segmentation. U-Net follows an encoder–decoder structure with skip connections, allowing it to capture both high-level context and fine-grained spatial details.

This architecture is well suited for medical and object segmentation tasks where precise localization is required.
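A minimal Keras sketch of the encoder–decoder-with-skips idea is shown below; the depth and filter counts here are illustrative assumptions, not necessarily those used in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the standard U-Net block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(128, 128, 3), num_classes=3):
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: downsample while widening channels, keeping skip tensors.
    s1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(s1)
    s2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(s2)

    # Bottleneck.
    b = conv_block(p2, 128)

    # Decoder: upsample and concatenate the matching skip connection.
    u2 = layers.UpSampling2D()(b)
    d2 = conv_block(layers.Concatenate()([u2, s2]), 64)
    u1 = layers.UpSampling2D()(d2)
    d1 = conv_block(layers.Concatenate()([u1, s1]), 32)

    # Per-pixel class logits (softmax deferred to the loss).
    outputs = layers.Conv2D(num_classes, 1)(d1)
    return tf.keras.Model(inputs, outputs)
```

The skip connections concatenate high-resolution encoder features into the decoder, which is what gives U-Net its precise localization at object boundaries.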


⚙️ Preprocessing & Training

The following preprocessing and training steps were applied:

  • Images resized to 128×128
  • Pixel values normalized to [0, 1]
  • Segmentation masks converted to zero-based class labels
  • Random horizontal flipping used for data augmentation
  • Training dataset shuffled, cached, and prefetched for efficiency
  • Test dataset batched without augmentation
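The preprocessing steps above can be sketched as a single map function; the label offset assumes the dataset's trimap convention of classes {1, 2, 3}:

```python
import tensorflow as tf

def preprocess(image, mask, training=True):
    # Normalize pixels to [0, 1]; the trimaps use labels {1, 2, 3},
    # so subtract 1 to obtain zero-based classes {0, 1, 2}.
    image = tf.cast(image, tf.float32) / 255.0
    mask = tf.cast(mask, tf.int32) - 1
    if training and tf.random.uniform(()) > 0.5:
        # Flip image and mask together so they stay aligned.
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    return image, mask

# train_ds = (raw_train.map(preprocess)
#             .cache().shuffle(1000).batch(32)
#             .prefetch(tf.data.AUTOTUNE))
```

Applying the same random flip to image and mask is essential: augmenting only one of the two would silently corrupt the training labels.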

The model was trained using:

  • Optimizer: Adam
  • Loss: Sparse Categorical Cross-Entropy
  • Epochs: 10
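The training setup above corresponds to a standard Keras `compile`/`fit` call, sketched here with a stand-in model (the real one being the U-Net); whether `from_logits=True` is needed depends on whether the model ends in a softmax:

```python
import tensorflow as tf

# Stand-in model for illustration only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(3, 1),  # 3 output channels = 3 class logits
])

# Sparse categorical cross-entropy works directly on integer mask
# labels, so the masks never need one-hot encoding.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# history = model.fit(train_ds, validation_data=test_ds, epochs=10)
```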

Training and validation curves showed stable convergence with minimal overfitting.


📊 Evaluation Metrics

The trained model was evaluated on the test dataset using standard segmentation metrics.

| Metric                | Value  |
|-----------------------|--------|
| Mean IoU              | 0.6800 |
| Mean Dice Coefficient | 0.7858 |
| Pixel Accuracy        | 0.8739 |
These metrics were computed using a custom evaluation function that calculates class-wise Intersection-over-Union and Dice scores, then averages them across the dataset.
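A class-wise IoU/Dice computation of the kind described can be sketched in NumPy; this is an assumed reconstruction, not the notebook's exact evaluation function:

```python
import numpy as np

def iou_dice(y_true, y_pred, num_classes=3):
    # Class-wise Intersection-over-Union and Dice from integer label
    # maps, averaged over classes (NaN for classes absent from both).
    ious, dices = [], []
    for c in range(num_classes):
        t = (y_true == c)
        p = (y_pred == c)
        inter = np.logical_and(t, p).sum()
        union = np.logical_or(t, p).sum()
        total = t.sum() + p.sum()
        ious.append(inter / union if union else np.nan)
        dices.append(2 * inter / total if total else np.nan)
    return float(np.nanmean(ious)), float(np.nanmean(dices))
```

Dice weights the intersection twice relative to the summed areas, so it is always at least as large as IoU for the same prediction, which matches the reported gap between 0.68 (IoU) and 0.79 (Dice).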


🖼️ Qualitative Results

The model produces visually accurate segmentation masks for pets. Predicted masks align well with ground-truth annotations, with most errors occurring near object boundaries and in images with complex poses or occlusions.

Sample predictions are shown directly in the notebook.


⚡ Inference Performance

  • Inference Speed: ~134 FPS on GPU
  • Model Size: ~355 MB

The model is capable of real-time inference on GPU-based systems, making it suitable for fast segmentation pipelines and academic experimentation.
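Throughput figures like the one above can be reproduced with a simple timing loop; this is a generic sketch (the warm-up call and run count are assumptions, not taken from the notebook):

```python
import time
import tensorflow as tf

def measure_fps(model, input_shape=(1, 128, 128, 3), n_runs=50):
    # One warm-up call so graph tracing/compilation is excluded
    # from the measured window.
    batch = tf.random.uniform(input_shape)
    model(batch)
    start = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    return n_runs / (time.perf_counter() - start)
```

Note that FPS measured this way depends heavily on batch size, hardware, and whether the model is wrapped in `tf.function`, so numbers are only comparable under identical conditions.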


📌 Conclusion

This project demonstrates that a U-Net based architecture can achieve strong performance on the Oxford-IIIT Pet segmentation task using a relatively compact input resolution. The combination of quantitative metrics (IoU, Dice, Pixel Accuracy) and qualitative visualizations provides a comprehensive evaluation of model performance.


▶️ How to Run

  1. Install dependencies:

     ```
     pip install tensorflow tensorflow-datasets
     ```

  2. Open the notebook:

     ```
     jupyter notebook pets_unet.ipynb
     ```

Run all cells to train the model, evaluate performance, and visualize predictions.
