🚀 Spaceship Titanic Challenge – Kaggle Group Assignment

Team Name: Cosmic Learners (Group 9)
Challenge: Spaceship Titanic - Kaggle Competition
Course Assignment: Data Science Assignment 3 – Group Kaggle Challenge
Google Colab: Google Colab Link
GitHub Repo: GitHub Link


👨‍🚀 Project Summary

Welcome to our repository for the Spaceship Titanic Challenge, a machine learning competition hosted on Kaggle. Our objective was to develop predictive models that determine whether a passenger was transported to an alternate dimension during a futuristic interstellar voyage.

Our group, Cosmic Learners, collaboratively built a complete and reproducible machine learning pipeline, including exploratory analysis, feature engineering, model training, hyperparameter tuning, and performance evaluation.


👥 Team Members

  • Ayisha Fidha Maniyodan
  • Diya Amith Kodappully
  • Dona Uresha Pamodi Dasanayake
  • Fawas Afsal
  • Mohammed Nihad Kaipalli
  • Sam Jacob
  • Sandra Binu
  • Sharon Zacharia

🧠 Project Workflow

This notebook showcases a comprehensive machine learning pipeline divided into the following structured sections:

1. 🧭 Exploratory Data Analysis (EDA)

  • Univariate & bivariate feature exploration
  • Correlation analysis
  • Visualizations using seaborn, matplotlib, and squarify (including treemaps and pie charts)
  • Identification of missing values and patterns (see the sketch below)
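
A minimal sketch of the missing-value audit and one bivariate view from this step, assuming the competition's train.csv file and a categorical column named HomePlanet (the Transported target is described above; other column names may differ in the notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

train = pd.read_csv("train.csv")  # assumed Kaggle file name

# Missing-value audit: count and share of missing entries per column
missing = (
    train.isna().sum().to_frame("n_missing")
    .assign(pct_missing=lambda d: 100 * d["n_missing"] / len(train))
    .sort_values("pct_missing", ascending=False)
)
print(missing)

# Bivariate view: target balance within one categorical feature
# ("HomePlanet" is an assumed column name)
sns.countplot(data=train, x="HomePlanet", hue="Transported")
plt.title("Transported by HomePlanet")
plt.tight_layout()
plt.show()
```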

2. 🧹 Data Preprocessing

  • Missing value imputation (mean/mode strategy)
  • Feature extraction from Cabin, Name, and other composite fields
  • Label encoding for categorical variables
  • Standardization of numerical features (see the sketch below)
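
A minimal sketch of this preprocessing, assuming train.csv and typical categorical column names (HomePlanet, CryoSleep, Destination, VIP) alongside the composite Cabin field mentioned above; the notebook's exact columns and order of operations may differ:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("train.csv")  # assumed Kaggle file name

# Split the composite Cabin field ("Deck/Num/Side") into three features
df[["Deck", "CabinNum", "Side"]] = df["Cabin"].str.split("/", expand=True)
df["CabinNum"] = pd.to_numeric(df["CabinNum"], errors="coerce")

# Mean imputation for numeric columns, mode imputation for categoricals
num_cols = df.select_dtypes(include="number").columns
cat_cols = ["HomePlanet", "CryoSleep", "Destination", "VIP", "Deck", "Side"]  # assumed names
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Label encoding for categoricals, standardization for numerics
for col in cat_cols:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```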

3. 🧪 Model Training & Evaluation

  • Logistic Regression
  • Random Forest
  • XGBoost
  • Deep Neural Network (Keras Sequential Model)
  • Evaluation metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC (see the sketch below)
  • ROC Curves & Model Comparison Charts

4. 🔍 Hyperparameter Tuning

  • GridSearchCV for Logistic Regression, Random Forest, and XGBoost
  • EarlyStopping & ReduceLROnPlateau for DNNs
  • PCA used optionally for dimensionality reduction & visualization
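
A sketch of the tuning step, assuming the X_train / y_train split above; the grid values, layer sizes, and callback settings are illustrative, not the exact ones used in the notebook:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from tensorflow import keras

# GridSearchCV over an assumed Random Forest parameter grid
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [200, 400], "max_depth": [8, 12, None]},
    scoring="f1",
    cv=5,
)
rf_grid.fit(X_train, y_train)
print(rf_grid.best_params_, rf_grid.best_score_)

# Keras Sequential DNN trained with EarlyStopping and ReduceLROnPlateau
dnn = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
dnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
callbacks = [
    keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
]
dnn.fit(X_train, y_train, validation_split=0.2, epochs=100,
        batch_size=64, callbacks=callbacks, verbose=0)
```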

5. 📈 Explainability

  • Model explainability with SHAP (SHapley Additive exPlanations)
  • Visual insights into feature contributions for top predictions (see the sketch below)
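
A short sketch of the SHAP analysis, assuming a fitted tree-based model (the tuned XGBoost classifier, referred to here as xgb_model) and the validation frame X_val:

```python
import shap

# Explain the tuned XGBoost model (xgb_model is an assumed variable name)
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_val)

# Global view: which features push predictions most, and in which direction
shap.summary_plot(shap_values, X_val)

# Local view: contribution breakdown for a single passenger
shap.force_plot(explainer.expected_value, shap_values[0], X_val.iloc[0], matplotlib=True)
```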

6. 📊 Model Performance Summary

| Model               | Accuracy | Precision | Recall | F1-score |
|---------------------|----------|-----------|--------|----------|
| Logistic Regression | 0.769    | 0.750     | 0.815  | 0.781    |
| Random Forest       | 0.795    | 0.805     | 0.784  | 0.794    |
| XGBoost             | 0.800    | 0.794     | 0.814  | 0.804    |
| Neural Network      | 0.802    | 0.791     | 0.827  | 0.808    |

🏆 Best Model: The Neural Network achieved the highest accuracy of 80.2%, along with the strongest F1-score (0.808), making it the most effective model for this classification task in our pipeline.

7. 📤 Submission Generation

  • Final prediction pipeline applied to the test set
  • Submission file generated in the required Kaggle format (see the sketch below)
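
A sketch of the submission step, assuming test.csv, a hypothetical preprocess() helper that mirrors the training-time pipeline, and the trained DNN as the final model; PassengerId / Transported follow the competition's sample submission format:

```python
import pandas as pd

test = pd.read_csv("test.csv")   # assumed Kaggle file name
X_test = preprocess(test)        # hypothetical helper mirroring the training-time preprocessing
preds = dnn.predict(X_test).ravel() > 0.5

submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Transported": preds.astype(bool),
})
submission.to_csv("submission.csv", index=False)
```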

🧑‍💻 Collaboration & Tools

  • Version Control: GitHub (branching, pull requests, versioning)
  • Shared Environment: Google Colab for code collaboration
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, TensorFlow/Keras, SHAP
