Team Name: Cosmic Learners (Group 9)
Challenge: Spaceship Titanic - Kaggle Competition
Course Assignment: Data Science Assignment 3 – Group Kaggle Challenge
Google Colab: Google Colab Link
GitHub Repo: GitHub Link
Welcome to our repository for the Spaceship Titanic Challenge, a machine learning competition hosted on Kaggle. Our objective was to develop predictive models that determine whether a passenger was transported to an alternate dimension during a futuristic interstellar voyage.
Our group, Cosmic Learners, collaboratively built a complete and reproducible machine learning pipeline, including exploratory analysis, feature engineering, model training, hyperparameter tuning, and performance evaluation.
- Ayisha Fidha Maniyodan
- Diya Amith Kodappully
- Dona Uresha Pamodi Dasanayake
- Fawas Afsal
- Mohammed Nihad Kaipalli
- Sam Jacob
- Sandra Binu
- Sharon Zacharia
This notebook showcases a comprehensive machine learning pipeline divided into the following structured sections:
- Univariate & bivariate feature exploration
- Correlation analysis
- Visualizations with seaborn, matplotlib, and squarify (including pie charts and treemaps)
- Identification of missing values and patterns
- Missing value imputation (mean/mode strategy)
- Feature extraction from Cabin, Name, and other composite fields
- Label encoding for categorical variables
- Standardization of numerical features (see the preprocessing sketch below)
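To make these steps concrete, here is a minimal sketch of the preprocessing stage. It assumes the standard Spaceship Titanic column names (Cabin is a composite "deck/num/side" field); the exact imputation and encoding choices in the notebook may differ slightly.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("train.csv")

# Split the composite Cabin field ("deck/num/side") into three features
df[["Deck", "CabinNum", "Side"]] = df["Cabin"].str.split("/", expand=True)
df["CabinNum"] = pd.to_numeric(df["CabinNum"], errors="coerce")

# Mean imputation for numeric columns, mode imputation for categoricals
num_cols = ["Age", "RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck", "CabinNum"]
cat_cols = ["HomePlanet", "CryoSleep", "Destination", "VIP", "Deck", "Side"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Label-encode categoricals, then standardize the numeric features
for col in cat_cols:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```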
- Logistic Regression
- Random Forest
- XGBoost
- Deep Neural Network (Keras Sequential Model)
- Evaluation metrics: Accuracy, Precision, Recall, F1 Score, AUC-ROC
- ROC curves & model comparison charts (a training and evaluation sketch follows below)
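Below is a condensed sketch of how the three scikit-learn/XGBoost models can be trained and scored on a hold-out split (the DNN appears in the tuning sketch further down). The hyperparameters shown are illustrative defaults, not our tuned values; `df` is the preprocessed frame from the sketch above.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from xgboost import XGBClassifier

# df is the preprocessed frame from the sketch above
X = df.drop(columns=["PassengerId", "Name", "Cabin", "Transported"])
y = df["Transported"].astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    proba = model.predict_proba(X_val)[:, 1]
    print(f"{name}: acc={accuracy_score(y_val, pred):.3f}  "
          f"prec={precision_score(y_val, pred):.3f}  "
          f"rec={recall_score(y_val, pred):.3f}  "
          f"f1={f1_score(y_val, pred):.3f}  "
          f"auc={roc_auc_score(y_val, proba):.3f}")
```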
- GridSearchCV for Logistic Regression, Random Forest, and XGBoost
- EarlyStopping & ReduceLROnPlateau for the DNN (illustrated in the tuning sketch below)
- PCA used optionally for dimensionality reduction & visualization
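An illustrative sketch of the tuning setup: GridSearchCV over a small Random Forest grid, and a Keras Sequential DNN trained with EarlyStopping and ReduceLROnPlateau. The grid values and layer sizes are placeholders, not the configuration that produced the results table.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow import keras

# Grid search over an illustrative (not our final) Random Forest grid;
# Logistic Regression and XGBoost are tuned the same way
param_grid = {"n_estimators": [100, 200, 400], "max_depth": [None, 8, 16]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                    cv=5, scoring="accuracy", n_jobs=-1)
grid.fit(X_train, y_train)
print("Best RF params:", grid.best_params_)

# Keras Sequential DNN with EarlyStopping and ReduceLROnPlateau;
# layer sizes here are placeholders, not the tuned architecture
dnn = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
dnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
callbacks = [
    keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
]
dnn.fit(X_train.values, y_train.values,
        validation_data=(X_val.values, y_val.values),
        epochs=100, batch_size=32, callbacks=callbacks, verbose=0)
```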
- Model explainability with SHAP (SHapley Additive exPlanations)
- Visual insights into feature contributions for top predictions (see the SHAP sketch below)
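A minimal SHAP sketch, assuming the fitted XGBoost model from the training sketch above. TreeExplainer gives per-feature contributions, which feed both the global summary plot and single-prediction explanations.

```python
import shap

# TreeExplainer works directly with the fitted XGBoost model from above
explainer = shap.TreeExplainer(models["XGBoost"])
shap_values = explainer.shap_values(X_val)

# Global importance summary, plus a local explanation for one passenger
shap.summary_plot(shap_values, X_val)
shap.force_plot(explainer.expected_value, shap_values[0], X_val.iloc[0],
                matplotlib=True)
```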
| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic Regression | 0.769 | 0.750 | 0.815 | 0.781 |
| Random Forest | 0.795 | 0.805 | 0.784 | 0.794 |
| XGBoost | 0.800 | 0.794 | 0.814 | 0.804 |
| Neural Network | 0.802 | 0.791 | 0.827 | 0.808 |
🏆 Best Model: The Neural Network achieved the highest accuracy (80.2%) and the strongest F1-score (0.808), making it the most effective model for this classification task in our pipeline.
- Final prediction pipeline applied to the test set
- Submission file generated in the required Kaggle format (sketched below)
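A sketch of the submission step, assuming `X_test` holds the test features after the same preprocessing as the training data and `dnn` is the trained network from the tuning sketch.

```python
# X_test: test.csv run through the same preprocessing as the training data;
# dnn: the trained Keras model from the tuning sketch above
test_df = pd.read_csv("test.csv")
proba = dnn.predict(X_test).ravel()

submission = pd.DataFrame({
    "PassengerId": test_df["PassengerId"],
    "Transported": proba > 0.5,   # Kaggle expects boolean True/False
})
submission.to_csv("submission.csv", index=False)
```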
- Version Control: GitHub (branching, pull requests, versioning)
- Shared Environment: Google Colab for code collaboration
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, TensorFlow/Keras, SHAP