Project Overview: Uber Fare Prediction Models

📖 About the Project

This project focuses on building a regression model that predicts the fare amount of Uber rides based on various factors such as pickup/drop-off coordinates, passenger count, and trip distance. The dataset is derived from NYC Uber trips and aims to demonstrate practical applications of data cleaning, feature engineering, and model evaluation.

✨ Features

  • Data cleaning and preprocessing of real-world trip records
  • Feature extraction from timestamps and geolocation data
  • Visualization of data distributions and correlations
  • Distance calculation using the Haversine formula
  • Model training using Linear Regression, Random Forest, and gradient boosting regressors (XGBoost, LightGBM, CatBoost)
  • Performance comparison using RMSE and R² metrics

🧰 Tech Stack

  • Language: Python
  • Environment: Jupyter Notebook
  • Libraries Used:
      • pandas for data manipulation
      • numpy for numerical operations
      • matplotlib and seaborn for data visualization
      • scikit-learn for machine learning models and metrics
      • xgboost, lightgbm, and catboost for gradient boosting models
      • math for the Haversine distance calculation

📊 Data Processing

Dataset: Uber NYC fare data

Cleaning Tasks:

  • Removed missing values
  • Dropped rows with negative or zero distances/fare amounts
  • Filtered unrealistic coordinates
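
The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the project's actual notebook code; the column names follow the common Kaggle NYC Uber fares schema, and the coordinate bounds for "realistic" NYC locations are assumptions.

```python
import pandas as pd

# Tiny stand-in for the raw trip records (column names assumed).
df = pd.DataFrame({
    "fare_amount":       [7.5,   -2.0,   12.0,   None],
    "pickup_longitude":  [-73.98, -73.97, -200.0, -73.95],
    "pickup_latitude":   [40.75,  40.76,  40.70,  40.74],
    "dropoff_longitude": [-73.96, -73.99, -73.97, -73.93],
    "dropoff_latitude":  [40.78,  40.75,  40.71,  40.77],
})

df = df.dropna()                    # remove missing values
df = df[df["fare_amount"] > 0]      # drop zero or negative fares
# Filter unrealistic coordinates (rough NYC bounding box, assumed).
df = df[df["pickup_longitude"].between(-75, -72)
        & df["pickup_latitude"].between(40, 42)]

print(len(df))  # rows surviving all filters
```

Only the first row passes every filter in this toy frame; in the real dataset the same chained masks remove the invalid records in one pass.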

Feature Engineering:

  • Extracted hour, weekday, and month from pickup datetime
  • Calculated distance between pickup and drop-off using the Haversine formula
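
A minimal sketch of both engineering steps: extracting time features with pandas' `.dt` accessor, and computing the great-circle distance with the Haversine formula via the `math` module (as the Tech Stack section notes). The example coordinates (Times Square to JFK) are illustrative, not from the dataset.

```python
import math
import pandas as pd

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Extract hour, weekday, and month from a pickup datetime.
ts = pd.Series(pd.to_datetime(["2015-05-07 19:52:06"]))
hour, weekday, month = ts.dt.hour[0], ts.dt.weekday[0], ts.dt.month[0]

# Times Square -> JFK airport, roughly 22 km as the crow flies.
dist = haversine(40.7580, -73.9855, 40.6413, -73.7781)
```

In the notebook these would be applied column-wise (e.g. with `df.apply`) to produce the engineered features the models train on.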

🧠 Model Training

Multiple regression models were trained and evaluated to predict Uber fare amounts:

  • Linear Regression: Used as a baseline model to establish a point of comparison. It used all numeric and engineered features but was limited in handling complex, non-linear relationships.
  • Random Forest Regressor: An ensemble-based model that improved prediction accuracy by capturing feature interactions and reducing overfitting through averaging.
  • XGBoost: A gradient boosting model known for its speed and performance, especially on structured/tabular data.
  • LightGBM: A high-performance boosting framework that is faster and more efficient with large datasets. It delivered the best overall results in this project.
  • CatBoost: A gradient boosting model optimized for categorical features. It performed competitively and required minimal preprocessing.
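
The train-and-compare loop behind the models above can be sketched with scikit-learn. The data here is synthetic (fare approximated as a base charge plus a per-kilometre rate plus noise) purely to make the snippet self-contained; the real project fits on the engineered trip features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix (distance, hour, ...).
rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(500, 3))
y = 2.5 + 1.8 * X[:, 0] + rng.normal(0, 1, 500)  # fare ~ base + rate*distance + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("Linear Regression", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(n_estimators=100,
                                                            random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name}: RMSE={rmse:.2f}, R2={r2_score(y_te, pred):.2f}")
```

The boosting models (XGBoost, LightGBM, CatBoost) slot into the same loop, since each exposes a scikit-learn-style `fit`/`predict` regressor.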

Each model was evaluated using:

  • RMSE (Root Mean Square Error): measures the average prediction error in fare units
  • R² Score (Coefficient of Determination): quantifies the proportion of fare variance explained by the model
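
Both metrics come straight from scikit-learn. A minimal illustration with hypothetical actual and predicted fares:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual vs. predicted fares (USD).
y_true = np.array([8.0, 12.5, 6.0, 20.0])
y_pred = np.array([7.5, 13.0, 6.5, 18.0])

rmse = mean_squared_error(y_true, y_pred) ** 0.5  # RMSE, in fare units
r2 = r2_score(y_true, y_pred)                      # 1.0 = perfect fit
```

An R² of 0 means the model does no better than always predicting the mean fare, which is why the negative Linear Regression score in Phase 2 signals a failed fit.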

Results

📊 Model Performance Comparison (Phase-1)

| Model | RMSE | R² Score |
| --- | --- | --- |
| Random Forest | 3.24 | 0.65 |
| XGBoost | 3.07 | 0.69 |
| LightGBM | 2.99 | 0.70 |


📊 Final Model Performance Comparison (Phase-2)

| Model | RMSE | R² Score |
| --- | --- | --- |
| Linear Regression | 5.563649 | -0.026717 |
| XGBoost | 2.777773 | 0.744068 |
| LightGBM | 2.992365 | 0.702997 |


📊 Final Model Performance Comparison (Phase-3)

| Model | RMSE | R² Score |
| --- | --- | --- |
| XGBoost | 3.1918 | 0.7744 |
| LightGBM | 3.1142 | 0.7852 |


📊 Final Model Performance Comparison (Phase-4)

| Model | RMSE | R² Score |
| --- | --- | --- |
| LightGBM | 2.8719 | 0.8173 |
| Final Model | 2.8007 | 0.8263 |



✅ Accuracy Interpretation (from R² Score)

  • R² Score close to 1: Model makes accurate predictions.
  • R² Score close to 0 or negative: Poor predictive performance.

🧠 Logic Summary

The best model is the one that:

  • Minimizes RMSE
  • Shows consistent and stable predictions
  • Gives predicted fares close to actual fares
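
Selecting the winner by minimum RMSE is a one-liner; the scores here are the Phase-2 results quoted above, collected into a dict for illustration.

```python
# Phase-2 RMSE per model (from the results table above).
results = {"Linear Regression": 5.563649,
           "XGBoost": 2.777773,
           "LightGBM": 2.992365}

best = min(results, key=results.get)  # model with the lowest RMSE
print(best)
```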

🧭 Usage

  1. Clone the repository: `git clone https://github.com/mdaltamashalam/Uber-Fare-Prediction-Models.git && cd Uber-Fare-Prediction-Models`
  2. Install dependencies: `pip install -r requirements.txt`
  3. Train the model: `python train_model.py`
  4. Make predictions: `python predict.py --input data/sample_input.csv`
  5. Evaluate the model: `python evaluate.py`

Ensure required datasets are placed in the data/ folder before execution.

👥 Contributors

  • Md Altamash Alam
  • Amreen Perween

📄 License

This project is protected under copyright © Md Altamash Alam, 2025.

All rights reserved. Unauthorized copying, distribution, modification, or use of any part of this project without explicit permission is strictly prohibited.

If you wish to use or reference any part of this project for academic, personal, or commercial purposes, please contact the author for permission.


© Md Altamash Alam, 2025 – All Rights Reserved.
