
🚀 DoorDash Delivery Time Prediction

Production-ready ML model achieving an 11.04-minute MAE, with 57.39% of predictions within ±10 minutes of the actual delivery time

An end-to-end machine learning solution that predicts DoorDash delivery duration from order creation to customer delivery, optimized for real-time customer-facing applications.


📊 Key Results

| Metric | Value | Status |
|---|---|---|
| Mean Absolute Error | 11.04 minutes (662.43 sec) | ✅ |
| R² Score | 0.307 | ✅ |
| Accuracy (±10 min) | 57.39% | ✅ |
| Extreme Error Rate | 4.71% | ✅ (<5% guardrail) |
| Statistical Validation | p-value < 0.001 (McNemar's test) | ✅ |

🎯 Problem Statement

DoorDash needs accurate real-time predictions of total delivery duration (seconds between order creation and delivery) to:

  • Provide reliable ETAs to customers
  • Optimize dasher allocation
  • Enable proactive operations management
  • Reduce customer complaints and churn

Challenge: Complex marketplace dynamics with non-linear relationships between features including order details, restaurant characteristics, and real-time marketplace load.


🛠️ Technical Approach

Architecture Overview

Data Pipeline → Feature Engineering → Model Training → Validation → Deployment Ready
     ↓              ↓                      ↓              ↓            ↓
 197K records   Temporal features     XGBoost        SHAP +       API-ready
 99.3% clean    Cyclic encoding       Tuned          McNemar      <200ms latency

Key Features

📈 Simple & Effective Feature Engineering

  • Cyclic encoding for temporal features (hour, day of week)
  • No complex transformations (no log, no scaling - tree models don't need them)
  • Label encoding for categorical features
  • Focus on interpretability and production simplicity

🧠 Iterative Model Development

  • 5 model iterations: Drop Missing → Simple Imputation → XGBoost Native → Temporal Features → Hyperparameter Tuning
  • XGBoost selected for native missing value handling and non-linear pattern capture
  • RandomizedSearchCV with TimeSeriesSplit for robust hyperparameter tuning

✅ Rigorous Validation

  • Temporal train-test split (80/20) to prevent data leakage
  • Business-aligned metrics (MAE in minutes, accuracy rate, extreme error rate)
  • SHAP analysis for model explainability
  • McNemar's test for statistical significance validation
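The temporal split can be sketched in a few lines. This is an illustrative helper on toy data, not the notebook's exact code: the idea is simply that the most recent 20% of orders become the test set, so the model is never evaluated on data older than what it trained on.

```python
import pandas as pd

def temporal_train_test_split(df, time_col="created_at", test_frac=0.20):
    """Chronological 80/20 split: the most recent orders form the test set."""
    df_sorted = df.sort_values(time_col).reset_index(drop=True)
    split_idx = int(len(df_sorted) * (1 - test_frac))
    return df_sorted.iloc[:split_idx], df_sorted.iloc[split_idx:]

# Five toy timestamped orders
df = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2015-01-01", "2015-01-03", "2015-01-02", "2015-01-05", "2015-01-04"]
    ),
    "delivery_duration_seconds": [2400, 2700, 2500, 3100, 2900],
})
train, test = temporal_train_test_split(df)
# Every training order predates every test order: no future data leaks into training
assert train["created_at"].max() < test["created_at"].min()
```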

πŸ“ Repository Structure

doordash-delivery-time-prediction/
│
├── notebooks/
│   └── doordash_delivery_prediction.ipynb   # Complete analysis & modeling
│
├── data/
│   └── historical_data.csv                  # Training dataset (197K records)
│
├── models/
│   └── doordash_eta_xgb_artifacts.joblib    # Saved model + encoders + feature schema
│
├── README.md                                # This file
└── requirements.txt                         # Python dependencies

🚀 Quick Start

Prerequisites

Python 3.8+
pip or conda

Installation

  1. Clone the repository
git clone https://github.com/MohammedMoseena/doordash-delivery-time-prediction.git
cd doordash-delivery-time-prediction
  2. Install dependencies
pip install -r requirements.txt
  3. Run the notebook
jupyter notebook notebooks/doordash_delivery_prediction.ipynb

Using the Trained Model

import joblib
import pandas as pd
import numpy as np

# Load saved artifacts
artifacts = joblib.load("models/doordash_eta_xgb_artifacts.joblib")
model = artifacts["model"]
le_store_cat = artifacts["store_category_encoder"]
feature_columns = artifacts["feature_columns"]

# Example order
new_order = {
    "created_at": "2025-11-23 12:34:56",
    "market_id": 3,
    "store_id": 12345,
    "store_primary_category": "Pizza",
    "order_protocol": 2,
    "total_items": 5,
    "num_distinct_items": 3,
    "subtotal": 2400,
    "min_item_price": 200,
    "max_item_price": 800,
    "total_onshift_dashers": 25,
    "total_busy_dashers": 15,
    "total_outstanding_orders": 30,
    "estimated_order_place_duration": 600,
    "estimated_store_to_consumer_driving_duration": 900,
}

# Preprocess and predict
df_new = pd.DataFrame([new_order])
X_new = preprocess_for_inference(df_new)  # See notebook for function
pred_seconds = model.predict(X_new)[0]

print(f"Predicted ETA: {pred_seconds/60:.2f} minutes")
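`preprocess_for_inference` is defined in the notebook; the sketch below shows one plausible implementation of the same transforms. The encoder categories and the column list here are illustrative stand-ins — in practice both come from the saved `.joblib` artifacts, not from code like this.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-ins for the saved artifacts; the real encoder and column list
# come from models/doordash_eta_xgb_artifacts.joblib
le_store_cat = LabelEncoder().fit(["Burger", "Mexican", "Pizza"])
feature_columns = [
    "market_id", "store_id", "store_primary_category", "order_protocol",
    "total_items", "num_distinct_items", "subtotal", "min_item_price",
    "max_item_price", "total_onshift_dashers", "total_busy_dashers",
    "total_outstanding_orders", "estimated_order_place_duration",
    "estimated_store_to_consumer_driving_duration",
    "hour_sin", "hour_cos", "dayofweek_sin", "dayofweek_cos",
]

def preprocess_for_inference(df):
    """Mirror the training-time transforms: cyclic time features + label encoding."""
    df = df.copy()
    ts = pd.to_datetime(df["created_at"])
    df["hour_sin"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    df["dayofweek_sin"] = np.sin(2 * np.pi * ts.dt.dayofweek / 7)
    df["dayofweek_cos"] = np.cos(2 * np.pi * ts.dt.dayofweek / 7)
    df["store_primary_category"] = le_store_cat.transform(df["store_primary_category"])
    return df[feature_columns]

order = {
    "created_at": "2025-11-23 12:34:56", "market_id": 3, "store_id": 12345,
    "store_primary_category": "Pizza", "order_protocol": 2, "total_items": 5,
    "num_distinct_items": 3, "subtotal": 2400, "min_item_price": 200,
    "max_item_price": 800, "total_onshift_dashers": 25, "total_busy_dashers": 15,
    "total_outstanding_orders": 30, "estimated_order_place_duration": 600,
    "estimated_store_to_consumer_driving_duration": 900,
}
X_new = preprocess_for_inference(pd.DataFrame([order]))
assert list(X_new.columns) == feature_columns  # schema matches training
```

Keeping the feature schema inside the artifact file (rather than hard-coding it) is what makes the saved model safe to serve: the column order at inference is guaranteed to match training.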

📊 Data Overview

Dataset Statistics

  • Records: 197,428 → 196,076 after cleaning (99.3% retained)
  • Features: 18 features (14 original + 4 engineered temporal features)
  • Target: delivery_duration_seconds = actual_delivery_time - created_at
  • Time Period: 2015 data (US/Pacific timezone)
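The target definition above translates directly into pandas. The timestamps below are toy values in the raw schema's format, just to show the arithmetic:

```python
import pandas as pd

# Toy rows mimicking the raw schema (timestamps in the dataset are US/Pacific)
raw = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2015-02-06 22:24:17", "2015-02-10 21:49:25"]),
    "actual_delivery_time": pd.to_datetime(
        ["2015-02-06 23:27:16", "2015-02-10 22:56:29"]),
})

# Target: total seconds from order creation to hand-off at the customer
raw["delivery_duration_seconds"] = (
    raw["actual_delivery_time"] - raw["created_at"]
).dt.total_seconds()
print(raw["delivery_duration_seconds"].tolist())  # [3779.0, 4024.0]
```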

Feature Categories

| Category | Features | Examples |
|---|---|---|
| Market | 1 | market_id (city/region) |
| Store | 3 | store_id, store_primary_category, order_protocol |
| Order Details | 5 | total_items, subtotal, num_distinct_items, min/max_item_price |
| Marketplace | 3 | total_onshift_dashers, total_busy_dashers, total_outstanding_orders |
| Model Predictions | 2 | estimated_order_place_duration, estimated_store_to_consumer_driving_duration |
| Temporal (Engineered) | 4 | hour_sin/cos, dayofweek_sin/cos |

🔬 Methodology

1. Exploratory Data Analysis

  • Analyzed 197,428 historical delivery records
  • Identified right-skewed distributions in all numeric features (except driving duration)
  • Discovered low correlations (<0.30) confirming non-linear relationships
  • Found marketplace "inconsistencies" (busy_dashers > onshift_dashers) are real operational signals, not errors

2. Data Cleaning

# Key cleaning steps:
✓ No duplicates found
✓ Removed min_price > max_price (0.4% of data)
✓ Removed prices > subtotal (0.3% of data)
✓ Removed extreme outliers: total_items=411, prices=14700
✓ Removed impossible deliveries >12 hours (only 7 records)
✓ Kept marketplace inconsistencies (real operational signal)
✓ Final dataset: 196,076 records (99.3% retention)
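The filters above amount to a single boolean mask. This sketch uses the thresholds listed in the steps (the notebook's exact ordering and edge handling may differ) on three toy rows:

```python
import pandas as pd

# Toy rows: one valid order, one with min_price > max_price, one >12 h delivery
df = pd.DataFrame({
    "min_item_price": [100, 900, 100],
    "max_item_price": [500, 500, 500],
    "subtotal": [600, 600, 600],
    "total_items": [3, 3, 3],
    "delivery_duration_seconds": [2400, 2400, 50000],
})

def clean_deliveries(df):
    """Row filters matching the steps above; thresholds come from the EDA."""
    mask = (
        (df["min_item_price"] <= df["max_item_price"])    # price sanity
        & (df["max_item_price"] <= df["subtotal"])        # price vs. subtotal
        & (df["total_items"] < 411)                       # extreme basket size
        & (df["max_item_price"] < 14700)                  # extreme item price
        & (df["delivery_duration_seconds"] <= 12 * 3600)  # impossible deliveries
    )
    return df[mask].copy()

cleaned = clean_deliveries(df)  # only the first row survives
```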

3. Missing Value Strategies (3 Approaches)

| Model | Approach | Pros | Cons |
|---|---|---|---|
| Model 1 | Drop missing rows | Clean data, no noise | Loses ~10% of data |
| Model 2 | Simple imputation (median/mode) | Fast, keeps all data | Adds mild noise |
| Model 3 | XGBoost native handling | Best for marketplace data | Slightly more complex |

Winner: Model 3 (XGBoost handles missing values optimally)

4. Feature Engineering

Temporal Features (Model 4 improvement):

# Cyclic encoding preserves temporal continuity
hour_sin = sin(2π × hour / 24)
hour_cos = cos(2π × hour / 24)
dayofweek_sin = sin(2π × dayofweek / 7)
dayofweek_cos = cos(2π × dayofweek / 7)

Why cyclic? 11 PM is closer to 12 AM than to 3 PM
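That intuition can be checked numerically: in the (sin, cos) plane, 11 PM is a short hop from midnight but far from 3 PM.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour of day onto the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Distance in (sin, cos) space respects the clock's wrap-around
d_23_to_0 = np.linalg.norm(encode_hour(23) - encode_hour(0))    # ~0.26
d_23_to_15 = np.linalg.norm(encode_hour(23) - encode_hour(15))  # ~1.73
assert d_23_to_0 < d_23_to_15  # 11 PM sits next to midnight, far from 3 PM
```

A raw integer `hour` feature would instead put 23 and 0 at opposite ends of the range, hiding the overnight continuity from distance-sensitive splits.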

5. Model Development (5 Iterations)

| Model | Strategy | MAE (sec) | R² | Accuracy ≤10 min | Key Insight |
|---|---|---|---|---|---|
| Model 1 | Drop missing + RF | 705.60 | 0.223 | 54.02% | Baseline |
| Model 2 | Simple imputation + RF | 714.56 | 0.227 | 53.26% | Keeps data but adds noise |
| Model 3 | XGBoost native missing | 684.43 | 0.274 | 55.83% | Best missing-value handling |
| Model 4 | + Temporal features | 668.21 | 0.296 | 57.24% | Time patterns matter |
| Model 5 | + Hyperparameter tuning | 662.43 | 0.307 | 57.39% | Production model |

6. Hyperparameter Tuning (Model 5)

Best parameters:
├── n_estimators: 300
├── max_depth: 5
├── learning_rate: 0.1
├── subsample: 0.9
└── colsample_bytree: 0.9

Method: RandomizedSearchCV (20 iterations)
CV Strategy: TimeSeriesSplit (3 folds)
Result: +6 seconds MAE improvement, meets all guardrails

📈 Results & Insights

Final Model Performance

| Metric | Value | Interpretation |
|---|---|---|
| MAE | 662.43 sec (11.04 min) | Average prediction error |
| RMSE | 967.79 sec (16.13 min) | Penalizes large errors |
| R² | 0.307 | Explains 30.7% of variance |
| Accuracy ≤10 min | 57.39% | Over half of predictions highly accurate |
| Extreme errors >30 min | 4.71% | Meets <5% safety threshold |

SHAP Analysis: Top 5 Predictive Features

| Rank | Feature | Impact | Business Interpretation |
|---|---|---|---|
| 1 🥇 | total_outstanding_orders | 🔴 Very Strong + | High demand → congestion → delays |
| 2 🥈 | total_onshift_dashers | 🟢 Very Strong − | More dashers → faster pickups |
| 3 🥉 | estimated_store_to_consumer_driving_duration | 🔴 Strong + | Distance drives delivery time |
| 4 | hour_sin | 🔴 Moderate + | Peak hours cause delays |
| 5 | subtotal | 🔴 Moderate + | Large orders take longer to prep |

Key Insight: Supply-demand imbalance (#1 and #2) is the primary driver of delivery delays.

Statistical Validation (McNemar's Test)

Question: Is Model 5 significantly better than Model 1?
Test: McNemar's test (paired predictions)
Result: p-value = 2×10⁻⁶ (≪ 0.05)

✅ Conclusion: The improvement is statistically significant, not random chance
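The test can be reproduced with `statsmodels`. The 2×2 contingency counts below are made up for illustration; the real table is built from the paired test-set predictions of the two models (was each prediction within ±10 min, yes/no):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired hit/miss outcomes for Model 1 (rows) vs. Model 5 (columns).
# Counts are illustrative, not the project's actual results.
#            Model 5 hit   Model 5 miss
table = np.array([
    [9500, 300],   # Model 1 hit
    [700, 4500],   # Model 1 miss
])

# McNemar's test compares only the discordant cells (300 vs. 700)
result = mcnemar(table, exact=False, correction=True)
print(f"statistic={result.statistic:.1f}, p-value={result.pvalue:.2e}")
```

Because the test depends only on the discordant pairs, it directly answers "on the orders where the models disagree, does Model 5 win more often than chance?"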

💼 Business Impact

Actionable Recommendations

1. 🚗 Optimize Dasher Deployment

  • Deploy more dashers when total_outstanding_orders is high
  • Target markets with consistently high marketplace load
  • Expected Impact: Reduce average delivery time

2. ⏱️ Dynamic ETA Buffers

  • Add safety margins during peak hours (hour_sin patterns)
  • Personalize buffers based on subtotal (order complexity)
  • Expected Impact: Reduction in customer complaints

3. 🚨 Proactive Monitoring

  • Auto-flag orders predicted >45 min for ops review
  • Real-time alerts when marketplace saturation detected
  • Expected Impact: Faster response to delays

4. 🤝 Restaurant Partnerships

  • Share estimated_order_place_duration insights
  • Collaborate on reducing order receiving time variability
  • Expected Impact: Improvement in prep time accuracy

Expected Business Outcomes

  • ✅ Customer Satisfaction: Fewer missed ETAs → higher trust
  • ✅ Operational Efficiency: Better dasher allocation → lower costs
  • ✅ Revenue Growth: Improved retention and reduced churn
  • ✅ Competitive Advantage: Industry-leading ETA accuracy

⚠️ Limitations & Future Work

Current Limitations

  • 📅 Static 2015 dataset - Marketplace has evolved
  • 🚦 No real-time traffic data - Weather, accidents, road closures
  • 🍕 Missing restaurant capacity - Kitchen busyness, staff levels
  • 📊 Accuracy ceiling at ~57% - Inherent data limitations

Planned Improvements

Future enhancements:
├── Integrate real-time traffic APIs (Google Maps, Waze)
├── Add weather conditions (rain, snow → slower deliveries)
├── Include restaurant historical prep time patterns
├── Build market-specific models (different cities, different patterns)
├── Weekly retraining pipeline (capture seasonality, drift)
├── A/B testing framework (5% traffic rollout before full deployment)
└── MLOps monitoring (Prometheus/Grafana for drift detection)

🧰 Tech Stack

| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| ML | XGBoost, Scikit-learn |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Explainability | SHAP |
| Validation | TimeSeriesSplit, RandomizedSearchCV, McNemar's test |
| Deployment | Joblib (model serialization) |

📚 Dependencies

# Core Data Science Libraries
numpy==2.0.2
pandas==2.2.2

# Visualization
matplotlib==3.10.0
seaborn==0.13.2

# Machine Learning
scikit-learn==1.6.1
xgboost==3.1.1

# Model Explainability
shap==0.50.0

# Model Serialization
joblib==1.5.2

# Statistical Testing
statsmodels==0.14.5





👤 Author

Md Moseena


📊 Project Status

✅ Production-Ready

  • Model meets all business guardrails (<5% extreme errors)
  • Statistically validated improvement (p < 0.001)
  • Complete deployment artifacts saved (model + encoders + schema)
  • Ready for A/B testing and gradual rollout

⭐ If you found this project helpful, please consider giving it a star!

📖 Read the Full Article | 📧 Contact Me


Last Updated: November 2025