A machine learning-based educational project designed to predict flight ticket prices using historical data — built to understand real-world data preprocessing, model training, and deployment.
What is it?
This project was developed as a learning exercise to explore how machine learning models can be applied to structured datasets. It focuses on analyzing flight fare data and predicting ticket prices based on features like date, route, airline, number of stops, and duration.
Why build it?
- To gain hands-on experience in data preprocessing, feature engineering, and model evaluation.
- To learn how to deploy a trained model using Streamlit for interactive predictions.
- To understand the end-to-end workflow of a machine learning project — from data analysis to real-time user interaction.
├─ .gitignore
├─ README.md ← (this file)
├─ requirements.txt ← Python dependencies
├─ model.ipynb ← Jupyter notebook: exploration, modelling & evaluation
├─ model_preprocess.py ← Preprocessing script
├─ model.pkl ← Trained model artefact
├─ feature_columns.pkl ← Pickled list of feature-column names used by the model
└─ app.py ← Streamlit front-end for interactive predictions
⚠️ Adjust file names/paths if yours differ.
Data Loading & Exploration
- Study the historical flight fare dataset (from Kaggle)
- Visualise trends: fare vs. date, stops, airline, duration, etc.
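Here is a minimal sketch of this exploration step. It assumes the fares are loaded into a pandas DataFrame with the `Stops` and `Price` columns named in the dataset notes. The five rows are made-up toy values that stand in for the Kaggle CSV. The notebook would normally read that CSV with `pd.read_csv`, but its file name is not given here.

```python
import pandas as pd

# Toy rows standing in for the Kaggle CSV (values are made up for illustration).
df = pd.DataFrame(
    {
        "Stops": ["non-stop", "1 stop", "non-stop", "1 stop", "2 stops"],
        "Price": [4000, 7600, 4200, 8400, 13000],
    }
)

# Basic sanity checks before modelling: size and missing values per column.
print(df.shape)
print(df.isna().sum())

# Mean fare for each stop category, cheapest first.
fare_by_stops = df.groupby("Stops")["Price"].mean().sort_values()
print(fare_by_stops)
```

Calling `fare_by_stops.plot(kind="bar")` (this requires matplotlib) turns the same aggregation into a fare-vs-stops chart.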
Pre-processing
- Handle missing values and encode categorical features (airline, source, destination, stops)
- Feature engineering (e.g., extracting day/month/year, converting duration to minutes)
- Save the list of feature columns to feature_columns.pkl.
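The pre-processing steps above can be sketched with pandas as follows. This is not the code in `model_preprocess.py`. The raw column names (`Date_of_Journey`, `Stops`, `Duration`, plus the hypothetical `Airline`, `Source`, `Destination`), the day-first date format and the `"2h 50m"` duration format are all assumptions, and the rows are toy data. Rows with missing values are simply dropped; imputation would be an alternative.

```python
import pickle

import pandas as pd

# Toy raw data; column names and string formats are assumptions.
raw = pd.DataFrame(
    {
        "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019", "12/06/2019"],
        "Airline": ["IndiGo", "Air India", "IndiGo", "Vistara"],
        "Source": ["Delhi", "Kolkata", "Mumbai", "Delhi"],
        "Destination": ["Cochin", "Bangalore", "Delhi", "Cochin"],
        "Stops": ["non-stop", "2 stops", "1 stop", None],
        "Duration": ["2h 50m", "7h 25m", "19h", "3h"],
        "Price": [3900, 7700, 13900, 5000],
    }
)

# Missing values: simplest option, drop incomplete rows.
df = raw.dropna().copy()

# Dates -> numeric day and month features (day-first format assumed).
journey = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_day"] = journey.dt.day
df["Journey_month"] = journey.dt.month


def duration_to_minutes(text: str) -> int:
    """Convert strings like '2h 50m', '19h' or '45m' to total minutes."""
    hours, minutes = 0, 0
    for part in text.split():
        if part.endswith("h"):
            hours = int(part[:-1])
        elif part.endswith("m"):
            minutes = int(part[:-1])
    return hours * 60 + minutes


df["Duration_mins"] = df["Duration"].apply(duration_to_minutes)

# Stops are ordinal, so map them to integers (unmapped labels become NaN).
stop_map = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}
df["Total_Stops"] = df["Stops"].map(stop_map)

# Nominal categories -> one-hot columns such as "Airline_IndiGo".
dummies = pd.get_dummies(df[["Airline", "Source", "Destination"]], dtype=int)
numeric = df[["Journey_day", "Journey_month", "Duration_mins", "Total_Stops"]]
X = pd.concat([numeric, dummies], axis=1)
y = df["Price"]

# Persist the exact column order so prediction-time inputs can be aligned to it.
feature_columns = list(X.columns)
with open("feature_columns.pkl", "wb") as f:
    pickle.dump(feature_columns, f)

print(X)
```

Saving the column list matters because `pd.get_dummies` only creates columns for categories present in the data it is given. A single user input at prediction time would otherwise yield a different set of columns.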
Modelling
- Split into train/test sets
- Try several regression algorithms (e.g., XGBoost, Random Forest)
- Evaluate them with metrics such as MAE and RMSE
- Pick the best-performing model and pickle it (model.pkl).
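This sketch shows the modelling loop using scikit-learn on a synthetic feature matrix, so it runs on its own. XGBoost (the project's final model) is left out only to keep the dependencies to scikit-learn and NumPy. Since `xgboost.XGBRegressor` follows the same `fit`/`predict` interface, it could be added to `candidates` the same way. The candidate models, their hyperparameters and the use of MAE as the selection criterion are assumptions.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (X) and fares (y).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 4))
y = 3000 + 8000 * X[:, 0] + 4000 * X[:, 1] ** 2 + rng.normal(0, 300, size=500)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

# Fit each candidate and score it on the held-out rows.
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"MAE": mae, "RMSE": rmse}
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")

# Keep the model with the lowest test MAE and pickle it.
best_name = min(results, key=lambda n: results[n]["MAE"])
with open("model.pkl", "wb") as f:
    pickle.dump(candidates[best_name], f)
print("best:", best_name)
```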
Deployment / App
app.py provides a simple Streamlit UI where a user enters inputs (date, source, destination, stops, etc.) and receives a fare prediction in real time.
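One detail `app.py` has to handle is turning a single form submission into a row with exactly the columns the model was trained on. Below is a minimal sketch that assumes one-hot encoded training columns stored as a list in `feature_columns.pkl`. The column names, the `build_feature_row` helper and the input keys are all hypothetical. The Streamlit widgets that would collect the values (e.g. `st.selectbox`, `st.number_input`) are omitted so the snippet runs without a browser.

```python
import pandas as pd

# In app.py this list would be loaded from feature_columns.pkl, e.g.
#   with open("feature_columns.pkl", "rb") as f:
#       FEATURE_COLUMNS = pickle.load(f)
# The names below are hypothetical one-hot training columns.
FEATURE_COLUMNS = [
    "Journey_day", "Journey_month", "Duration_mins", "Total_Stops",
    "Airline_Air India", "Airline_IndiGo",
    "Source_Delhi", "Source_Kolkata",
]


def build_feature_row(user_input: dict, feature_columns: list) -> pd.DataFrame:
    """Turn one form submission into a single-row frame with the training columns."""
    row = pd.DataFrame([user_input])
    row = pd.get_dummies(row, columns=["Airline", "Source"], dtype=int)
    # Add one-hot columns this input did not produce (filled with 0), drop any
    # the model never saw, and match the training column order exactly.
    return row.reindex(columns=feature_columns, fill_value=0)


form = {
    "Journey_day": 24, "Journey_month": 3, "Duration_mins": 170,
    "Total_Stops": 0, "Airline": "IndiGo", "Source": "Delhi",
}
X_input = build_feature_row(form, FEATURE_COLUMNS)
print(X_input)
# With `model` unpickled from model.pkl, the app would then show
# model.predict(X_input)[0] as the predicted fare.
```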
Clone the repository:
git clone https://github.com/Sayan-Mondal2022/flight-fare-prediction-system.git
cd flight-fare-prediction-system
Create a virtual environment (optional but recommended):
python3 -m venv .venv
source .venv/bin/activate  # on macOS/Linux
# or on Windows: .venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Run the Streamlit app:
streamlit run app.py
Interact with the UI and input flight details. The model will output a predicted fare.
You can try out the live version of the project here:
👉 Flight Fare Prediction Demo
The web app allows users to:
- Enter flight details such as travel class, total stops, arrival time, departure time, and journey day.
- Get an instant fare prediction from the trained XGBoost regression model.
- Experience an interactive and user-friendly Streamlit interface designed for learning and experimentation.
Note: This project is deployed purely for educational purposes. Predictions are based on sample historical data and are not intended for real-world commercial use.
The final model was trained using the XGBoost Regressor, which produced the following performance metrics on the test dataset:
| Metric | Score |
|---|---|
| R² Score | 0.8741 |
| Mean Absolute Error (MAE) | ₹ 4525.58 |
These results indicate that the model explains approximately 87.4% of the variance in test-set flight fares, and that its predictions are off by about ₹4,526 on average (in either direction). This is strong predictive performance for a learning project.
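The two metrics in the table can be computed by hand as shown below: MAE = (1/n) Σ |yᵢ − ŷᵢ| and R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)². These plain-Python functions mirror what `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score` compute. They are applied here to made-up toy fares, not to the project's test set.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute gap between actual and predicted fares."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy fares in rupees (made up for illustration).
actual = [4000, 6000, 8000, 10000]
predicted = [4500, 5500, 8500, 9500]
print(mean_absolute_error(actual, predicted))  # 500.0
print(r2_score(actual, predicted))  # 0.95
```

Read the same way, R² = 0.8741 means the residual sum of squares is about 12.6% of the total variation of fares around their mean.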
- Language: Python
- Data analysis & modelling: pandas, numpy, scikit-learn, XGBoost
- Deployment: Streamlit app
- Storage: Python pickle for model and feature columns
- Environment management: venv / pip
- Original dataset: “Flight Fare Dataset” on Kaggle (Flight-fare-dataset)
- Data columns include: Date_of_Journey, Departure_Time, Arrival_Time, Duration, Stops, Price
- Source code: preprocessing script model_preprocess.py and training notebook model.ipynb.
Special thanks to platforms like Kaggle, scikit-learn, and Streamlit for providing the datasets, tools, and frameworks that made this project possible.
This project was developed as part of my learning journey in Machine Learning and Data Science — every line of code written here contributed to my deeper understanding of real-world model development and deployment.
Thank you for visiting this project!
This Flight Fare Prediction System was built purely for learning and experimentation in machine learning.
Your feedback, suggestions, and contributions are always welcome — they help make learning even better! ✨
If you found this project helpful, don’t forget to ⭐ the repository.
— Sayan Mondal