This project focuses on predicting key weather parameters in India — including temperature_celsius, feels_like_celsius, and cloud coverage — using machine learning models. It leverages a rich dataset from the Indian Weather Repository to train and evaluate a suite of regression algorithms.
- Data preprocessing: cleaning, feature selection, and scaling
- Visual exploration with histograms, heatmaps, scatter plots
- Machine learning models: Linear, Ridge, Lasso, Random Forest, Gradient Boosting, XGBoost
- Evaluation using R², MSE, RMSE, MAE
- K-Fold Cross Validation for performance robustness
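For example, the K-Fold evaluation step can be sketched roughly as follows (feature selection and hyperparameters here are illustrative assumptions, not the notebook's exact code):

```python
# Illustrative sketch: 5-fold cross-validation of a Random Forest regressor.
# Column handling and model settings are assumptions, not the notebook's exact code.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Keep numeric columns only and drop rows with missing values (simplifying assumption)
num = pd.read_csv("IndianWeatherRepository.csv").select_dtypes(include="number").dropna()
X = num.drop(columns=["temperature_celsius", "feels_like_celsius"])
y = num["temperature_celsius"]

cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)

r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
print(f"R²:   {r2.mean():.2f} ± {r2.std():.2f}")
print(f"RMSE: {rmse.mean():.2f} ± {rmse.std():.2f}")
```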
- Load dataset (IndianWeatherRepository.csv)
- Clean and preprocess data
- Visualize data relationships and trends
- Split dataset into train/test sets
- Scale features with StandardScaler
- Train multiple regression models
- Evaluate model accuracy and visualize predictions
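As a rough end-to-end illustration of this workflow (the notebook may clean features and tune hyperparameters differently; the simple dropna() cleaning and default model settings below are assumptions):

```python
# Rough sketch of the workflow above: load, clean, split, scale, train, evaluate.
# Cleaning step and model hyperparameters are simplifying assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from xgboost import XGBRegressor

# Load and clean: keep numeric columns, drop rows with missing values
num = pd.read_csv("IndianWeatherRepository.csv").select_dtypes(include="number").dropna()
X = num.drop(columns=["temperature_celsius", "feels_like_celsius"])
y = num["temperature_celsius"]

# Train/test split and feature scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train each regressor and report the evaluation metrics
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    print(f"{name}: R²={r2_score(y_test, pred):.2f}  MSE={mse:.2f}  "
          f"RMSE={mse ** 0.5:.2f}  MAE={mean_absolute_error(y_test, pred):.2f}")
```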
| File Name | Description |
|---|---|
| IndianWeatherRepository.ipynb | Main Python notebook |
| IndianWeatherRepository.csv | Weather dataset |
| histogram.png | Feature distribution histograms |
| correlation.png | Correlation heatmap |
| air_quality_vs_temp.png | Air quality vs. temperature scatter plot |
| newplot1.png - newplot5.jpg | Additional visualizations (sunburst, polar, etc.) |
The dataset used in this project is publicly available on Kaggle:
📂 Indian Weather Repository – Daily Snapshot
ℹ️ After downloading, rename the file to IndianWeatherRepository.csv and place it in the same folder as your script before running the project.
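Optionally, a quick sanity check (using the file name above) can catch a missing or misnamed CSV before anything else runs:

```python
# Optional sanity check: confirm the renamed dataset sits next to the code.
from pathlib import Path
import pandas as pd

csv_path = Path("IndianWeatherRepository.csv")
if not csv_path.exists():
    raise FileNotFoundError(
        "Download the Kaggle dataset and rename it to IndianWeatherRepository.csv "
        "in this folder."
    )
df = pd.read_csv(csv_path)
print(df.shape)  # quick confirmation that the data loaded
```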
- Install required libraries: `pip install pandas matplotlib seaborn plotly scikit-learn xgboost`
- Ensure the dataset CSV is in the same folder as the script.
- Run the script: `python IndianWeatherRepository.py`

| Model | R² | MSE | RMSE | MAE |
|---|---|---|---|---|
| Linear Regression | 0.72 | 10.79 | 3.28 | 2.56 |
| Lasso | 0.72 | 10.82 | 3.29 | 2.56 |
| Ridge | 0.72 | 10.79 | 3.28 | 2.56 |
| Gradient Boosting | 0.86 | 5.55 | 2.35 | 1.80 |
| Random Forest | 0.93 | 2.60 | 1.61 | 1.08 |
| XGBoost | 0.91 | 3.44 | 1.85 | 1.37 |
✅ Best performer: Random Forest
| Model | R² | MSE | RMSE | MAE |
|---|---|---|---|---|
| Linear Regression | 0.74 | 14.25 | 3.77 | 2.94 |
| Lasso | 0.74 | 14.29 | 3.78 | 2.94 |
| Ridge | 0.74 | 14.25 | 3.77 | 2.94 |
| Gradient Boosting | 0.86 | 7.84 | 2.80 | 2.16 |
| Random Forest | 0.93 | 3.64 | 1.91 | 1.31 |
| XGBoost | 0.91 | 4.73 | 2.18 | 1.64 |
✅ Best performer: Random Forest
- Implement LSTM/GRU for time-series prediction
- Build a real-time web dashboard (e.g., Flask + Streamlit)
- Incorporate external APIs or satellite weather feeds
- Use deep feature engineering for seasonal effects
- Deploy as a cloud-based ML service
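For the dashboard direction, a minimal Streamlit sketch could look like the following; the saved model file (rf_model.joblib) and the two input features are hypothetical placeholders, assuming a model trained on exactly those features:

```python
# Hypothetical Streamlit dashboard sketch (future direction, not part of this repo).
# Assumes a model saved earlier with joblib.dump(model, "rf_model.joblib") and
# trained on exactly the two features below; both are illustrative placeholders.
# Run with: streamlit run dashboard.py
import joblib
import pandas as pd
import streamlit as st

st.title("India Weather - Temperature Prediction (demo)")

model = joblib.load("rf_model.joblib")

humidity = st.number_input("Humidity (%)", min_value=0.0, max_value=100.0, value=60.0)
pressure = st.number_input("Pressure (mb)", min_value=900.0, max_value=1100.0, value=1010.0)

if st.button("Predict temperature"):
    features = pd.DataFrame([{"humidity": humidity, "pressure_mb": pressure}])
    st.write(f"Predicted temperature: {model.predict(features)[0]:.1f} °C")
```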
Name: Arshdeep Yadav
Department: B.Tech Computer Science & Engineering
Institution: R.E.C. Kannauj


