A Machine Learning system for predicting car prices using advanced regression models and custom evaluation metrics.
Features • Installation • Usage • Models • NewMetric • Project Structure
This project implements a comprehensive car price prediction system using machine learning techniques. It features a custom evaluation metric called NewMetric designed specifically for car price prediction, along with multiple regression models for comparison.
- Custom NewMetric for specialized car price evaluation
- Multiple ML Models including Gradient Boosting, Random Forest, and more
- Visual Analytics with detailed comparison charts
- Model Persistence for easy deployment and reuse
- Interactive Prediction system for real-time price estimation
| Feature | Description |
|---|---|
| Model Training | Train multiple regression models and compare their performance |
| NewMetric Evaluation | Custom metric combining MAE, RMSE, and Relative Error |
| Model Comparison | Side-by-side comparison of 5 different ML algorithms |
| Visualization | Generate publication-ready charts and graphs |
| Model Export | Save trained models for production use |
| Batch Prediction | Predict prices for multiple cars from Excel files |
| Interactive CLI | User-friendly command-line interface |
- Python 3.8 or higher
- pip package manager
git clone https://github.com/yourusername/CarPricePrediction.git
cd CarPricePrediction
pip install pandas numpy scikit-learn matplotlib openpyxl

| Library | Version | Purpose |
|---|---|---|
| pandas | ≥1.3.0 | Data manipulation and analysis |
| numpy | ≥1.20.0 | Numerical computing |
| scikit-learn | ≥1.0.0 | Machine learning algorithms |
| matplotlib | ≥3.4.0 | Data visualization |
| openpyxl | ≥3.0.0 | Excel file support |
To train and build the final prediction model:
python car_price_prediction.py

What this does:
- Loads data from data.xlsx
- Trains a Gradient Boosting model with optimized parameters
- Evaluates performance using NewMetric
- Saves the trained model to car_price_model.pkl
- Generates visualization charts

Output Files:
- car_price_model.pkl - Trained model file
- final_model_results.png - Prediction vs Actual charts
- final_feature_importance.png - Feature importance visualization
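
The training and saving steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of car_price_prediction.py: it trains on synthetic data instead of data.xlsx, and it assumes the pickle file stores the model together with its feature names (suggested by `load_model` returning both). The helper `train_and_save` and the dictionary layout are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def train_and_save(X, y, feature_names, path):
    """Fit a Gradient Boosting model and pickle it with its feature names."""
    model = GradientBoostingRegressor(
        n_estimators=200,
        learning_rate=0.1,
        max_depth=5,
        min_samples_split=5,
        min_samples_leaf=2,
        subsample=0.8,
        random_state=42,
    )
    model.fit(X, y)
    # Assumed layout: a dict bundling the model and the column order it expects.
    with open(path, "wb") as f:
        pickle.dump({"model": model, "feature_names": feature_names}, f)
    return model


# Synthetic stand-in for data.xlsx: two normalized features and a price target.
rng = np.random.default_rng(42)
X = rng.random((200, 2))
y = 1e9 * (0.5 + X[:, 1] - 0.4 * X[:, 0]) + rng.normal(0, 1e7, 200)
feature_names = ["کیلومتر_نرمال", "سال_نرمال"]

model_path = os.path.join(tempfile.mkdtemp(), "car_price_model.pkl")
model = train_and_save(X, y, feature_names, model_path)

# Reload the bundle to confirm the saved model is usable.
with open(model_path, "rb") as f:
    bundle = pickle.load(f)
reloaded_prediction = bundle["model"].predict(X[:1])[0]
print(f"Reloaded model predicts {reloaded_prediction:,.0f} Toman")
```

Storing the feature names next to the model keeps the expected column order available at prediction time, which is one plausible reason `load_model` returns both.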
To compare different ML algorithms:
python "Comparison of models.py"

Models Compared:
- Linear Regression
- Ridge Regression
- Lasso Regression
- Random Forest Regressor
- Gradient Boosting Regressor
Output Files:
- model_results.png - Model comparison charts
- feature_importance.png - Feature importance for best model
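
A comparison of the five model families can be sketched as below. This is an illustration under stated assumptions rather than the actual "Comparison of models.py": the data is synthetic, the hyperparameters (`alpha=1.0`, `n_estimators=100`, the 80/20 split) are placeholder choices, and the score shown is MAE divided by the mean price instead of the full NewMetric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.xlsx: three normalized features and a price target.
rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = 1e9 * (0.5 + X[:, 0] - 0.3 * X[:, 1]) + rng.normal(0, 2e7, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are placeholders, not the project's tuned values.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=1.0, max_iter=10000),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # MAE relative to the mean test price, so scores are scale-free.
    scores[name] = mean_absolute_error(y_test, predictions) / y_test.mean()

# Print models from best (lowest error) to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:<20} normalized MAE = {score:.4f}")
```

On real data the ranking depends on the dataset, so the ordering printed here says nothing about which model wins on data.xlsx.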
To make predictions with the saved model:
python use_model.py

Available Options:

| Option | Description |
|---|---|
| 1 | Display list of features |
| 2 | Interactive prediction (enter values manually) |
| 3 | Batch prediction from Excel file |
| 4 | Exit |
from use_model import load_model, predict_price
# Load the trained model
model, feature_names = load_model("car_price_model.pkl")
# Prepare feature values (normalized between 0 and 1)
feature_values = {
    "کیلومتر_نرمال": 0.3,
    "سال_نرمال": 0.8,
    # ... other features
}
# Get prediction
predicted_price = predict_price(model, feature_names, feature_values)
print(f"Predicted Price: {predicted_price:,.0f} Toman")

from use_model import load_model, predict_from_excel
# Load model
model, feature_names = load_model()
# Predict for all cars in Excel file
results = predict_from_excel(
model,
feature_names,
excel_path="new_cars.xlsx",
output_path="predictions.xlsx"
)

| Model | Description | Best For |
|---|---|---|
| Gradient Boosting | Ensemble of weak learners | Best overall performance |
| Random Forest | Ensemble of decision trees | Robust to overfitting |
| Ridge Regression | L2 regularized linear | When features are correlated |
| Lasso Regression | L1 regularized linear | Feature selection |
| Linear Regression | Basic linear model | Baseline comparison |
The production model uses Gradient Boosting Regressor with optimized parameters:
GradientBoostingRegressor(
n_estimators=200,
learning_rate=0.1,
max_depth=5,
min_samples_split=5,
min_samples_leaf=2,
subsample=0.8,
random_state=42
)

NewMetric is a custom evaluation metric designed specifically for car price prediction. It combines three error measures (normalized MAE, normalized RMSE, and mean relative error) into a single score for assessing model performance.

Its components are defined as:
- MAE_norm = MAE / Mean Price (Normalized Mean Absolute Error)
- RMSE_norm = RMSE / Mean Price (Normalized Root Mean Square Error)
- RelativeError = Mean of |Actual - Predicted| / Actual
| NewMetric Value | Performance |
|---|---|
| < 0.10 | 🟢 Excellent |
| 0.10 - 0.15 | 🟡 Good |
| 0.15 - 0.20 | 🟠 Average |
| > 0.20 | 🔴 Needs Improvement |
Note: Lower values indicate better performance.
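
The three components and the rating bands can be sketched in code as follows. The way the components are weighted is not stated in this README, so the equal weights below are an assumption, as are the function names `new_metric` and `rate`, the use of the mean actual price as "Mean Price", and the handling of the exact band boundaries.

```python
import numpy as np


def new_metric(actual, predicted, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of normalized MAE, normalized RMSE and mean relative error.

    The equal default weights are an assumption, not the project's formula.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = np.abs(actual - predicted)

    mean_price = actual.mean()
    mae_norm = errors.mean() / mean_price
    rmse_norm = np.sqrt(np.mean(errors ** 2)) / mean_price
    relative_error = np.mean(errors / actual)

    w_mae, w_rmse, w_rel = weights
    return w_mae * mae_norm + w_rmse * rmse_norm + w_rel * relative_error


def rate(score):
    """Map a score to the performance bands listed in the table."""
    if score < 0.10:
        return "Excellent"
    if score < 0.15:
        return "Good"
    if score <= 0.20:
        return "Average"
    return "Needs Improvement"


actual = [1_200_000_000, 850_000_000, 500_000_000]
predicted = [1_180_000_000, 870_000_000, 450_000_000]
score = new_metric(actual, predicted)
print(f"NewMetric = {score:.4f} ({rate(score)})")
```

Because every component is divided by a price, the score is unitless, which is what allows fixed thresholds such as 0.10 to be meaningful across datasets with different price levels.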
CarPricePrediction/
│
├── car_price_prediction.py        # Main training script with final model
├── Comparison of models.py        # Model comparison and evaluation
├── use_model.py                   # Inference and prediction utilities
│
├── data.xlsx                      # Training dataset (required)
├── car_price_model.pkl            # Saved model (generated)
│
├── final_model_results.png        # Prediction charts (generated)
├── final_feature_importance.png
├── model_results.png
├── feature_importance.png
│
├── README.md                      # English documentation
└── README_FA.md                   # Persian documentation

| File | Purpose |
|---|---|
| car_price_prediction.py | Trains the final Gradient Boosting model, evaluates it, and saves it for production use |
| Comparison of models.py | Compares 5 different ML models using NewMetric and traditional metrics |
| use_model.py | Provides utilities for loading saved models and making predictions |
| data.xlsx | Excel file containing training data with normalized features |
| car_price_model.pkl | Serialized trained model for deployment |

The input Excel file (data.xlsx) should contain:
| Column | Type | Description |
|---|---|---|
| قیمت | Numeric | Target variable (price in Toman) |
| *_نرمال | Numeric (0-1) | Normalized feature columns |

- کیلومتر_نرمال - Normalized mileage
- سال_نرمال - Normalized year
- رنگ_نرمال - Normalized color encoding
- And more...

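
The format rules above can be checked before training with a small validation helper. The sketch below is an assumption about how such a check might look; `validate_training_frame` is hypothetical and not part of the project scripts. It uses an in-memory DataFrame, whereas a real run would load the file with `pd.read_excel("data.xlsx")`.

```python
import pandas as pd

# Target column name (price in Toman).
TARGET_COLUMN = "قیمت"
# Suffix that marks normalized feature columns.
FEATURE_SUFFIX = "_نرمال"


def validate_training_frame(df):
    """Return a list of problems found in df; an empty list means it looks valid."""
    problems = []
    if TARGET_COLUMN not in df.columns:
        problems.append(f"missing target column {TARGET_COLUMN!r}")

    feature_columns = [c for c in df.columns if str(c).endswith(FEATURE_SUFFIX)]
    if not feature_columns:
        problems.append(f"no feature columns ending in {FEATURE_SUFFIX!r}")

    for column in feature_columns:
        values = pd.to_numeric(df[column], errors="coerce")
        if values.isna().any():
            problems.append(f"{column}: non-numeric or missing values")
        elif not values.between(0, 1).all():
            problems.append(f"{column}: values outside [0, 1]")
    return problems


good = pd.DataFrame({
    "کیلومتر_نرمال": [0.3, 0.7],
    "سال_نرمال": [0.8, 0.2],
    "قیمت": [1_200_000_000, 500_000_000],
})
# Same frame with an out-of-range year value and no target column.
bad = good.assign(**{"سال_نرمال": [0.8, 1.4]}).drop(columns=["قیمت"])

print(validate_training_frame(good))
print(validate_training_frame(bad))
```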
The system generates comparison charts showing:
- NewMetric scores for all models
- MAPE (Mean Absolute Percentage Error)
- Actual vs Predicted scatter plot
- Error distribution histogram

✅ Actual: 1,200,000,000 | Predicted: 1,180,000,000 | Error: 1.7%
✅ Actual: 850,000,000 | Predicted: 870,000,000 | Error: 2.4%
⚠️ Actual: 500,000,000 | Predicted: 450,000,000 | Error: 10.0%

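
Lines in this style can be produced from an actual and a predicted price as sketched below. The percentage error is the absolute difference divided by the actual price. The 5% cutoff between the two markers is an assumption, since the README does not state the threshold, and `format_comparison` is a hypothetical helper.

```python
def format_comparison(actual, predicted, warn_above=5.0):
    """Format one actual-vs-predicted line with its percentage error."""
    error_pct = abs(actual - predicted) / actual * 100
    # warn_above is an assumed threshold, not taken from the project.
    flag = "⚠️" if error_pct > warn_above else "✅"
    return (
        f"{flag} Actual: {actual:,.0f} | "
        f"Predicted: {predicted:,.0f} | Error: {error_pct:.1f}%"
    )


for actual, predicted in [
    (1_200_000_000, 1_180_000_000),
    (850_000_000, 870_000_000),
    (500_000_000, 450_000_000),
]:
    print(format_comparison(actual, predicted))
```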
| Issue | Solution |
|---|---|
| Model file not found | Run car_price_prediction.py first to generate the model |
| Missing features warning | Some features in your data may not match the model's expected features |
| Memory error | Reduce dataset size or use a machine with more RAM |
If Persian text doesn't display correctly in charts, install a Persian-compatible font:
plt.rcParams["font.family"] = "DejaVu Sans"

This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
For questions or support, please open an issue on GitHub.
Made with ❤️ for the Car Industry
⭐ Star this repo if you find it helpful!