This project implements a simple linear regression model to predict car prices based on mileage. It serves as an introduction to machine learning fundamentals, focusing on:
- Understanding linear regression with a single feature
- Implementing the gradient descent algorithm
- Data visualization and model evaluation
- Data scaling (normalization and denormalization)
Linear Regression 📈
A fundamental statistical and machine learning approach that models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to the observed data.
- Understanding the relationship between independent and dependent variables
- How to fit a line to data points
- Predicting continuous values based on input features
Gradient Descent 📉
An optimization algorithm that iteratively adjusts parameters to minimize an error function by computing the gradient (derivative) of the loss function and moving in the direction of steepest descent.
- Iterative optimization algorithm
- Finding the minimum of the cost function
- Updating parameters to improve predictions
Loss Function 🧮
A function that measures how well your model's predictions match the actual data by calculating the difference between predicted and actual values, where a smaller value indicates better model performance.
- Measuring prediction errors
- Mean Squared Error (MSE)
- Cost function optimization
Feature Scaling 🔄
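Normalizing the feature helps gradient descent converge faster and more reliably. Raw mileage values are typically several orders of magnitude larger than the [0, 1] range, which makes the gradient for θ₁ far larger than the gradient for θ₀ and forces a very small learning rate. When a model has several features, scaling also prevents one feature from dominating the others.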
- Normalization (Min-Max Scaling) for features to a range [0, 1]
- Denormalization for converting the scaled features back to their original values
Scaling is applied to the feature before fitting the model, and denormalization is used to convert predictions back to the original scale.
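As a minimal sketch of the scaling step described above, the two helpers below apply min-max scaling and invert it. The function names and the sample mileages are illustrative assumptions, not taken from the project's code or dataset.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; also return the (min, max) used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a column whose values are all equal")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled_value, lo, hi):
    """Invert min_max_normalize for a single scaled value."""
    return scaled_value * (hi - lo) + lo


# Illustrative mileages only (not the project's dataset).
mileages = [240000, 139800, 150500, 61789]
scaled, lo, hi = min_max_normalize(mileages)
restored = denormalize(scaled[1], lo, hi)  # back to roughly 139800
```

Keeping `lo` and `hi` from the training data matters: the same pair has to be reused whenever a new value is scaled or a result is converted back.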
The price prediction is based on the following hypothesis:

$$\text{price} = \theta_0 + \theta_1 \times \text{mileage}$$

Where:
- θ₀ (theta0): Y-intercept
- θ₁ (theta1): Slope of the line
- mileage: Input feature (X variable)
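The hypothesis is a straight line in mileage, so in code it reduces to a one-line function. This is a sketch only: the name `estimate_price` and the parameter values in the usage line are placeholders, not trained results.

```python
def estimate_price(mileage, theta0, theta1):
    """Linear hypothesis: intercept plus slope times mileage."""
    return theta0 + theta1 * mileage


# Placeholder parameters chosen for illustration, not trained values.
example = estimate_price(100000, theta0=8500.0, theta1=-0.02)
```

A negative slope is what one would expect here, since price generally falls as mileage rises.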
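The model uses gradient descent to minimize the cost function with these update rules, shown here in their standard form for a single feature:

$$\theta_0 := \theta_0 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left(\text{predicted}[i] - \text{actual}[i]\right)$$

$$\theta_1 := \theta_1 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left(\text{predicted}[i] - \text{actual}[i]\right) \cdot \text{mileage}[i]$$

Here α is the learning rate, m is the number of data points, and both parameters are updated simultaneously from the same set of predictions. The constant factor of 2 that arises when differentiating the squared error is absorbed into α.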
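A minimal sketch of batch gradient descent for the single-feature model follows. It assumes the common form of the updates (average error for θ₀, average error times input for θ₁), inputs already scaled to [0, 1], and illustrative values for `learning_rate` and `iterations`; the project's actual hyperparameters are not stated here.

```python
def train(xs, ys, learning_rate=0.1, iterations=1000):
    """Fit y = theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction errors computed with the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the same old parameters.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data lying exactly on y = 1 - x, already in the [0, 1] range.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 0.75, 0.5, 0.25, 0.0]
theta0, theta1 = train(xs, ys, learning_rate=0.5, iterations=5000)
```

On this toy data the parameters should approach θ₀ = 1 and θ₁ = -1. With unscaled mileage the same learning rate would most likely diverge, which is the practical reason for scaling first.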
The Mean Squared Error (MSE) is a metric used to evaluate the performance of the linear regression model
by quantifying the average squared difference between predicted values and actual values.
A lower MSE indicates that the model's predictions are closer to the actual data.
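The equation for MSE is as follows:

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left(\text{actual}[i] - \text{predicted}[i]\right)^2$$

Where:
- m is the total number of data points
- actual[i] is the actual value for the i-th data point
- predicted[i] is the predicted value for the i-th data point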
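The definition above translates directly into a few lines of Python. This is a sketch with a hypothetical function name and made-up sample values, intended only to show the arithmetic.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / m


# Made-up values: squared errors are 1, 0 and 4, so the mean is 5 / 3.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(actual, predicted)
```

Because the errors are squared, MSE is expressed in squared price units and penalizes large misses more heavily than small ones.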
The R-squared value is a statistical measure that indicates the proportion of variance in the dependent variable explained by the independent variable(s). It provides insight into the goodness of fit of the model. For a least-squares line evaluated on its own training data, the value lies between 0 and 1, and a higher value indicates a better fit; a model that fits worse than simply predicting the mean can produce a negative value.
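The equation for R-squared is as follows:

$$R^2 = 1 - \frac{\sum_{i} \left(\text{actual}[i] - \text{predicted}[i]\right)^2}{\sum_{i} \left(\text{actual}[i] - \text{mean}(\text{actual})\right)^2}$$

Where: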
- actual[i] is the actual value for the i-th data point
- predicted[i] is the predicted value for the i-th data point
- mean(actual) is the mean of all actual values
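Using the three quantities just listed, R-squared can be sketched as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and sample values below are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up values: ss_res = 5 and ss_tot = 8, so R-squared = 1 - 5/8.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
r2 = r_squared(actual, predicted)
```

A perfect fit gives exactly 1, and always predicting the mean of the actual values gives 0.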
The project consists of two main programs:
Price Predictor
- Prompts user for mileage input
- Returns estimated car price using trained parameters
- Uses saved θ₀ and θ₁ values
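One way the predictor could be structured is sketched below. Several details are assumptions rather than the project's actual design: the parameters are stored as JSON in a hypothetical `thetas.json` file, both default to 0 when that file is missing (so an untrained model estimates 0), and negative or non-finite mileage is rejected.

```python
import json
import math
from pathlib import Path


def load_thetas(path):
    """Return (theta0, theta1) from a JSON file, or (0.0, 0.0) if it is missing."""
    file = Path(path)
    if not file.exists():
        return 0.0, 0.0  # assumption: untrained model predicts 0
    data = json.loads(file.read_text())
    return float(data["theta0"]), float(data["theta1"])


def parse_mileage(text):
    """Convert user input to a finite, non-negative float or raise ValueError."""
    mileage = float(text)
    if not math.isfinite(mileage) or mileage < 0:
        raise ValueError("mileage must be a finite, non-negative number")
    return mileage


def main(path="thetas.json"):  # hypothetical file name
    """Prompt for a mileage and print the estimated price."""
    theta0, theta1 = load_thetas(path)
    try:
        mileage = parse_mileage(input("Mileage: "))
    except ValueError as err:
        print(f"Invalid mileage: {err}")
        return
    print(f"Estimated price: {theta0 + theta1 * mileage:.2f}")
```

Calling `main()` from the script's entry point runs the prompt. Separating parsing and loading from the interactive part keeps both easy to test without user input.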
Model Trainer
- Reads dataset of car prices and mileages
- Implements gradient descent algorithm
- Saves optimized θ₀ and θ₁ values
- Visualizes data and regression line (bonus feature)
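The input and output side of the trainer might look like the sketch below. It assumes a CSV with `km` and `price` columns, a JSON output format, and that only the mileage was min-max scaled during training, so the fitted parameters must be mapped back to raw mileage units before saving. All of these are assumptions for illustration; the gradient descent loop itself is omitted.

```python
import csv
import io
import json


def read_dataset(stream):
    """Parse CSV rows into parallel lists of mileages and prices."""
    mileages, prices = [], []
    for row in csv.DictReader(stream):
        mileages.append(float(row["km"]))     # assumed column name
        prices.append(float(row["price"]))    # assumed column name
    return mileages, prices


def rescale_thetas(t0, t1, lo, hi):
    """Map parameters fitted on min-max-scaled mileage back to raw mileage."""
    theta1 = t1 / (hi - lo)
    theta0 = t0 - theta1 * lo
    return theta0, theta1


def save_thetas(stream, theta0, theta1):
    """Write the parameters as a small JSON object."""
    json.dump({"theta0": theta0, "theta1": theta1}, stream)


# Two-row sample standing in for a real file (not the project's dataset).
sample = io.StringIO("km,price\n100000,6000\n200000,4000\n")
mileages, prices = read_dataset(sample)
# On scaled mileage (0 and 1) the exact line is price = 6000 - 2000 * x,
# so these stand in for the values a training loop would return.
theta0, theta1 = rescale_thetas(6000.0, -2000.0, min(mileages), max(mileages))
out = io.StringIO()
save_thetas(out, theta0, theta1)
```

Saving parameters in raw units lets the predictor apply them directly to the mileage a user types, without needing the training minimum and maximum.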
- Data Visualization: Plot showing:
  - Raw data points (mileage vs. price)
  - Fitted regression line
  - Interactive visualization capabilities
- Model Evaluation: Program calculating:
  - Mean Squared Error (MSE)
  - R-squared value
  - Prediction accuracy metrics
Through this project, you will learn:
- Fundamentals of machine learning
- Implementation of gradient descent
- Data preprocessing and normalization
- Model evaluation techniques
- Data visualization best practices
