A modern used car price prediction tool. Check out the website here.
Here's a sneak peek of the landing page ⤵
Just enter the details of the car you want to sell and get its predicted price, like this ⤵
Welcome to Priceless Wheels! In this project, our goal is to build a model that can accurately predict the price of a used vehicle based on factors such as make, model, year, mileage, and condition. The automobile industry is one of the largest and most competitive industries in the world, with millions of vehicles sold each year. A vehicle's price has a significant impact on a consumer's purchasing decision, so it is important for both buyers and sellers to understand its market value. Using machine learning algorithms and data analysis, we aim to provide a reliable and robust model that can help determine the fair market value of a vehicle. Join us on this exciting journey as we delve into the world of vehicle price prediction.
The dataset was collected from the Cardekho website. A complete overview of the dataset can be found at this Kaggle link. The dataset is publicly available for research and educational purposes.
The scraper can be found in the src/scrapper directory.
This project uses some of the most common libraries, such as pandas, matplotlib, scikit-learn, and many more. To install the dependencies, run the following command:
pip install -r requirements.txt
To run the project locally, follow these steps:
- Clone the project and cd into it:
git clone [project-url]
cd priceless-wheels
- Set up a new Python environment and install the dependencies.
- Run the setup.py file; this runs the preprocessing steps on the data and makes it ready for the model.
- cd into the model_training directory and run the training.py file. This trains the model and saves it in data/models.
- Feed the appropriate data into the testing.py file to get the predictions.
The final predictive model is an ensemble of two gradient boosting algorithms: a CatBoost Regressor and a LightGBM Regressor. They were chosen for several reasons; for example, only one of them (CatBoost) uses oblivious, or symmetric, trees. Differences like this produce two slightly different models that can be ensembled together. This is apparent from their feature importances: although the two models perform broadly similarly, their most important features differ substantially, which makes their errors less correlated and helps reduce overall variance.
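A minimal plain-Python sketch of why averaging helps. The README does not say how the two regressors' predictions are combined, so an equal-weight average is assumed here; `blend` and `mean_absolute_error` are hypothetical helpers, and the prices and predictions are made-up numbers standing in for CatBoost and LightGBM outputs. If the two models' errors have variances σ_a² and σ_b² and correlation ρ, the averaged error has variance (σ_a² + σ_b² + 2ρσ_aσ_b) / 4, which shrinks as ρ decreases.

```python
"""Equal-weight blending of two regressors' predictions (illustrative sketch)."""


def blend(preds_a: list[float], preds_b: list[float], weight_a: float = 0.5) -> list[float]:
    """Weighted average of two prediction lists; weight_a=0.5 is a plain mean."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(preds_a, preds_b)]


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between true and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up selling prices (INR) and predictions, NOT project results.
y_true = [500_000, 700_000, 300_000, 900_000]
catboost_preds = [560_000, 650_000, 330_000, 860_000]  # errors: +60k, -50k, +30k, -40k
lightgbm_preds = [450_000, 740_000, 280_000, 950_000]  # errors: -50k, +40k, -20k, +50k

ensemble_preds = blend(catboost_preds, lightgbm_preds)

for name, preds in [
    ("CatBoost", catboost_preds),
    ("LightGBM", lightgbm_preds),
    ("Ensemble", ensemble_preds),
]:
    print(f"{name:<9} MAE: {mean_absolute_error(y_true, preds):,.0f}")
```

The toy errors were chosen to partly cancel, which exaggerates the effect; with real, positively correlated errors the gain is smaller, but it follows the same variance formula.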
We used Optuna, an open-source hyperparameter optimization (HPO) library. The hyperparameters were tuned with Bayesian Optimization and Hyperband (BOHB) together with a TPE sampler.
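Hyperband, the "HB" in BOHB, decides how many candidate configurations receive how much training budget (for gradient boosting, typically boosting rounds). The plain-Python sketch below computes the bracket schedule of the original Hyperband algorithm for a maximum budget `max_resource` and reduction factor `eta`; BOHB keeps this schedule but draws configurations from a model-based (TPE-like) sampler instead of uniformly at random. The function name and the budget values are illustrative assumptions, not the project's settings.

```python
"""Hyperband bracket schedule (successive-halving rounds), illustrative sketch."""


def hyperband_schedule(max_resource: int, eta: int = 3) -> dict[int, list[tuple[int, int]]]:
    """Return {bracket s: [(n_configs, budget_per_config), ...]} for each round."""
    # s_max = floor(log_eta(max_resource)), computed with integers to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    schedule: dict[int, list[tuple[int, int]]] = {}
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) / (s + 1) * eta**s).
        n = -(-(s_max + 1) * eta**s // (s + 1))
        # Initial per-configuration budget: max_resource * eta**(-s).
        r = max_resource // eta**s
        # Each round keeps roughly the best 1/eta of the configurations
        # and gives the survivors eta times more budget.
        schedule[s] = [(n // eta**i, r * eta**i) for i in range(s + 1)]
    return schedule


for s, rounds in hyperband_schedule(max_resource=81, eta=3).items():
    print(f"bracket s={s}: " + ", ".join(f"{n} configs x {r}" for n, r in rounds))
```

In Optuna, this pairing is usually expressed by passing `optuna.samplers.TPESampler()` as the sampler and `optuna.pruners.HyperbandPruner()` as the pruner to `optuna.create_study(...)`; the pruner then stops unpromising trials that report intermediate scores through `trial.report` and `trial.should_prune`.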
The model achieved a mean absolute error (MAE) of INR 76,000 and a mean absolute percentage error (MAPE) of ~10.2%. Given that prices vary with location and with how well buyers and sellers negotiate a deal, a variation of around 10% is to be expected anyway.
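For reference, the two reported metrics are MAE = (1/n) Σ |y_i - ŷ_i| (in INR) and MAPE = (100/n) Σ |y_i - ŷ_i| / |y_i| (in percent). The plain-Python sketch below uses hypothetical helper names and invented prices purely to illustrate the arithmetic; scikit-learn ships equivalents as `mean_absolute_error` and `mean_absolute_percentage_error`, the latter returning a fraction rather than a percentage.

```python
"""MAE and MAPE for price predictions (illustrative sketch)."""


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error, in the same unit as the prices (INR)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error, in percent; y_true values must be non-zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example prices (INR), not project data.
y_true = [400_000, 800_000, 1_000_000, 250_000]
y_pred = [440_000, 760_000, 1_100_000, 237_500]

print(f"MAE:  INR {mae(y_true, y_pred):,.0f}")
print(f"MAPE: {mape(y_true, y_pred):.1f}%")
```

Because MAE is expressed in rupees, a few expensive cars can dominate it, whereas MAPE weights every car by its relative error; reporting both gives a fuller picture of the model's accuracy.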
- Data Collection
- Data Cleaning and Preprocessing
- Exploratory Data Analysis
- Feature Engineering
- Model Building
- Hyperparameter Tuning
- Model Evaluation
- Deployment