This is a complete regression analysis project in which I perform exploratory data analysis and train a Random Forest Regression model to estimate car prices in Brazil.
The main skills demonstrated in this project are:
- Data Collection: I collected data from the internet to create this dataset of 1352 observations. The data was collected in August 2021.
- Data Cleaning: I cleaned, parsed, and formatted the data to make it ready for modeling.
- Exploratory Data Analysis: I performed descriptive statistical analysis, plus univariate and bivariate analysis, to understand the dataset.
- Data Visualization: I plotted several charts (bar charts, boxplots, scatterplots, etc.).
- Data Imputation: I imputed missing values using ML prediction with the Random Forest imputer from the missingpy library.
- Outliers: I have identified and removed outliers for better performance of the model.
- Testing and Fitting: There are 5 models in this project, each improved over the previous iteration with Feature Engineering, Cross Validation, and Grid Search techniques, reaching an R-squared (explained variance) of 95%.
- Web App: The project ends with a web app built with Streamlit.
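
As a rough illustration of two of the steps listed above (outlier removal and the R-squared metric), here is a minimal sketch using only the Python standard library. It assumes the common 1.5 × IQR rule for flagging outliers, which may not be the rule used in this project. The sample prices and the function names (`iqr_bounds`, `remove_outliers`, `r_squared`) are hypothetical and not taken from the project code.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences based on the interquartile range.

    Assumption: the 1.5 * IQR rule; the project may use a different rule.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, key, k=1.5):
    """Keep only the rows whose `key` value lies inside the IQR fences."""
    low, high = iqr_bounds([row[key] for row in rows], k)
    return [row for row in rows if low <= row[key] <= high]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical car prices (not from the real dataset); the last one is extreme.
cars = [
    {"price": p}
    for p in [30000, 32000, 35000, 38000, 40000, 42000, 45000, 900000]
]
cleaned = remove_outliers(cars, key="price")
print(len(cars), "->", len(cleaned), "rows after outlier removal")

# A perfect prediction gives R-squared = 1; always predicting the mean gives 0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

An R-squared of 0.95 means the model's squared prediction error is 5% of the total variance of the prices around their mean, which is the sense in which the final model "explains" 95% of the variance.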