To perform descriptive and predictive analytics of COVID-19 with respect to different weather parameters such as temperature, humidity, dew point, wind speed, pressure and precipitation intensity.
- Objective
- Understanding the Data
- Data Cleaning and Data Transformation
- Data Enhancement
- Data Analytics
- Data Visualization
To analyse the spread of COVID-19 with respect to the environmental conditions of a particular region, and to check whether the weather data of a particular place can be used to predict the total number of confirmed cases on a given day.
The datasets containing the COVID-19 data and the weather data were first cleaned, imputed and then merged. The merged dataset was used to find the trend of the spread over time. The plot of total confirmed cases per day against days showed an elbow point at March 15, so the dataset was split into two sets: one before March 15 and one after March 15.
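The cleaning, merging and splitting steps above might look roughly like the following pandas sketch. It is an illustration under stated assumptions, not the project's code: the column names (`Country`, `Date`, `Confirmed` and the weather columns), the per-country mean imputation and the placement of March 15 itself in the later set are all hypothetical. The `elbow_date` helper shows one possible way to locate an elbow numerically (the point farthest from the chord joining the first and last points); the README itself identifies the elbow by inspecting the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real datasets may use different ones.
WEATHER_COLS = ["temperature", "humidity", "dew_point",
                "wind_speed", "pressure", "precip_intensity"]


def clean_and_merge(covid: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Clean both tables, impute missing weather values and merge on country and date."""
    covid, weather = covid.copy(), weather.copy()
    # Normalise the join keys so that the same country and day match in both tables.
    for df in (covid, weather):
        df["Date"] = pd.to_datetime(df["Date"]).dt.normalize()
        df["Country"] = df["Country"].str.strip()
    # Drop duplicate country/day rows and treat missing case counts as zero.
    covid = (covid.drop_duplicates(subset=["Country", "Date"])
                  .assign(Confirmed=lambda d: d["Confirmed"].fillna(0)))
    # Assumed imputation strategy: fill gaps with the mean reading of the same country.
    weather[WEATHER_COLS] = (weather.groupby("Country")[WEATHER_COLS]
                             .transform(lambda s: s.fillna(s.mean())))
    return covid.merge(weather, on=["Country", "Date"], how="inner")


def split_at(merged: pd.DataFrame, cutoff: str = "2020-03-15"):
    """Split into the periods before and after the cutoff (the cutoff day goes to 'after')."""
    cutoff_ts = pd.Timestamp(cutoff)
    return merged[merged["Date"] < cutoff_ts], merged[merged["Date"] >= cutoff_ts]


def elbow_date(daily_totals: pd.Series) -> pd.Timestamp:
    """Return the date whose point lies farthest from the chord joining the first and last points."""
    y = daily_totals.to_numpy(dtype=float)
    x = np.arange(len(y), dtype=float)
    # Scale both axes to [0, 1] so the case counts do not dominate the distance.
    x_n = x / x[-1]
    y_n = (y - y.min()) / (y.max() - y.min())
    rise = y_n[-1] - y_n[0]
    # Perpendicular distance from each point to the chord (0, y_n[0]) -> (1, y_n[-1]).
    dist = np.abs((y_n - y_n[0]) - rise * x_n) / np.sqrt(1.0 + rise ** 2)
    return daily_totals.index[int(np.argmax(dist))]
```

For example, `elbow_date(merged.groupby("Date")["Confirmed"].sum())` would return a candidate split date for the total confirmed cases per day, which can then be passed to `split_at`.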
Splitting the data at this point improved the results, as it removed many hidden factors that might have skewed the model. Plotting confirmed cases grouped by pressure alone or by precipitation intensity alone was not fruitful, as these graphs showed no trend. Using the correlation values obtained, pairs of variables that correlated well with each other and with the confirmed cases were selected and plotted with plotly.express. Each graph had one weather parameter on the x-axis and another on the y-axis, and the colour intensity and size of each data circle corresponded to the confirmed case count. None of these plots followed a specific trend. Very slight trends were observed only when the graphs were restricted to a single month, and then only for months up to March, since until then the virus was concentrated in China alone.
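The correlation analysis and pair plots described above could be sketched as follows. This is a hedged example, not the project's method: the column names are hypothetical, and the additive score used to rank weather-parameter pairs (absolute correlation of the pair with each other plus each member's absolute correlation with the confirmed cases) is an assumed heuristic. The plot uses `plotly.express.scatter` with its `size` and `color` arguments, imported lazily so the rest of the sketch runs without plotly.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical column names, matching a merged COVID-19 + weather table.
WEATHER_COLS = ["temperature", "humidity", "dew_point",
                "wind_speed", "pressure", "precip_intensity"]
TARGET = "Confirmed"


def correlation_by_split(df: pd.DataFrame, cutoff: str = "2020-03-15") -> dict:
    """Pearson correlation matrices for the whole data and for each side of the cutoff."""
    cols = WEATHER_COLS + [TARGET]
    cutoff_ts = pd.Timestamp(cutoff)
    parts = {"all": df,
             "before": df[df["Date"] < cutoff_ts],
             "after": df[df["Date"] >= cutoff_ts]}
    return {name: part[cols].corr() for name, part in parts.items()}


def top_pairs(corr: pd.DataFrame, k: int = 3) -> list:
    """Rank weather-parameter pairs by a hypothetical additive correlation score."""
    r = corr.abs().fillna(0.0)  # constant columns give NaN correlations; treat them as 0
    scored = [((a, b), r.loc[a, b] + r.loc[a, TARGET] + r.loc[b, TARGET])
              for a, b in itertools.combinations(WEATHER_COLS, 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in scored[:k]]


def bubble_plot(df: pd.DataFrame, x: str, y: str):
    """One weather parameter per axis; circle size and colour encode the confirmed cases."""
    import plotly.express as px  # imported lazily so the rest works without plotly
    return px.scatter(df, x=x, y=y, size=TARGET, color=TARGET)
```

To look at a single month, filter the table before plotting, for example `df[df["Date"].dt.month == 3]`.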
The correlation matrix is shown and examined for the two split datasets and for the whole dataset. Different machine learning models were then fitted to the data. The following models were used:
- Linear regression:
  - With a combination of the most correlated features.
  - Simple linear regression with a single weather parameter.
  - With all the weather parameters.
- XGBoost:
  - Simple and multiple
- SVM:
  - Radial basis function and linear kernels
- Decision tree based
- Considering only China (larger number of cases):
  - Linear regression
  - XGBoost
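The model comparison in the list above could be sketched with scikit-learn as follows. This is a hedged illustration, not the project's code: the column names, the 75/25 train/test split, the tree depth, the feature scaling in front of the SVMs and the use of R² as the score are assumptions. XGBoost's `XGBRegressor` is included only if the optional `xgboost` package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

try:  # XGBoost is optional; skip it if the package is not installed.
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None

# Hypothetical column names for the merged COVID-19 + weather table.
WEATHER_COLS = ["temperature", "humidity", "dew_point",
                "wind_speed", "pressure", "precip_intensity"]
TARGET = "Confirmed"


def candidate_models() -> dict:
    """Fresh, unfitted instances of the model families being compared."""
    models = {
        "linear": LinearRegression(),
        # SVMs are sensitive to feature scale, so standardise the inputs first.
        "svr_rbf": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        "svr_linear": make_pipeline(StandardScaler(), SVR(kernel="linear")),
        "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    }
    if XGBRegressor is not None:
        models["xgboost"] = XGBRegressor(n_estimators=200, max_depth=4)
    return models


def evaluate(df: pd.DataFrame, features: list, seed: int = 0) -> dict:
    """Fit every candidate on a training split and report R^2 on the held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features].to_numpy(), df[TARGET].to_numpy(),
        test_size=0.25, random_state=seed)
    scores = {}
    for name, model in candidate_models().items():
        model.fit(X_tr, y_tr)
        scores[name] = r2_score(y_te, model.predict(X_te))
    return scores
```

The linear-regression variants differ only in the `features` argument: `evaluate(df, WEATHER_COLS)` for all weather parameters, `evaluate(df, ["temperature"])` for a single one, or a list of the most correlated columns. The China-only runs would pass `df[df["Country"] == "China"]`, and the before/after comparison would call `evaluate` on each split set.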
In this work, we study and analyse the impact of different weather parameters on the number of cases infected with COVID-19. We present descriptive and predictive analytics of the spread of COVID-19 based on climatic features such as temperature, humidity, dew point, wind speed, pressure and precipitation intensity. To validate the results, we used publicly available datasets, on which the models were trained using these climatic features.
In the fight against the coronavirus, this project can help create general awareness and bust the myth, which emerged over the past couple of months, that weather conditions drive the spread of the coronavirus. Moreover, as we limp through this period, it is advisable to continue with the lockdown and maintain social distancing until a vaccine is developed, irrespective of changes in the climate.
The datasets were taken from the following links: