This repository contains the raw radon (Rn) data collected in Bogotá, Colombia, and the Python code (Jupyter notebooks, .ipynb) used to analyse the radon concentration (RC) data for the publication Indoor 222Rn Modeling in Data-Scarce Regions: An Interactive Dashboard Approach for Bogotá, Colombia. In addition, a dashboard was created to make interaction with the data more user-friendly and to make this type of study easier to replicate in other study areas. Further information about the dashboard source code and functionality can be found here.
The repository is divided into three Jupyter notebooks and four data folders.
- Folders:
  - Dataset for fitting
    Folder with the raw data (Raw_Results_LR115.xlsx) used in Data distribution.ipynb and the dataset with the dependent and independent variables used for fitting the regression models (Processed_DataFrame.csv).
  - Dataset for regression
    Folder with the cadaster data to which the regression will be applied. This dataset must have the same independent variables as the dataset used for fitting the model. In the repository this data is zipped for storage purposes; it is unzipped when Multivariate analysis.ipynb is run.
  - Figures
    Folder with all of the figures created in Data distribution.ipynb and Multivariate analysis.ipynb.
  - Regression results
    Folder with the regression results created in Multivariate analysis.ipynb.
- Notebooks
  - Data distribution.ipynb
    Jupyter notebook with basic statistical analysis of the raw RC data (Raw_Results_LR115.xlsx).
  - Multivariate analysis.ipynb
    Jupyter notebook with:
    - Multivariate analysis of the processed dataset (Processed_DataFrame.csv) [correlation matrix, PCA, etc.]
    - Fitting of the RC data using predictors
    - Feature selection
    - Estimation of RC in the Dataset for regression (cadaster information)
  - Dashboard App
    An improved and updated version of this dashboard can be accessed online here. Nevertheless, the datasets presented in this repository can be used as an example in the dashboard.
Radon ($^{222}$Rn) is a naturally occurring gas that represents a health threat due to its causal relationship with lung cancer. Despite its potential health impacts, several regions have not conducted studies, mainly due to data scarcity and/or economic constraints. This study aims to bridge the baseline information gap by building an interactive dashboard that uses inferential statistical methods to estimate the spatial distribution of indoor radon concentration (IRC) for a target area. We demonstrate the functionality of the dashboard by modeling IRC in the city of Bogotá, Colombia, using 30 in situ measurements. The IRC were measured for 35 days using alpha-track detectors (LR-115). The measured IRC were the highest reported in the country, with a geometric mean of 91 ± 14 Bq/m$^3$ and a maximum concentration of 407 Bq/m$^3$. In 56.66% of the residences, RC exceeded the WHO's recommendation of 100 Bq/m$^3$. A prediction map for houses registered in Bogotá’s cadaster was built in the dashboard by using a log-linear regression model fitted with the in situ measurements, together with meteorological, geologic and building-specific variables. After feature selection, the log-linear model showed a cross-validation Root Mean Squared Error (RMSE) of 56.5
This Jupyter notebook reads the raw data (Raw_Results_LR115.xlsx) and creates graphs for easier visualization and comparison with recommended levels and previous measurements in the Latin America and the Caribbean (LAC) region.
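As a rough illustration of the kind of summary behind such a comparison, the sketch below computes a geometric mean, a maximum and the share of measurements above a reference level. The function name `summarize_rc` and the sample values are hypothetical and are not taken from the notebook or from Raw_Results_LR115.xlsx; the 100 Bq/m$^3$ threshold is the WHO recommendation quoted above.

```python
import statistics

# WHO recommendation quoted in this README, in Bq/m^3
WHO_REFERENCE_BQ_M3 = 100.0


def summarize_rc(values_bq_m3, reference=WHO_REFERENCE_BQ_M3):
    """Return the geometric mean, the maximum and the percentage of
    measurements strictly above the reference level."""
    exceeding = sum(1 for value in values_bq_m3 if value > reference)
    return {
        "geometric_mean": statistics.geometric_mean(values_bq_m3),
        "maximum": max(values_bq_m3),
        "percent_above_reference": 100.0 * exceeding / len(values_bq_m3),
    }


# Hypothetical measurements (NOT the values in Raw_Results_LR115.xlsx)
sample = [50.0, 200.0, 100.0, 400.0]
summary = summarize_rc(sample)
print(summary)
```

The geometric mean is used here because it is the statistic reported for the measurements in this study; whether the notebook computes it the same way is an assumption.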
This Jupyter notebook uses the RC data (dependent variable) and the independent variables (Processed_DataFrame.csv) to fit one log-linear regression model.
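A log-linear model is commonly written as a linear regression on the natural logarithm of the dependent variable, so predictions are recovered by exponentiating the linear part. The sketch below is a minimal single-predictor version that uses only the standard library. The notebook fits several predictors and may rely on a different library, and the function names and synthetic data here are assumptions made for illustration.

```python
import math


def fit_log_linear(x, rc):
    """Fit ln(rc) = intercept + slope * x by ordinary least squares."""
    log_rc = [math.log(value) for value in rc]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_rc) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_rc))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_rc(intercept, slope, x):
    """Back-transform the linear predictor to the original RC scale."""
    return [math.exp(intercept + slope * xi) for xi in x]


# Synthetic, noise-free data (NOT from Processed_DataFrame.csv):
# rc = 50 * exp(0.3 * x)
x = [0.0, 1.0, 2.0, 3.0]
rc = [50.0 * math.exp(0.3 * xi) for xi in x]

intercept, slope = fit_log_linear(x, rc)
predictions = predict_rc(intercept, slope, x)
print(intercept, slope)
```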
Subsequently, this notebook applies the regression model to all the houses in Bogotá's cadaster that have information for the independent variables. The data is rasterized using GDAL tools.
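Applying a fitted log-linear model to the cadaster records could look roughly like the sketch below, which skips houses with a missing predictor and exponentiates the linear predictor for the rest. The house identifiers, the predictor names (`floor_level`, `building_age`) and the coefficient values are hypothetical, and the rasterization step with GDAL is not shown.

```python
import math


def estimate_rc(houses, intercept, coefficients):
    """Estimate RC for every house that has a value for each predictor.

    houses: dict mapping a house id to a dict of predictor values.
    coefficients: dict mapping a predictor name to its fitted coefficient.
    """
    estimates = {}
    for house_id, features in houses.items():
        # Skip houses without information for every independent variable
        if any(features.get(name) is None for name in coefficients):
            continue
        log_rc = intercept + sum(
            beta * features[name] for name, beta in coefficients.items()
        )
        estimates[house_id] = math.exp(log_rc)
    return estimates


# Hypothetical model and records (NOT the fitted model or the cadaster data)
intercept = math.log(100.0)
coefficients = {"floor_level": -0.1, "building_age": 0.0}
houses = {
    "A": {"floor_level": 0.0, "building_age": 10.0},
    "B": {"floor_level": None, "building_age": 5.0},
    "C": {"floor_level": 1.0, "building_age": 20.0},
}

estimates = estimate_rc(houses, intercept, coefficients)
print(estimates)
```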
The outputs of this model are:
- Figures:
  - Variable characterization figure (Figures/Caracterization.png)
  - Principal component biplot figure (Figures/PCA_RC.png)
  - Percent change calculated for all independent variables (Figures/Regresión_LogLineal.png)
  - Percent change calculated for independent variables after feature selection (Figures/Regresión_LogLineal_withFeatureSel.png)
  - Estimated residential RC distribution (Figures/Estimated_Rn_Histogram.png)
- Files (saved to Regression results):
  - RC estimated for each house in the cadaster information (LinReg_model_results.csv)
  - Raster with the RC regression results (Log_Linear_estimations.tif)
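The percent change listed among the figures is not defined in this README. A common reading for a log-linear model is that a one-unit increase in a predictor changes the dependent variable by 100 · (exp(β) − 1) percent, where β is the predictor's coefficient. The sketch below assumes that interpretation; the function name, predictor names and coefficient values are hypothetical.

```python
import math


def percent_change(coefficients):
    """Convert log-linear coefficients into the percent change in RC
    associated with a one-unit increase of each predictor."""
    return {
        name: 100.0 * (math.exp(beta) - 1.0)
        for name, beta in coefficients.items()
    }


# Hypothetical coefficients (NOT the values fitted in the notebook)
coefficients = {
    "predictor_a": math.log(1.10),
    "predictor_b": math.log(0.80),
}

changes = percent_change(coefficients)
for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```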
Refer to the GitHub repository here for the source code, and use the online version of the dashboard here.
Initial display of Dashboard app.