This GitHub repository accompanies my master's thesis, "Statistical study of the GeoPT database", to be submitted in 2022. The repository is structured as follows:
- data: this folder contains the rock composition matrices.
  - data/raw/ contains the raw rock composition files.
  - data/clean/ contains the cleaned rock composition files, in which missing values have been imputed.
- report: this folder contains the .tex documents needed to compile the report.
- src: this folder contains the code (mainly written in R) used for the statistical analysis.

The objectives of this project are twofold: to uncover the error structure of geochemical data, and to detect outliers.