Skip to content

Active Learning for NIR data - Repository to track the development of my thesis in bio data science

License

Notifications You must be signed in to change notification settings

thegitrogmonster/Zamberger_thesis_AL2024

Repository files navigation

This Repository is used to investigate the Application of Active Learning for predicting the age of wood from FT-IR spectroscopical data. The main focus is the comparison of different sampling strategies, with uncertainty as a priority. Random sampling in an active learning framework is used as the baseline for the comparison.

The code is developed with functionalities from scikit-learn [1], especially RandomSearchCV, GridSearchCV, as well as most statistical models.

The application for active learning in this scenario will be based on the assumption that a sufficiantly well known model space exists. To represent a few examples of modelbuilding, some sci-kit learn Regression Methods are tested, as well as XGBoost[2].

For the active Learning some selection strategies are implemented, but could be extendend with another function definition in the al_lib/selection_criteria.py file.

Usage

Currently it can be (possibly) only used as a reference to build a similar project.

Structure

The repository is primarily structured as follows:

  • Data Import
  • Data Exploration
  • Modelbuilding
  • Active Learning
  • A Library 'al_lib'

Prerequisites

This code was developed in a conda environment. To recreate the environment the thesis_conda.yml file can be used: For the full list you can see the thesis_conda.yml for details or create a conda environment from the spec-file via:

$ conda create --name <ENV_NAME> --file thesis_conda.yml 

References

[1] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, accessed: 07.06.2024

[2] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: ACM, 2016, pp. 785–794, accessed: 07.06.2024. [Online]. Available: http://doi.acm.org/10.1145/2939672.2939785

Author

Zamberger Bernd 202375[at]fhwn.ac.at

About

Active Learning for NIR data - Repository to track the development of my thesis in bio data science

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published