The Large Synoptic Survey Telescope will begin its survey in 2022 and produce terabytes of imaging data each night. To cope with this massive influx of data, automated algorithms for classifying astronomical light curves are crucial. Here, we present a method for automated classification of photometric light curves for a range of astronomical objects. Our approach is based on gradient boosting of decision trees, feature extraction and selection, and data augmentation. The solution was developed in the context of the Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) and achieved one of the top results in the challenge.
For more details, please refer to the paper.
If you are using the results and code of this work, please cite it as
@article{Gabruseva_2019,
title={Photometric light curves classification with machine learning},
author={T. Gabruseva and S. Zlobin and P. Wang},
journal={JAI},
year={2020}
}
The training dataset consisted of simulated astronomical light curves modeled for a range of transients and periodic objects, see data. The dataset is available on the Kaggle platform.
The training dataset had 7848 light curves from 15 classes and was highly imbalanced.
Fig. 1. Examples of simulated light curves for each class in the passbands ugrizy. MJD: Modified Julian Date in days.
The evaluation metric was provided in the challenge. The models were evaluated with weighted multi-class logarithmic loss. See evaluation here.
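As a rough illustration of a weighted multi-class logarithmic loss, the sketch below averages the log-probability of the true class within each class and then combines the per-class terms with class weights. The function name, the uniform default weights, and the choice to skip classes that have no objects are assumptions made for this example; the exact definition and weights used in the challenge are given on the evaluation page.

```python
import numpy as np


def weighted_multiclass_log_loss(y_true, proba, class_weights=None, eps=1e-15):
    """Weighted multi-class log loss (illustrative sketch).

    y_true: (n,) integer class indices in 0..M-1
    proba:  (n, M) predicted class probabilities
    class_weights: (M,) weights; uniform if None (an assumption of this sketch)
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    n_classes = proba.shape[1]
    if class_weights is None:
        class_weights = np.ones(n_classes)
    class_weights = np.asarray(class_weights, dtype=float)

    # Clip to avoid log(0), then renormalise each row to sum to 1.
    proba = np.clip(proba, eps, 1 - eps)
    proba = proba / proba.sum(axis=1, keepdims=True)

    one_hot = np.eye(n_classes)[y_true]       # (n, M) indicator of the true class
    class_counts = one_hot.sum(axis=0)        # number of objects per class
    present = class_counts > 0                # skip classes with no objects

    # Mean log-probability assigned to the true class, computed per class.
    summed = (one_hot * np.log(proba)).sum(axis=0)
    per_class = summed[present] / class_counts[present]

    weights = class_weights[present]
    return float(-np.sum(weights * per_class) / np.sum(weights))


# Uniform predictions over three classes give a loss of ln(3).
uniform_loss = weighted_multiclass_log_loss([0, 0, 0, 1, 2], np.full((5, 3), 1 / 3))
print(uniform_loss)
```

Because each class contributes its own average, rare classes are not swamped by frequent ones, which matters for a highly imbalanced training set.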
In this work, we use LightGBM, a Python package implementing gradient-boosted decision trees, with 5-fold cross-validation stratified by class.
We tried different feature sets as input to the LightGBM classifier and selected the optimal set based on the average score over the 5 cross-validation folds. The hyperparameters used are listed in the paper.
We calculated a variety of features from the light curves. The feature extractors used for the paper can be found in src/feature_extractors. The extracted features calculated for the train and test sets are available for download as a Kaggle dataset: features.
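To give a flavour of what a light-curve feature extractor does, the sketch below computes simple flux statistics per object and passband and pivots them into one row per object. It is not one of the extractors in src/feature_extractors. The function name and the column names `object_id`, `mjd`, `passband` and `flux` are assumptions about the layout of the light-curve table.

```python
import pandas as pd


def extract_basic_features(light_curves):
    """Return one row per object with flux statistics for each passband.

    Assumes columns `object_id`, `passband` and `flux` (illustrative layout).
    """
    stats = (
        light_curves
        .groupby(["object_id", "passband"])["flux"]
        .agg(["mean", "std", "min", "max"])
    )
    # Move passbands from the row index into the columns: (statistic, passband).
    wide = stats.unstack("passband")
    wide.columns = [f"flux_{stat}_pb{pb}" for stat, pb in wide.columns]
    return wide.reset_index()


# Toy light curves: two objects observed twice in each of two passbands.
toy = pd.DataFrame({
    "object_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "mjd": [59000.0, 59001.0, 59000.5, 59001.5, 59000.0, 59001.0, 59000.5, 59001.5],
    "passband": [0, 0, 1, 1, 0, 0, 1, 1],
    "flux": [1.0, 3.0, 10.0, 20.0, -2.0, 2.0, 5.0, 5.0],
})
features = extract_basic_features(toy)
print(features)
```

The resulting table has a fixed number of columns per object regardless of how many observations each light curve contains, which is the form a tree-based classifier expects.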
To download the dataset from Kaggle, you need to have a Kaggle account, join the competition and accept its conditions, then get a Kaggle API token and copy it to the .kaggle directory. After that, you may run
bash dataset_download.sh
in the command line. The script for downloading and unpacking data is in dataset_download.sh.
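Before running the download script, it can help to check that the API token is where the Kaggle client looks for it. The sketch below assumes the usual Kaggle convention of a `kaggle.json` file inside a `.kaggle` directory in the home folder, readable only by its owner. The function name and the returned status strings are hypothetical.

```python
import stat
from pathlib import Path


def kaggle_token_status(kaggle_dir=None):
    """Report whether a Kaggle API token file looks usable (illustrative helper)."""
    # Default location assumed here: ~/.kaggle/kaggle.json
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    token = kaggle_dir / "kaggle.json"
    if not token.is_file():
        return "missing"
    # Flag tokens that group members or other users can access.
    mode = stat.S_IMODE(token.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        return "too permissive"
    return "ok"


print(kaggle_token_status())
```

If the status is "too permissive" on a Unix-like system, restricting the file to the owner (mode 600) is the usual fix.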
- Install Anaconda
- You may use the create_env.sh bash file to set up the conda environment
- Download the extracted features from Kaggle and place them in the input folder
- Train different LightGBM classifiers from the src/classifiers/ folder
- Run predict on the test data using the same classifiers