Initial codebase: https://github.com/intelligent-environments-lab/ALDI
This repository is the official implementation of ALDI++: Automatic and parameter-less discord detection for daily load energy profiles.
To run locally, create the provided conda environment for your operating system:
```bash
conda env create --file env/environment_<OS>.yaml  # replace <OS> with either macos or ubuntu
```
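After the environment is created, activate it before running any of the notebooks or scripts. A minimal sketch, assuming the environment name declared in the yaml's `name:` field is `aldi` (use whatever name the file actually defines):

```bash
# Activate the environment created from env/environment_<OS>.yaml;
# `aldi` is a placeholder for the name declared inside the yaml file.
conda activate aldi

# Quick sanity check that the interpreter comes from the new environment
python --version
```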
For the forecasting portion of this project (training and prediction), we recommend using the following EC2 instance, which was used in our experiments:
- Instance Type: `g4dn.4xlarge` (16 vCPUs, 64 GB RAM, and 600 GB disk)
- AMI: Deep Learning AMI (Ubuntu 18.04)
- Conda environment: `tensorflow2_p36`
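Once the instance is running, the preinstalled TensorFlow 2 environment can be activated as described in the Deep Learning AMI documentation; a minimal sketch (environment names can vary across AMI releases):

```bash
# List the conda environments that ship with the Deep Learning AMI
conda env list

# Activate the TensorFlow 2 / Python 3.6 environment used in our experiments
source activate tensorflow2_p36

# Sanity check: confirm TensorFlow sees the GPU of the g4dn.4xlarge instance
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```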
We chose the publicly available Building Data Genome Project 2 data set, and specifically the subset used for the Great Energy Predictor III (GEPIII) machine learning competition.
Download the datasets from the competition's data tab into `data/`.
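Alternatively, the files can be fetched from the command line; a sketch assuming the Kaggle CLI is configured with an API token and that the GEPIII competition slug is `ashrae-energy-prediction`:

```bash
# Install the Kaggle CLI (requires ~/.kaggle/kaggle.json credentials)
pip install kaggle

# Download the competition files into data/ and unpack them
kaggle competitions download -c ashrae-energy-prediction -p data/
unzip data/ashrae-energy-prediction.zip -d data/
```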
The manually labeled outliers from the top winning teams are extracted from the following resources:
- Rank-1 winning team

and are stored in `data/outliers`. Then, run the notebook `bad_meter_preprocessing.ipynb` to create the labeled train set.
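The notebook can also be executed non-interactively, for example with `jupyter nbconvert` (a standard Jupyter tool, shown here only as one possible way to script the step):

```bash
# Execute the preprocessing notebook headlessly and save the executed copy
jupyter nbconvert --to notebook --execute bad_meter_preprocessing.ipynb \
  --output bad_meter_preprocessing_executed.ipynb
```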
The following discord detection methods are benchmarked:
- Statistical model (2-Standard deviation)
- ALDI
- Variational Auto-encoder (VAE)
- ALDI++ (our method)
Confusion matrices and ROC-AUC metrics are evaluated using the notebooks `classification_<model>.ipynb`, where `<model>` is one of the benchmarked models: `2sd`, `vae`, `aldi`, `aldipp`.
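All four evaluations can be run in one pass by executing the notebooks in a loop; a sketch using `jupyter nbconvert`, with the model identifiers taken from the list above:

```bash
# Execute each benchmark's classification notebook headlessly
for model in 2sd vae aldi aldipp; do
  jupyter nbconvert --to notebook --execute "classification_${model}.ipynb" \
    --output "classification_${model}_executed.ipynb"
done
```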
To specify different settings and parameters for the data pre-processing, training, and evaluation, modify the YAML files inside the `configs/` folder. The pipeline used for energy forecasting is based on the Rank-1 team's solution.
It is assumed, however, that at least the following folder structure exists:
```
.
├── configs
│   ├── ...
├── data
│   ├── outliers
│   │   ├── ...
│   ├── preprocessed
│   ├── ...
...
```
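If the data folders are missing after cloning, they can be created up front; a minimal sketch using the directory names from the tree above:

```bash
# Create the expected data folders (configs/ already ships with the repository)
mkdir -p data/outliers data/preprocessed
```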
Each YAML file inside `configs/` holds the configuration of a different discord detection algorithm. Thus, to execute a stripped-down version of the Rank-1 team's solution, run:
```bash
./rank1-solution-simplified.sh configs/{your_config}.yaml
```
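For example, assuming a hypothetical configuration file named `aldipp.yaml` inside `configs/`:

```bash
# Run the simplified Rank-1 pipeline with one specific discord detection configuration
./rank1-solution-simplified.sh configs/aldipp.yaml
```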
Dictionaries with the computed results can be found in `results/`.
Our model achieves the following forecasting performance (RMSLE) and computation time (min) on the GEPIII dataset. The results of the original competition winning team, a simple statistical approach (2-Standard deviation), a commonly used deep learning approach (VAE), and the original ALDI are also shown:
| Discords labeled by | RMSLE | Computation time (min) |
|---|---|---|
| Kaggle winning team | 2.841 | 480 |
| 2-Standard deviation | 2.835 | 1 |
| ALDI | 2.834 | 40 |
| VAE | 2.829 | 32 |
| ALDI++ | 2.665 | 8 |