ML4Science: Image processing and pattern recognition on MHD spectrograms to automate the detection of phases in the discharge characterized by Magneto-HydroDynamic (MHD) instabilities
Requirements:
- Anaconda
- PyTorch with CUDA adapted to your machine
The conda environment can be created by running this command:
conda env create -f environment.yml
For the LSTM model we provide two scripts: one to train the model and one to run inference.
Running this command will generate the same model as the one saved in the models\ folder:
python .\train_LSTM.py .\data\ 128 1 0.01 --weight_decay=1e-5 --l1_sigma=1e-4 --dropout_rate=0 --patience=0 --delta=1e-3 --model_path=models --n_epoch=400 --batch_size=128 --max_length=4293
To use our model to predict on new data, use this command. Make sure to adapt the path .\data\ to the location of your data:
python .\test_LSTM.py .\models\lstm.pt .\data\ --max_length=4293 --batch_size=128
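The two commands above use Windows-style paths. As a minimal sketch, the same invocations can be assembled in Python so that the path separators match the platform. The helper names `build_train_command` and `build_test_command` are hypothetical and not part of the repo. All arguments are copied from the commands above, and the sketch assumes the scripts are run from the repository root.

```python
import os
import sys


def build_train_command(data_dir="data", model_dir="models"):
    """Return the README training command as an argument list."""
    return [
        sys.executable, "train_LSTM.py",
        os.path.join(data_dir, ""),  # keeps the trailing separator, as in .\data\
        "128", "1", "0.01",          # positional arguments, copied verbatim
        "--weight_decay=1e-5", "--l1_sigma=1e-4", "--dropout_rate=0",
        "--patience=0", "--delta=1e-3", f"--model_path={model_dir}",
        "--n_epoch=400", "--batch_size=128", "--max_length=4293",
    ]


def build_test_command(weights=os.path.join("models", "lstm.pt"), data_dir="data"):
    """Return the README inference command as an argument list."""
    return [
        sys.executable, "test_LSTM.py", weights, os.path.join(data_dir, ""),
        "--max_length=4293", "--batch_size=128",
    ]


if __name__ == "__main__":
    # Only print the commands; pass a list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_train_command()))
    print(" ".join(build_test_command()))
```

Using `sys.executable` runs the scripts with the interpreter of the active conda environment.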
The training and evaluation pipelines for the CNN models are contained in their respective notebooks, notebooks/resnet18_CNN.ipynb and notebooks/efficientnet_CNN.ipynb.
The cross-validation process for the EfficientNet-B0 model is done inside notebooks/cross_validation.ipynb.
The repo doesn't contain the dataset, which is available through this SwitchDrive link. It is the user's responsibility to place the data according to the structure below:
├───data
│   ├───dataset_h5
│   ├───dataset_pickle
│   └───MHD_labels
├───models      # Contains the weights of the trained model
├───notebooks
└───src
    ├───data
    └───models
This repo contains 3 models: 2 CNNs and 1 LSTM. More information about them is given below.
Model | Number of Parameters |
---|---|
LSTM | 78k |
EfficientNet-B0 | 4.8M |
ResNet-18 | 11.3M |
Learning Rate | Dropout rate | Weight decay | Batch size |
---|---|---|---|
0.3 | 1 | 64 |
lr | Hidden size | Num layer | Weight decay | L1 sigma | Batch size |
---|---|---|---|---|---|
128 |
Model | Cohen's kappa | F1 score |
---|---|---|
LSTM | ||
EfficientNet-B0 CNN | ||
ResNet18 CNN |
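For readers unfamiliar with the two metrics in the table above, here is a minimal pure-Python sketch of how they are defined. It is not the evaluation code used in this repo, and the labels in the example are made up for illustration. The F1 function is the binary variant with a chosen positive class; whether the reported scores are binary or averaged over classes is not stated here.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies of both labelings.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # convention chosen here: both labelings use one identical class
    return (observed - expected) / (1 - expected)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    print(cohen_kappa(y_true, y_pred))  # 0.5
    print(f1_score(y_true, y_pred))     # 0.75
```

Cohen's kappa is useful alongside F1 when classes are imbalanced, because it discounts the agreement that a classifier would reach by chance.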