This repository contains the code and hyperparameters used for the experiments described in the following paper:
F. Yesiler, J. Serrà, and E. Gómez. Less is more: Faster and better music version identification with embedding distillation. In Proc. of the Int. Soc. for Music Information Retrieval Conf. (ISMIR), 2020
The instructions on how to download the newly-contributed Da-TACOS training set will be shared soon. In the meantime, you can check our publication to find out more about this dataset.
Below, we specify some use cases and explain the important steps you need to follow for using the code in this repository.
- Evaluating the pre-trained models on Da-TACOS benchmark set
- Evaluating the pre-trained models on a private dataset
- Training a model with the Da-TACOS training set (soon)
- Training a model with a private dataset (soon)
Both the training and the evaluation code in this repository are run through main.py. The arguments of that script are shown below:
python main.py
usage: main.py [-h] [-rt {train,test}] [-mp MAIN_PATH]
[--exp_type {lsr,kd,pruning,baseline}]
[-pri PRUNING_ITERATIONS] [-tf TRAIN_FILE] [-ch CHUNKS]
[-vf VAL_FILE] [-noe NUM_OF_EPOCHS] [-emb EMB_SIZE]
[-bs BATCH_SIZE] [-l {0,1,2,3}] [-kdl {distance,cluster}]
[-ms {0,1,2,3}] [-ma MARGIN] [-o {0,1}] [-lr LEARNING_RATE]
[-flr FINETUNE_LEARNING_RATE] [-mo MOMENTUM]
[-lrs LR_SCHEDULE [LR_SCHEDULE ...]] [-lrg LR_GAMMA]
[-pl PATCH_LEN] [-da {0,1}] [-nw NUM_WORKERS]
[-ofb OVERFIT_BATCH]
Training and evaluation code for Re-MOVE experiments.
optional arguments:
-h, --help show this help message and exit
-rt {train,test}, --run_type {train,test}
Whether to run the training or the evaluation script.
-mp MAIN_PATH, --main_path MAIN_PATH
Path to the working directory.
--exp_type {lsr,kd,pruning,baseline}
Type of experiment to run.
-pri PRUNING_ITERATIONS, --pruning_iterations PRUNING_ITERATIONS
Number of iterations for pruning.
-tf TRAIN_FILE, --train_file TRAIN_FILE
Path for the training file. If more than one file is
used, write only the common part.
-ch CHUNKS, --chunks CHUNKS
Number of chunks for training set.
-vf VAL_FILE, --val_file VAL_FILE
Path for validation data.
-noe NUM_OF_EPOCHS, --num_of_epochs NUM_OF_EPOCHS
Number of epochs for training.
-emb EMB_SIZE, --emb_size EMB_SIZE
Size of the final embeddings.
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size for training iterations.
-l {0,1,2,3}, --loss {0,1,2,3}
Which loss to use for training. 0 for Triplet, 1 for
ProxyNCA, 2 for NormalizedSoftmax, and 3 for Group
loss.
-kdl {distance,cluster}, --kd_loss {distance,cluster}
Which loss to use for Knowledge Distillation training.
-ms {0,1,2,3}, --mining_strategy {0,1,2,3}
Mining strategy for Triplet loss. 0 for random, 1 for
semi-hard, 2 for hard, 3 for semi-hard using all
positives.
-ma MARGIN, --margin MARGIN
Margin for Triplet loss.
-o {0,1}, --optimizer {0,1}
Optimizer for training. 0 for SGD, 1 for Ranger.
-lr LEARNING_RATE, --learning_rate LEARNING_RATE
Base learning rate for the optimizer.
-flr FINETUNE_LEARNING_RATE, --finetune_learning_rate FINETUNE_LEARNING_RATE
Learning rate for finetuning the feature extractor for
LSR training.
-mo MOMENTUM, --momentum MOMENTUM
Value for momentum parameter for SGD.
-lrs LR_SCHEDULE [LR_SCHEDULE ...], --lr_schedule LR_SCHEDULE [LR_SCHEDULE ...]
Epochs for reducing the learning rate. Multiple
arguments are appended in a list.
-lrg LR_GAMMA, --lr_gamma LR_GAMMA
Step size for learning rate scheduler.
-pl PATCH_LEN, --patch_len PATCH_LEN
Number of frames for each input in time dimension
(only for training).
-da {0,1}, --data_aug {0,1}
Whether to use data augmentation while training.
-nw NUM_WORKERS, --num_workers NUM_WORKERS
Num of workers for the data loader.
-ofb OVERFIT_BATCH, --overfit_batch OVERFIT_BATCH
Whether to overfit a single batch. It may help with
revealing problems with training.
There are 4 types of experiments you can create/evaluate: (1) latent space reconfiguration (lsr), (2) knowledge distillation (kd), (3) pruning, and (4) baseline. The experiment setting to use is selected with the --exp_type argument. For each experiment type, one set of default values is created and stored in the data folder.
To re-create/evaluate the experiments in the paper, you can simply play with the --exp_type, --emb_size, --kd_loss, and --loss arguments.
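For instance, a command along the following lines would evaluate a knowledge-distillation model with 256-dimensional embeddings (the flag values here are only illustrative; the exact configurations used in the paper are the ones to be shared, as noted below):
python main.py -rt test --exp_type kd -kdl distance -emb 256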
The exact configurations used for each experiment presented in the paper will be shared soon.
To facilitate the benchmarking process and to present a pipeline for evaluating the pre-trained Re-MOVE models, we have prepared benchmark_da-tacos.py. To use the script, you can follow the steps below:
- Python 3.6+
- Create a virtual environment and install requirements
git clone https://github.com/furkanyesiler/re-move.git
cd re-move
python3 -m venv venv
source venv/bin/activate
pip install -r requirements_benchmark.txt
After creating the virtual environment and installing the required packages, you can simply run
python benchmark_da-tacos.py --unpack --remove
usage: benchmark_da-tacos.py [-h] [--outputdir OUTPUTDIR] [--unpack]
[--remove]
Downloading and preprocessing cremaPCP features of the Da-TACOS benchmark
subset
optional arguments:
-h, --help show this help message and exit
--outputdir OUTPUTDIR
Directory to store the dataset (default: ./data)
--unpack Unpack the zip files (default: False)
--remove Remove zip files after unpacking (default: False)
This script downloads the metadata and the cremaPCP features of the Da-TACOS benchmark set, and preprocesses them to work with our evaluation setting. Specifically, after downloading the files, it:
- downsamples the cremaPCP features by 8,
- reshapes them from Tx12 to 1x23xT (for the intuition behind this step, you can check our paper),
- stores them in a dictionary which is saved as a .pt file,
- creates the ground truth annotations to be used by our evaluation function.
Both the data and the ground truth annotations (named benchmark_crema.pt and ytrue_benchmark.pt, respectively) are stored in the data folder.
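To make the shape change concrete, here is a small, hypothetical illustration of the downsampling step (the actual script may average rather than simply subsample frames; the point is only that a Tx12 matrix becomes a (T/8)x12 matrix):
import numpy as np

# illustration only: keep every 8th frame of a Tx12 cremaPCP matrix
cremaPCP = np.random.rand(800, 12)  # placeholder features with T = 800 frames
cremaPCP_downsampled = cremaPCP[::8]  # downsampled by 8 -> shape (100, 12)
print(cremaPCP_downsampled.shape)  # (100, 12)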
After the features are downloaded and preprocessed, you can evaluate a pre-trained Re-MOVE model on the Da-TACOS benchmark subset with:
python main.py -rt test
As explained above, you can add the --exp_type, --emb_size, --kd_loss, and --loss arguments to choose which pre-trained model to evaluate.
For this use case, we would like to point out a number of requirements you must follow.
- Python 3.6+
- Create a virtual environment and install requirements
git clone https://github.com/furkanyesiler/re-move.git
cd re-move
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Re-MOVE was trained using cremaPCP features, and therefore, it may underperform with other Pitch Class Profile (PCP, or chroma) variants. To extract cremaPCP features for your audio collection, please refer to acoss.
The cremaPCP features used for training were created using non-overlapping frames and a hop size of 4096 samples on audio tracks sampled at 44.1 kHz (about 93 ms per frame).
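As a quick sanity check on that frame duration, the arithmetic is simply the hop size divided by the sampling rate:
hop_size = 4096  # samples per frame (non-overlapping)
sample_rate = 44100  # Hz
print(hop_size / sample_rate)  # ~0.093 s, i.e. about 93 ms per frame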
After obtaining cremaPCP features for your dataset, you should cast them as a torch.Tensor, and reshape them from Tx12 to 1x23xT. For this step, you can use the following code snippet:
import numpy as np
import torch
# this variable represents your cremaPCP feature
cremaPCP = np.random.rand(100, 12)
# cast the cremaPCP feature to a torch.Tensor and transpose it from Tx12 to 12xT
cremaPCP_tensor = torch.from_numpy(cremaPCP).t()
# stack the 12 pitch classes twice, keep the first 23 bins, and add a channel dimension -> 1x23xT
cremaPCP_reshaped = torch.cat((cremaPCP_tensor, cremaPCP_tensor))[:23].unsqueeze(0)
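For the toy input above, you can verify that the result has the expected 1x23xT shape:
print(cremaPCP_reshaped.shape)  # torch.Size([1, 23, 100])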
When the cremaPCP features per song are ready, you need to create a dataset file and a ground truth annotations file. The dataset file should be a python dictionary with 2 keys: data and labels. Each key (i.e. 'data' or 'labels') should point to a python list that contains, respectively, the cremaPCP features and the label of each track. Specifically, let dataset_dict be our dataset dictionary, and dataset_dict['data'] and dataset_dict['labels'] be our lists. The label of the song dataset_dict['data'][42] should be dataset_dict['labels'][42]. Finally, the dataset file should be saved under the data folder and should be named benchmark_crema.pt. An example is shown below:
import os
import torch

root_dir = '/your/root/directory/of/move'
data = []
labels = []

for i in range(dataset_size):
    cremaPCP = load_cremaPCP(i)  # load the cremaPCP features of the ith song of your dataset
    label = load_label(i)  # load the label of the ith song of your dataset
    data.append(cremaPCP)
    labels.append(label)

dataset_dict = {'data': data, 'labels': labels}
torch.save(dataset_dict, os.path.join(root_dir, 'data', 'benchmark_crema.pt'))
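To double-check the file you just saved, you can reload it and verify that the two lists are aligned; this check is only a suggestion and not required by the repository:
import os
import torch

root_dir = '/your/root/directory/of/move'  # same root directory as above
dataset_dict = torch.load(os.path.join(root_dir, 'data', 'benchmark_crema.pt'))
assert len(dataset_dict['data']) == len(dataset_dict['labels'])
print(dataset_dict['data'][0].shape)  # each entry should be a 1x23xT tensor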
When your dataset file ('benchmark_crema.pt') is ready, you have to create a ground truth annotations file, which should be stored in the data folder and named ytrue_benchmark.pt. This file should be a torch.Tensor with shape NxN (N is the size of your dataset), where the entry at position (i, j) is 1 if the ith and jth songs have the same label and 0 otherwise. Finally, the diagonal of this matrix should be 0. You can find example code below:
import os
import torch

data_dir = '/your/root/directory/of/move/data/'
labels = torch.load(os.path.join(data_dir, 'benchmark_crema.pt'))['labels']

ytrue = []
for i in range(len(labels)):
    main_label = labels[i]  # label of the ith song
    sub_ytrue = []
    for j in range(len(labels)):
        if labels[j] == main_label and i != j:  # check whether the ith and jth songs have the same label
            sub_ytrue.append(1)
        else:
            sub_ytrue.append(0)
    ytrue.append(sub_ytrue)

ytrue = torch.Tensor(ytrue)
torch.save(ytrue, os.path.join(data_dir, 'ytrue_benchmark.pt'))
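If your labels are integers, the same NxN matrix can also be built in a vectorized way; the sketch below is equivalent to the loop above under that assumption:
import torch

labels_tensor = torch.tensor(labels)  # assumes the 'labels' list above contains integers
ytrue = (labels_tensor.unsqueeze(0) == labels_tensor.unsqueeze(1)).float()  # 1 where two songs share a label
ytrue.fill_diagonal_(0)  # a song should not be counted as a version of itself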
After you have prepared your dataset and annotation files, you can evaluate a pre-trained Re-MOVE model on your dataset with:
python main.py -rt test
Again, the --exp_type, --emb_size, --kd_loss, and --loss arguments let you choose which pre-trained model to evaluate.
Training a model with the Da-TACOS training set: coming soon...
Training a model with a private dataset: coming soon...
For any questions you may have, feel free to create an issue or contact me.
The code in this repository is licensed under Affero GPL v3.
Please cite our paper if you plan to use the code in this repository:
@inproceedings{yesiler2020,
author = "Furkan Yesiler and Joan Serrà and Emilia Gómez",
title = "Less is more: {Faster} and better music version identification with embedding distillation",
booktitle = "Proc. of the Int. Soc. for Music Information Retrieval Conf. (ISMIR)",
year = "2020"
}
This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers).
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 770376 (TROMPA).