Implementation of the inverse scale space training algorithms for sparse neural networks, proposed in A Bregman Learning Framework for Sparse Neural Networks [1]. Feel free to use it and please refer to our paper when doing so.
@article{JMLR:v23:21-0545,
author = {Leon Bungert and Tim Roith and Daniel Tenbrinck and Martin Burger},
title = {A Bregman Learning Framework for Sparse Neural Networks},
journal = {Journal of Machine Learning Research},
year = {2022},
volume = {23},
number = {192},
pages = {1--43},
url = {http://jmlr.org/papers/v23/21-0545.html}
}
Our Bregman learning framework aims at training sparse neural networks in an inverse scale space manner, starting with very few parameters and gradually adding only relevant parameters during training. We train a neural network parametrized by weights $\theta$ using the simple baseline algorithm

$$v_{k+1} = v_k - \tau \hat\nabla \mathcal{L}(\theta_k),$$
$$\theta_{k+1} = \mathrm{prox}_{\delta J}(\delta v_{k+1}),$$

where
- $\mathcal{L}(\theta)$ denotes a loss function with stochastic gradient $\hat\nabla \mathcal{L}(\theta)$,
- $J(\theta)$ is a sparsity-enforcing functional, e.g., the $\ell_1$-norm,
- $\mathrm{prox}_{\delta J}$ is the proximal operator of $\delta J$ (written out below for the $\ell_1$-norm).
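For the common choice $J(\theta) = \lambda\,\|\theta\|_1$, the proximal operator has the well-known closed form of componentwise soft-shrinkage,
$$\mathrm{prox}_{\delta J}(x)_i = \operatorname{sign}(x_i)\,\max\bigl(|x_i| - \delta\lambda,\ 0\bigr),$$
which sets small entries exactly to zero and therefore enforces sparsity.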
Our algorithm is based on linearized Bregman iterations [2] and is a simple extension of stochastic gradient descent, which is recovered by choosing $J = 0$. We also provide accelerations of our baseline algorithm using momentum and Adam [3].
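For illustration only, a single LinBreg step for one weight tensor with $J(\theta) = \lambda\|\theta\|_1$ can be sketched in PyTorch as follows; the function name `linbreg_step` and its default values are placeholders and do not reflect the optimizer interface of this repository:

```python
import torch

def soft_shrink(x, thresh):
    # Proximal operator of thresh * ||.||_1: componentwise soft-shrinkage.
    return torch.sign(x) * torch.clamp(x.abs() - thresh, min=0.0)

@torch.no_grad()
def linbreg_step(theta, v, grad, tau=0.1, delta=1.0, lam=1e-3):
    # v_{k+1} = v_k - tau * (stochastic gradient of the loss at theta_k)
    v -= tau * grad
    # theta_{k+1} = prox_{delta*J}(delta * v_{k+1}) with J = lam * ||.||_1
    theta.copy_(soft_shrink(delta * v, delta * lam))
    return theta, v
```

Setting `lam = 0` makes the prox the identity, so $\theta = \delta v$ and the update collapses to plain SGD with step size $\delta\tau$, matching the remark above.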
The variable $v$ is a subgradient of $\theta$ with respect to the elastic net functional
$$J_\delta(\theta) = J(\theta) + \frac{1}{2\delta}\,\|\theta\|^2$$
and stores the information on which parameters are non-zero.
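This subgradient relation is just the optimality condition of the proximal step:
$$\theta = \mathrm{prox}_{\delta J}(\delta v) \iff \delta v - \theta \in \delta\,\partial J(\theta) \iff v \in \partial J(\theta) + \tfrac{1}{\delta}\,\theta = \partial J_\delta(\theta).$$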
We use a sparse initialization strategy, setting each parameter to a non-zero value only with a small probability. The variance of the non-zero parameters is chosen to avoid vanishing or exploding gradients, generalizing Kaiming (He) or Xavier initialization.
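A minimal sketch of such an initialization for a fully connected layer, assuming a Bernoulli mask with density `p` and a Kaiming-type Gaussian whose variance is rescaled by `1/p` so that the layer's output variance stays comparable to the dense case (the precise scaling used in the paper may differ):

```python
import torch

def sparse_kaiming_init(weight, p=0.1):
    # Zero out each entry with probability 1 - p (Bernoulli mask) and
    # enlarge the variance of the surviving entries by 1/p, so the
    # output variance stays comparable to dense Kaiming initialization.
    fan_in = weight.shape[1]                  # in_features of a linear layer
    std = (2.0 / (p * fan_in)) ** 0.5         # Kaiming variance 2/fan_in, rescaled by 1/p
    mask = (torch.rand_like(weight) < p).float()
    with torch.no_grad():
        weight.copy_(mask * torch.randn_like(weight) * std)
    return weight
```

For convolutional layers the analogous `fan_in` would be `in_channels * kernel_size[0] * kernel_size[1]`.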
The different experiments can be executed as Jupyter notebooks in the notebooks folder.
In this experiment we consider the MNIST classification task using a simple multilayer perceptron. We compare the LinBreg optimizer to standard SGD and proximal descent. The respective notebook can be found at MLP-Classification.
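For context, the proximal descent baseline updates the weights directly, without the auxiliary variable $v$; a schematic step, again for $J(\theta) = \lambda\|\theta\|_1$ and reusing the hypothetical `soft_shrink` helper from the sketch above:

```python
@torch.no_grad()
def prox_sgd_step(theta, grad, tau=0.1, lam=1e-3):
    # theta_{k+1} = prox_{tau*J}(theta_k - tau * grad L(theta_k)):
    # an SGD step followed by soft-shrinkage.
    theta -= tau * grad
    theta.copy_(soft_shrink(theta, tau * lam))
    return theta
```

Because no variable $v$ accumulates gradient information for parameters that are currently zero, proximal descent prunes but does not share the inverse scale space behaviour of LinBreg described above.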
In this experiment we consider the Fashion-MNIST classification task using a simple convolutional net. The experiment can be executed as a notebook via the file ConvNet-Classification.
In this experiment we consider the CIFAR10 classification task using a ResNet. The experiment can be executed as a notebook via the file ResNet-Classification.
This experiment implements the neural architecture search as proposed in [4].
The corresponding notebooks are DenseNet and Skip-Encoder.
The notebooks will throw errors if the datasets cannot be found. You can change the default configuration `'download': False` to `'download': True` in order to automatically download the necessary dataset and store it in the appropriate folder.
If you want to run the code on your CPU you should replace `'use_cuda': True, 'num_workers': 4` by `'use_cuda': False, 'num_workers': 0` in the configuration of the notebook.
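As an illustration, the relevant part of a notebook configuration for a CPU-only run with automatic dataset download could then look as follows (the dictionary name and any other keys are placeholders and differ between notebooks):

```python
# Hypothetical excerpt of a notebook configuration; only the keys
# mentioned above are taken from this README.
conf = {
    'download': True,      # download the dataset automatically if it is missing
    'use_cuda': False,     # run on the CPU
    'num_workers': 0,      # no data-loading worker processes on the CPU
}
```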
[1] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "A Bregman Learning Framework for Sparse Neural Networks." Journal of Machine Learning Research 23.192 (2022): 1-43. https://www.jmlr.org/papers/v23/21-0545.html
[2] Wotao Yin, Stanley Osher, Donald Goldfarb, Jerome Darbon. "Bregman Iterative Algorithms for $\ell_1$-Minimization with Applications to Compressed Sensing." SIAM Journal on Imaging Sciences 1.1 (2008): 143-168.
[3] Diederik Kingma, Jimmy Lei Ba. "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980 (2014). https://arxiv.org/abs/1412.6980
[4] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "Neural Architecture Search via Bregman Iterations." arXiv preprint arXiv:2106.02479 (2021). https://arxiv.org/abs/2106.02479