📈 BregmanLearning

Implementation of the inverse scale space training algorithms for sparse neural networks, proposed in A Bregman Learning Framework for Sparse Neural Networks [1]. Feel free to use it and please refer to our paper when doing so.

@article{JMLR:v23:21-0545,
  author  = {Leon Bungert and Tim Roith and Daniel Tenbrinck and Martin Burger},
  title   = {A Bregman Learning Framework for Sparse Neural Networks},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {192},
  pages   = {1--43},
  url     = {http://jmlr.org/papers/v23/21-0545.html}
}

💡 Method Description

Our Bregman learning framework aims at training sparse neural networks in an inverse scale space manner, starting with very few parameters and gradually adding only relevant parameters during training. We train a neural network parametrized by weights $\theta$ using the simple baseline algorithm

$$v_{k+1} = v_k - \tau \hat\nabla L(\theta_k), \qquad \theta_{k+1} = \mathrm{prox}_{\delta J}(\delta v_{k+1}),$$

where

  • $L$ denotes a loss function with stochastic gradient $\hat\nabla L$,
  • $J$ is a sparsity-enforcing functional, e.g., the $\ell_1$-norm,
  • $\mathrm{prox}_{\delta J}$ is the proximal operator of $\delta J$.

Our algorithm is based on linearized Bregman iterations [2] and is a simple extension of stochastic gradient descent, which is recovered by choosing $J = 0$. We also provide accelerated variants of our baseline algorithm using momentum and Adam [3].
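
As a point of reference, here is a minimal sketch of the baseline update above for a single parameter tensor, written in plain PyTorch. It is not the optimizer class shipped in this repository; all names and default values are placeholders.

```python
import torch

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1 (component-wise soft thresholding)."""
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

@torch.no_grad()
def linbreg_step(theta, v, grad, tau=0.1, lam=1e-3, delta=1.0):
    """One baseline LinBreg update (sketch). theta and v have the same shape;
    grad is a stochastic gradient of the loss at theta."""
    v -= tau * grad                                        # subgradient update
    theta.copy_(soft_threshold(delta * v, delta * lam))    # primal update: prox of delta*lam*||.||_1
    return theta, v
```

For `lam = 0` the soft thresholding is the identity, so the iteration collapses to plain SGD with step size `delta * tau`.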

The variable $v$ is a subgradient of the elastic net functional

$$J_\delta(\theta) = J(\theta) + \frac{1}{2\delta}\lVert\theta\rVert^2$$

evaluated at $\theta$, i.e. $v \in \partial J_\delta(\theta)$, and stores the information about which parameters are non-zero.
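
For completeness, the subgradient property and the primal update are consistent with each other through the optimality condition of the proximal operator (standard convex analysis, not specific to this repository):

$$\theta = \mathrm{prox}_{\delta J}(\delta v) \;\Longleftrightarrow\; \delta v \in \theta + \delta\,\partial J(\theta) \;\Longleftrightarrow\; v \in \partial J_\delta(\theta).$$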

🎲 Initialization

We use a sparse initialization strategy: each parameter is initialized to a non-zero value only with a small probability. The variance of the non-zero parameters is chosen such that gradients neither vanish nor explode, generalizing Kaiming-He and Xavier initialization.
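
The following is a rough sketch of this idea for a single weight tensor, assuming a He-type scaling corrected by the sampling probability; the exact routine used in the notebooks may differ.

```python
import torch

def sparse_init_(weight, p=0.1):
    """Sparse initialization sketch: each entry is non-zero only with
    probability p, and the variance of the non-zero entries is inflated by
    1/p so that the overall layer variance matches a He-type scaling."""
    fan_in = weight.shape[1] if weight.dim() == 2 else weight[0].numel()
    std = (2.0 / (p * fan_in)) ** 0.5                  # He scaling / density p
    mask = torch.bernoulli(torch.full_like(weight, p))
    with torch.no_grad():
        weight.copy_(mask * std * torch.randn_like(weight))
    return weight

# e.g. sparse_init_(torch.nn.Linear(784, 200).weight, p=0.05)
```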

🔬 Experiments

The different experiments can be executed as Jupyter notebooks in the notebooks folder.

Classification

Multilayer Perceptron

In this experiment we consider the MNIST classification task using a simple multilayer perceptron. We compare the LinBreg optimizer to standard SGD and proximal descent. The respective notebook can be found at MLP-Classification.

Convolutions and Group Sparsity

In this experiment we consider the Fashion-MNIST classification task using a simple convolutional net. The experiment can be executed via the notebook ConvNet-Classification.
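
Group sparsity means that whole convolutional filters are switched on or off together rather than individual weights. A minimal sketch of the corresponding prox, group soft thresholding with one group per output filter (the grouping is assumed here for illustration), looks like this:

```python
import torch

def group_soft_threshold(w, lam):
    """Prox of lam * sum_i ||w[i]||_2 with one group per output filter:
    shrinks each filter towards zero and removes it entirely once its
    Euclidean norm drops below lam."""
    norms = w.flatten(1).norm(dim=1).clamp_min(1e-12)   # one norm per filter
    scale = torch.clamp(1.0 - lam / norms, min=0.0)     # block shrinkage factor
    return w * scale.view(-1, *([1] * (w.dim() - 1)))
```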

ResNet

In this experiment we consider the CIFAR10 classification task using a ResNet. The experiment can be executed via the notebook ResNet-Classification.

NAS

This experiment implements the neural architecture search as proposed in [4].

The corresponding notebooks are DenseNet and Skip-Encoder.

☝️ Miscellaneous

The notebooks will throw errors if the datasets cannot be found. You can change the default configuration 'download':False to 'download':True in order to automatically download the necessary dataset and store it in the appropriate folder.

If you want to run the code on your CPU you should replace 'use_cuda':True, 'num_workers':4 by 'use_cuda':False, 'num_workers':0 in the configuration of the notebook.
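
Putting both notes together, the relevant configuration entries look roughly like this; the surrounding config dict of each notebook contains further keys that are left untouched here.

```python
conf = {
    'download': True,     # fetch the dataset automatically if it is missing
    'use_cuda': False,    # run on the CPU ...
    'num_workers': 0,     # ... and use no extra worker processes for data loading
}
```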

📝 References

[1] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "A Bregman Learning Framework for Sparse Neural Networks." Journal of Machine Learning Research 23.192 (2022): 1-43. https://www.jmlr.org/papers/v23/21-0545.html

[2] Wotao Yin, Stanley Osher, Donald Goldfarb, Jerome Darbon. "Bregman iterative algorithms for $\ell_1$-minimization with applications to compressed sensing." SIAM Journal on Imaging Sciences 1.1 (2008): 143-168.

[3] Diederik Kingma, Jimmy Lei Ba. "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980 (2014). https://arxiv.org/abs/1412.6980

[4] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "Neural Architecture Search via Bregman Iterations." arXiv preprint arXiv:2106.02479 (2021). https://arxiv.org/abs/2106.02479