We integrate the variational inference framework with the autograd functionality of PyTorch. To this end, we have extended PyTorch autograd by defining a Function for the computation of the DPMM:
- in the forward pass, we compute the cluster assignments and the ELBO given the data and the model parameters;
- in the backward pass, we compute the natural gradient of the model parameters.
This allows training a DPMM without changing the training loop (which becomes the same as for a neural model):
```python
from torch_dpmm.models import FullGaussianDPMM
import torch.optim as optim

# x is an [N, D] tensor containing the data
th_DPMM = FullGaussianDPMM(K=K, D=D, alphaDP=10, tau0=0, c0=1, n0=3*D, B0=1)
optimiser = optim.SGD(params=th_DPMM.parameters(), lr=0.1)
for i in range(num_epochs):
    optimiser.zero_grad()
    pi, elbo_loss = th_DPMM(x)
    elbo_loss.backward()
    optimiser.step()
```
There are four types of Gaussian DPMMs that differ in the parametrisation of the covariance matrix:
- `UnitGaussianDPMM` defines an identity covariance matrix (no parameters);
- `SingleGaussianDPMM` defines a covariance matrix of the form $sI$, where $s$ is a scalar parameter;
- `DiagonalGaussianDPMM` defines a diagonal covariance matrix, where all the elements on the diagonal can be adjusted during training;
- `FullGaussianDPMM` defines a fully trainable covariance matrix.
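The four parametrisations trade expressiveness for the number of free parameters. As a rough sketch in plain PyTorch (illustrative tensors only, not the library's internal representation):

```python
import torch

D = 2  # data dimensionality (illustrative)

# Unit: identity covariance, no free parameters.
cov_unit = torch.eye(D)

# Single: s * I, one scalar parameter.
s = torch.tensor(0.5)
cov_single = s * torch.eye(D)

# Diagonal: D free parameters on the diagonal.
d = torch.tensor([0.5, 2.0])
cov_diag = torch.diag(d)

# Full: a complete covariance matrix; built here from a
# lower-triangular factor L so that L @ L.T is positive definite.
L = torch.tensor([[1.0, 0.0], [0.3, 0.8]])
cov_full = L @ L.T
```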
We provide a complete example here. We recommend downloading the notebook to view the animations.
You can install torch_dpmm by running the command:
```shell
pip install git+https://github.com/danielecastellana22/torch_dpmm.git@main
```
The implementation is fully based on the natural parametrisation of the exponential-family distributions. This allows computing the natural gradient of the variational parameters in a straightforward way. Check this paper for more details on the Stochastic Variational Inference (SVI) framework and the computation of the natural gradient of the variational parameters.
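For exponential-family variational posteriors, the natural gradient of the ELBO with respect to a natural parameter takes the simple form "(prior natural parameter + expected sufficient statistics) − current parameter". As a minimal sketch for the Beta stick-breaking posteriors of a truncated DPMM (the function name and shapes are our own, not the library's API):

```python
import torch

def svi_stick_update(a, b, resp, alphaDP, rho):
    """One natural-gradient (SVI) step for the Beta stick posteriors
    q(v_k) = Beta(a_k, b_k) of a truncated DPMM.

    In the natural parametrisation, the natural gradient of the ELBO
    is (lambda_hat - lambda), where lambda_hat is the prior natural
    parameter plus the expected sufficient statistics.
    resp is the [N, K] matrix of cluster responsibilities.
    """
    Nk = resp.sum(dim=0)                        # expected counts per cluster
    # expected counts assigned to clusters after k: sum_{j > k} Nk_j
    Nk_gt = Nk.flip(0).cumsum(0).flip(0) - Nk
    a_hat = 1.0 + Nk
    b_hat = alphaDP + Nk_gt
    # step of size rho along the natural gradient
    return a + rho * (a_hat - a), b + rho * (b_hat - b)
```

With `rho=1` this recovers the classical coordinate-ascent update of the stick parameters.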
There are three main abstract classes:
- `ExponentialFamilyDistribution`
- `BayesianDistribution`
- `DPMM`
`ExponentialFamilyDistribution` represents a distribution of the exponential family by defining the base measure, the log-partition function, and the sufficient statistics. It also provides utility functions such as the mapping between the common and the natural parametrisation and the computation of the KL divergence.
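To make these ingredients concrete, here is a toy Bernoulli written in exponential-family form (purely illustrative; it does not follow the `ExponentialFamilyDistribution` interface):

```python
import math

class BernoulliEF:
    """Toy exponential-family Bernoulli: p(x) = h(x) exp(eta * t(x) - A(eta)),
    with base measure h(x) = 1 and sufficient statistic t(x) = x."""

    @staticmethod
    def common_to_natural(p):
        return math.log(p / (1 - p))        # eta = logit(p)

    @staticmethod
    def natural_to_common(eta):
        return 1 / (1 + math.exp(-eta))     # p = sigmoid(eta)

    @staticmethod
    def log_partition(eta):
        return math.log1p(math.exp(eta))    # A(eta) = log(1 + e^eta)

    @staticmethod
    def kl(eta_q, eta_p):
        # KL(q || p) = (eta_q - eta_p) * E_q[t(x)] - A(eta_q) + A(eta_p),
        # where E_q[t(x)] = grad A(eta_q) is the mean parameter of q.
        mu_q = BernoulliEF.natural_to_common(eta_q)
        return ((eta_q - eta_p) * mu_q
                - BernoulliEF.log_partition(eta_q)
                + BernoulliEF.log_partition(eta_p))
```

Note that the KL divergence between two members of the same exponential family is a Bregman divergence of the log-partition function, which is what makes the closed-form computation possible.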
`BayesianDistribution` represents a Bayesian distribution by using conjugate priors built on `ExponentialFamilyDistribution`.
`DPMM` represents a DPMM, which is fully determined by specifying a Dirichlet Process (DP) prior over the mixture weights and an emission distribution. The DP prior is approximated through the truncated stick-breaking distribution. Both the emission and the mixture-weight distributions are defined as objects of type `BayesianDistribution`.
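Under the truncated stick-breaking approximation with q(v_k) = Beta(a_k, b_k), the expected log mixture weights can be computed with digamma functions. A sketch of this computation (our own helper, which assumes the last stick is fixed to 1 so the weights sum to one):

```python
import torch

def expected_log_weights(a, b):
    """E_q[log pi_k] under truncated stick-breaking, q(v_k) = Beta(a_k, b_k).
    E[log v_k]     = digamma(a_k) - digamma(a_k + b_k)
    E[log(1-v_k)]  = digamma(b_k) - digamma(a_k + b_k)
    log pi_k = log v_k + sum_{j<k} log(1 - v_j)."""
    log_v = torch.digamma(a) - torch.digamma(a + b)
    log_1mv = torch.digamma(b) - torch.digamma(a + b)
    # cumulative sum of E[log(1 - v_j)] over the sticks before k
    prev = torch.cat([torch.zeros(1), log_1mv.cumsum(0)[:-1]])
    log_pi = log_v + prev
    log_pi[-1] = prev[-1]  # last stick fixed to 1: log v_K = 0
    return log_pi
```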
The computation of the ELBO and of the assignments is executed by defining a new PyTorch autograd function. This allows computing the gradient of the variational parameters by calling `elbo.backward()`.
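The trick of returning a hand-computed (natural) gradient from `backward()` can be sketched with a custom `torch.autograd.Function`; this illustrates the mechanism only, not the library's actual implementation:

```python
import torch

class NaturalGradFn(torch.autograd.Function):
    """Sketch: the forward pass returns a loss value, and the backward
    pass hands back a *precomputed* gradient (standing in for the
    natural gradient) instead of letting autograd differentiate the
    forward computation."""

    @staticmethod
    def forward(ctx, params, loss_value, nat_grad):
        ctx.save_for_backward(nat_grad)
        return loss_value.clone()

    @staticmethod
    def backward(ctx, grad_output):
        (nat_grad,) = ctx.saved_tensors
        # gradient w.r.t. params is the stored natural gradient,
        # scaled by the incoming gradient of the loss
        return grad_output * nat_grad, None, None
```

After `loss.backward()`, `params.grad` holds the stored natural gradient, so a plain SGD step effectively performs a natural-gradient update.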
- Add a Cholesky parametrisation of the full Gaussian distribution to improve numerical stability.
- Improve the initialisation of the variational parameters.