Blackbox Variational Inference

When we want to inference the hidden variables from our generative model, we usually employ either Gibbs sampler or Mean-field variational inference. Both methods have pro sand cons. Gibbs sampler is easily derived because we only need a transitional probability distribution and MCMC will eventually converge to the true posterior. But it is super slow to converge and we don’t know when it will converge. Mean-field variational inference turns inference problem to an optimization problem. The inference procedure is no longer un-deterministic, no more sampling. It is faster than Gibbs sampling and typically harder to derive the closed form update equations.

David Blei’s quote on these two methods:

“Variational inference is that thing you implement while waiting for your Gibbs sampler to converge.”

His message is that deriving the closed form of VI might take as long as Gibbs sampler to converge!

So here is an alternative approach from David Blei’s student: “Blackbox Variational Inference” [1]. They proposed a generic framework for variational inference and help us avoid derivation of variational parameters. The paper has a lot of mathematical details; however, the key ideas are following:

First, it uses stochastic gradient descent to optimize ELBO.

Second, it computes the expectation term w.r.t to the variational distribution with Monte Carlo samples. Basically, we will draw many latent variables from an approximate posterior distribution, and compute an average of the gradient of ELBO.

Third, this is an important contribution. The Monte Carlo samples have a high variance, thus, optimizing the model will be difficult because the learning rate needs to be very small. The author uses Rao-Blackwellization to reduce the variance by replacing random variables with its conditional expectation w.r.t. the conditioning set.

The author also adds control variates and show that a good control variate has high covariance with the function whose expectation is being computed.

The Blackbox VI seems to be a useful tool for quickly exploring the model assumption.