Adversarial Variational Bayes

Variational autoencoder (VAE) requires an expressive inference network in order to learn a complex posterior distribution. The more complex inference network will result in generating high-quality data.

This work utilizes an adversarial training to learn a function T(x,z) that approximates \log q_{\phi}(z|x) - \log p(z). The expectation of this term w.r.t $latex q_{\phi}(z|x)$ is in fact a KL-divergence term. Since the authors prove that the optimal T^*(x, z) = \log q_{\phi}(z|x) - \log p(z), the ELBO becomes:

E_{p_{D(x)}}E_{q_{\phi}(z|x)}[-T^*(x,z) + \log p_{\theta}(x|z)]

In order to approximate T^*(x, z), the discriminator needs to learn to distinguish between a sample from a prior p_{D(x)}p(z) and the current inference model p_{D(x)}q_{\phi}(z|x). Thus, the objective function for the discriminator is setup as:

\max_T E_{p_{D(x)}}E_{q_{\phi(z|x)}} \log \sigma(T(x,z)) + E_{p_{D(x)}}E_{p(z)} \log(1 - \sigma(T(x,z))) (1)

Taking a gradient on T(x,z) w.r.t. parameter \phi can be problematic because the solution of this function depends on q_{\phi}(z|x). But the author shows that the expectation of gradient of T^*(x, z) w.r.t \phi is 0. Thus, there is no gradient and no parameter update when taking a gradient of T^*(x,z).

Since T(x,z) requires sample z, the parametrization trick is applied and the ELBO becomes:

E_{p_{D(x)}}E_{\epsilon}[-T^*(x, z_{\phi}(x, \epsilon) + \log p_{\theta}(x|z_{\phi}(x, \epsilon))] (2)

This step is crucial because now the sampling is just a transformation from a noise and let T^*(x, z) to approximate the KL-divergence term. This made this model looks like a blackbox model because we do not explicitly define a distribution q_{\phi}(z|x).

This model optimizes equation (1) and (2) using adversarial training. It optimizes eq.(1) several steps in order to keep T(x, z) close to optimal while jointly optimizes eq. (2).

Adaptive contrast technique is used to make T(x, y) to be sufficiently close to the optimal. Basically, the KL term in ELBO is replaced by KL(q_{\phi}(z|x), r_{\alpha}(z|x)) where r_{\alpha}(z|x) is an auxiliary distribution which could be a Gaussian distribution.

This model has a connection to variational autoencoder, adversarial autoencoder, f-GANs, and BiGANs. A new training method for VAE via adversarial training allows us to use a flexible inference that approximate a true distribution over the latent vectors.