A variational autoencoder (VAE) requires an expressive inference network in order to learn a complex posterior distribution; the more expressive the inference network, the higher the quality of the generated data tends to be.

This work [1] uses adversarial training to learn a discriminator $latex T(x, z)$ that approximates $latex \log q_{\phi}(z|x) - \log p(z)$. The expectation of this term w.r.t. $latex q_{\phi}(z|x)$ is in fact the KL-divergence term $latex \mathrm{KL}(q_{\phi}(z|x)\,\|\,p(z))$. Since the authors prove that the optimal discriminator satisfies $latex T^*(x,z) = \log q_{\phi}(z|x) - \log p(z)$, the ELBO becomes:

$latex \max_{\theta, \phi}\; \mathrm{E}_{p_{\mathcal{D}}(x)}\,\mathrm{E}_{q_{\phi}(z|x)}\left[ -T^*(x,z) + \log p_{\theta}(x|z) \right]$

In order to approximate $latex T^*(x,z)$, the discriminator needs to learn to distinguish between a sample from the prior $latex p(z)$ and one from the current inference model $latex q_{\phi}(z|x)$. Thus, the objective function for the discriminator is set up as:

$latex \max_{T}\; \mathrm{E}_{p_{\mathcal{D}}(x)}\,\mathrm{E}_{q_{\phi}(z|x)}\left[ \log \sigma(T(x,z)) \right] + \mathrm{E}_{p_{\mathcal{D}}(x)}\,\mathrm{E}_{p(z)}\left[ \log\left(1 - \sigma(T(x,z))\right) \right]$ (1)

where $latex \sigma(\cdot)$ is the sigmoid function.
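As a concrete illustration, a Monte Carlo estimate of objective (1) might look like the following NumPy sketch (a toy, not the authors' code; the linear discriminator `T` here is a hypothetical stand-in):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator_objective(T, x, z_q, z_prior):
    """Monte Carlo estimate of objective (1).

    T       : callable T(x, z) -> real-valued logit
    x       : batch of data points, shape (n, dx)
    z_q     : latents sampled from the inference model q(z|x), shape (n, dz)
    z_prior : latents sampled from the prior p(z), shape (n, dz)
    """
    # First term: pairs (x, z) with z ~ q(z|x) should be classified as "real".
    term_q = np.mean(np.log(sigmoid(T(x, z_q))))
    # Second term: pairs with z ~ p(z) should be classified as "fake".
    term_p = np.mean(np.log(1.0 - sigmoid(T(x, z_prior))))
    return term_q + term_p

# Toy data: a stand-in inference model and a standard-normal prior.
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 2))
z_q = x + 0.1 * rng.normal(size=(128, 2))      # stand-in for z ~ q(z|x)
z_prior = rng.normal(size=(128, 2))            # z ~ p(z) = N(0, I)
T = lambda x, z: np.sum(x * z, axis=1)         # hypothetical linear T(x, z)
print(discriminator_objective(T, x, z_q, z_prior))
```

In the full method, `T` is a neural network taking both `x` and `z`, and this objective is ascended w.r.t. its parameters.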

Taking the gradient of the ELBO w.r.t. the parameter $latex \phi$ can be problematic because $latex T^*(x,z)$ itself depends on $latex \phi$. But the authors show that $latex \mathrm{E}_{q_{\phi}(z|x)}\left[\nabla_{\phi} T^*(x,z)\right] = 0$; thus this implicit dependence contributes no gradient, and no extra term is needed when differentiating through $latex T^*$.
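This zero-gradient property can be checked numerically in a 1-D toy case (my own illustration, not from the paper). With $latex q_{\phi}(z|x) = \mathcal{N}(\mu, 1)$ and $latex p(z) = \mathcal{N}(0, 1)$, the optimal discriminator is $latex T^*(z) = \mu z - \mu^2/2$, so $latex \nabla_{\mu} T^*(z) = z - \mu$, whose expectation under $latex q$ vanishes:

```python
import numpy as np

# Toy check of E_{q_phi(z|x)}[grad_phi T*(x, z)] = 0:
# q = N(mu, 1), p = N(0, 1)  =>  T*(z) = mu*z - mu^2/2, dT*/dmu = z - mu.
rng = np.random.default_rng(1)
mu = 1.5
z = mu + rng.normal(size=1_000_000)   # z ~ q(z|x)
grad = z - mu                         # grad_mu T*(z) at the samples
print(np.mean(grad))                  # Monte Carlo average, close to 0
```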

Since evaluating the ELBO requires samples $latex z \sim q_{\phi}(z|x)$, the reparametrization trick is applied, writing $latex z = z_{\phi}(x, \epsilon)$ for noise $latex \epsilon$, and the ELBO becomes:

$latex \max_{\theta, \phi}\; \mathrm{E}_{p_{\mathcal{D}}(x)}\,\mathrm{E}_{\epsilon}\left[ -T^*(x, z_{\phi}(x, \epsilon)) + \log p_{\theta}(x\,|\,z_{\phi}(x, \epsilon)) \right]$ (2)
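A minimal sketch of the Monte Carlo estimate of objective (2), assuming a Gaussian reparametrization $latex z = \mu(x) + \sigma(x)\,\epsilon$ (the encoder outputs, discriminator, and decoder below are hypothetical stand-ins):

```python
import numpy as np

def elbo_estimate(x, mu, log_sigma, T_star, log_p_x_given_z, rng):
    """Monte Carlo estimate of objective (2) via the reparametrization trick.

    mu, log_sigma   : inference-network outputs for the batch x
    T_star          : (near-)optimal discriminator T(x, z)
    log_p_x_given_z : decoder log-likelihood log p_theta(x|z)
    """
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(log_sigma) * eps   # z_phi(x, eps): deterministic given eps
    return np.mean(-T_star(x, z) + log_p_x_given_z(x, z))

# Toy usage with stand-in components.
rng = np.random.default_rng(2)
x = rng.normal(size=(64, 2))
mu, log_sigma = 0.5 * x, np.zeros_like(x)                 # hypothetical encoder
T_star = lambda x, z: np.sum(z * (z - x), axis=1)         # stand-in for learned T
log_p = lambda x, z: -0.5 * np.sum((x - z) ** 2, axis=1)  # unit-variance Gaussian decoder
print(elbo_estimate(x, mu, log_sigma, T_star, log_p, rng))
```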

This step is crucial because sampling is now just a deterministic transformation of noise $latex \epsilon$, with $latex T^*$ approximating the KL-divergence term. This makes the model a black box: we never explicitly define the distribution $latex q_{\phi}(z|x)$, so the inference network can be arbitrarily expressive.

The model optimizes equations (1) and (2) using adversarial training: it runs several gradient steps on eq. (1) to keep $latex T$ close to the optimal $latex T^*$ while jointly optimizing eq. (2).
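The alternating scheme can be sketched as the following loop skeleton (placeholder update functions, not the authors' implementation):

```python
# Skeleton of the alternating optimization: k_disc discriminator steps on
# eq. (1) for every joint encoder/decoder step on eq. (2).

def discriminator_step(T):
    # Placeholder: one gradient-ascent step on objective (1) w.r.t. T's parameters.
    return T

def model_step(theta, phi, T):
    # Placeholder: one gradient-ascent step on objective (2) w.r.t. (theta, phi),
    # treating the current discriminator T as a fixed approximation of T*.
    return theta, phi

def train(theta, phi, T, n_steps=1000, k_disc=5):
    for _ in range(n_steps):
        for _ in range(k_disc):               # keep T close to the optimal T*
            T = discriminator_step(T)
        theta, phi = model_step(theta, phi, T)
    return theta, phi, T
```

The inner loop matters because the gradient of eq. (2) is only unbiased when $latex T$ is close to $latex T^*$.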

The adaptive contrast technique is used to keep $latex T$ sufficiently close to the optimal discriminator. Basically, the KL term in the ELBO is rewritten as $latex \mathrm{KL}(q_{\phi}(z|x)\,\|\,r_{\alpha}(z|x)) + \mathrm{E}_{q_{\phi}(z|x)}\left[\log \frac{r_{\alpha}(z|x)}{p(z)}\right]$, where $latex r_{\alpha}(z|x)$ is an auxiliary distribution, e.g. a Gaussian with moments matched to $latex q_{\phi}(z|x)$. The discriminator then only has to estimate the density ratio between $latex q_{\phi}$ and $latex r_{\alpha}$, which are much closer to each other than $latex q_{\phi}$ and the prior.
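This decomposition, $latex \mathrm{KL}(q\,\|\,p) = \mathrm{KL}(q\,\|\,r) + \mathrm{E}_{q}[\log r(z) - \log p(z)]$, is an exact identity. It can be verified numerically in an idealized 1-D case where $latex q$ is itself Gaussian, so the moment-matched $latex r$ equals $latex q$ (my own toy illustration):

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) in closed form."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def log_gauss(z, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (z - m)**2 / (2 * s**2)

# q = N(1.0, 0.8^2), prior p = N(0, 1), auxiliary r = moment-matched Gaussian = q here.
mq, sq = 1.0, 0.8
rng = np.random.default_rng(3)
z = mq + sq * rng.normal(size=1_000_000)   # z ~ q
lhs = kl_gauss(mq, sq, 0.0, 1.0)           # KL(q || p)
rhs = kl_gauss(mq, sq, mq, sq) + np.mean(log_gauss(z, mq, sq) - log_gauss(z, 0.0, 1.0))
print(lhs, rhs)                            # the two sides should nearly agree
```

In the paper the interesting case is a non-Gaussian $latex q_{\phi}$, where $latex \mathrm{KL}(q\,\|\,r) > 0$ and is estimated adversarially, while the second term is tractable.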

This model has connections to variational autoencoders, adversarial autoencoders, f-GANs, and BiGANs. This new way of training VAEs via adversarial training allows us to use a flexible inference network that approximates the true posterior distribution over the latent vectors.

References:

[1] https://arxiv.org/abs/1701.04722