A variational autoencoder (VAE) requires an expressive inference network in order to learn a complex posterior distribution; a more expressive inference network generally yields higher-quality generated data.

This work uses adversarial training to learn a function $T(x,z)$ that approximates $\log q_{\phi}(z|x) - \log p(z)$. The expectation of this term w.r.t. $q_{\phi}(z|x)$ is exactly the KL-divergence term $KL(q_{\phi}(z|x) \| p(z))$. Since the authors prove that the optimal $T^*(x, z) = \log q_{\phi}(z|x) - \log p(z)$, the ELBO becomes:

$E_{p_{\mathcal{D}}(x)}E_{q_{\phi}(z|x)}[-T^*(x,z) + \log p_{\theta}(x|z)]$
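To see why this is still the ELBO, note that taking the inner expectation of $T^*$ recovers the KL term:

$E_{q_{\phi}(z|x)}[T^*(x,z)] = E_{q_{\phi}(z|x)}[\log q_{\phi}(z|x) - \log p(z)] = KL(q_{\phi}(z|x) \| p(z))$

so the expression above equals the standard ELBO $E_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)] - KL(q_{\phi}(z|x) \| p(z))$.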

In order to approximate $T^*(x, z)$, the discriminator needs to learn to distinguish pairs $(x,z)$ sampled from the prior, $p_{\mathcal{D}}(x)p(z)$, from pairs sampled from the current inference model, $p_{\mathcal{D}}(x)q_{\phi}(z|x)$. Thus, the objective function for the discriminator is set up as:

$\max_T E_{p_{\mathcal{D}}(x)}E_{q_{\phi}(z|x)} \log \sigma(T(x,z)) + E_{p_{\mathcal{D}}(x)}E_{p(z)} \log(1 - \sigma(T(x,z)))$ (1)
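Concretely, eq. (1) is a standard GAN-style logistic loss over pairs $(x,z)$. Below is a minimal PyTorch sketch, not the authors' implementation; `T_net` is a hypothetical network mapping $(x, z)$ to a per-example scalar logit:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(T_net, x, z_q, z_prior):
    """Eq. (1): distinguish pairs (x, z) ~ p_D(x) q_phi(z|x)
    from pairs (x, z) ~ p_D(x) p(z)."""
    logits_q = T_net(x, z_q.detach())  # inference-model pairs; encoder is not updated here
    logits_p = T_net(x, z_prior)       # prior pairs
    # Minimizing this BCE is equivalent to maximizing
    # E[log sigma(T)] over q-pairs + E[log(1 - sigma(T))] over prior pairs.
    return (F.binary_cross_entropy_with_logits(logits_q, torch.ones_like(logits_q))
            + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
```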

Taking the gradient of $T(x,z)$ with respect to $\phi$ could be problematic, because the optimal $T^*$ itself depends on $q_{\phi}(z|x)$. However, the authors show that $E_{q_{\phi}(z|x)}[\nabla_{\phi} T^*(x,z)] = 0$: the implicit dependence of $T^*$ on $\phi$ contributes no gradient in expectation, so it can safely be ignored during the parameter update.

Since $T(x,z)$ requires a sample $z$, the reparametrization trick is applied, writing $z = z_{\phi}(x, \epsilon)$ with $\epsilon$ drawn from a fixed noise distribution, and the ELBO becomes:

$E_{p_{\mathcal{D}}(x)}E_{\epsilon}[-T^*(x, z_{\phi}(x, \epsilon)) + \log p_{\theta}(x|z_{\phi}(x, \epsilon))]$ (2)
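Continuing the sketch above, eq. (2) can be computed by pushing noise through the encoder; `encoder`, `decoder.log_prob`, and `LATENT_DIM` are assumed placeholders, and both `T_net` and `decoder.log_prob` are assumed to return one scalar per example:

```python
LATENT_DIM = 32  # assumed latent dimensionality

def elbo_objective(T_net, decoder, encoder, x):
    """Eq. (2): the reparametrized objective, maximized over theta and phi."""
    eps = torch.randn(x.size(0), LATENT_DIM, device=x.device)
    z = encoder(x, eps)              # z_phi(x, eps): deterministic transform of noise
    t = T_net(x, z)                  # gradients reach phi only through z
    log_px = decoder.log_prob(x, z)  # log p_theta(x | z); hypothetical interface
    return (-t + log_px).mean()
```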

This step is crucial because sampling is now just a deterministic transformation of noise, with $T^*(x,z)$ approximating the KL-divergence term. The result is a black-box inference model: we never have to explicitly define a density $q_{\phi}(z|x)$, only a sampler for it.

The model optimizes equations (1) and (2) jointly via adversarial training: eq. (1) is optimized for several steps in order to keep $T(x,z)$ close to optimal, alternating with updates on eq. (2).
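A minimal sketch of this alternating loop, assuming `data_loader`, `opt_T` (over `T_net`'s parameters only), and `opt_model` (over the encoder's and decoder's parameters only) are set up elsewhere:

```python
T_STEPS = 5  # assumed number of discriminator updates per model update

for x in data_loader:
    # Several steps on eq. (1) to keep T(x, z) close to optimal.
    for _ in range(T_STEPS):
        eps = torch.randn(x.size(0), LATENT_DIM)
        z_q = encoder(x, eps)                     # z ~ q_phi(z|x)
        z_p = torch.randn(x.size(0), LATENT_DIM)  # z ~ p(z), standard Gaussian prior
        opt_T.zero_grad()
        discriminator_loss(T_net, x, z_q, z_p).backward()
        opt_T.step()

    # One step on eq. (2); T_net's parameters are not in opt_model,
    # so treating T as fixed here matches the zero-gradient argument above.
    opt_model.zero_grad()
    (-elbo_objective(T_net, decoder, encoder, x)).backward()
    opt_model.step()
```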

The adaptive contrast technique is used to keep $T(x,z)$ sufficiently close to optimal. The KL term in the ELBO is decomposed as $KL(q_{\phi}(z|x) \| p(z)) = KL(q_{\phi}(z|x) \| r_{\alpha}(z|x)) + E_{q_{\phi}(z|x)}[\log r_{\alpha}(z|x) - \log p(z)]$, where $r_{\alpha}(z|x)$ is an auxiliary distribution, e.g. a Gaussian whose moments match those of $q_{\phi}(z|x)$. The discriminator then only has to estimate the first term, which is easier because $q_{\phi}$ and $r_{\alpha}$ are much closer to each other than $q_{\phi}$ and $p$, while the second term involves only known densities.
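A sketch of the moment-matching step, under the assumption that $r_{\alpha}(z|x)$ is a diagonal Gaussian fit from Monte Carlo samples (the `encoder` interface is the same hypothetical one as above):

```python
def moment_matched_gaussian(encoder, x, n_samples=16):
    """Fit the auxiliary r_alpha(z|x): a diagonal Gaussian whose mean and
    standard deviation match Monte Carlo estimates from q_phi(z|x)."""
    with torch.no_grad():  # r_alpha is treated as fixed during optimization
        zs = torch.stack([encoder(x, torch.randn(x.size(0), LATENT_DIM))
                          for _ in range(n_samples)])  # (n_samples, batch, dim)
    mu = zs.mean(dim=0)
    sigma = zs.std(dim=0) + 1e-6  # small floor to avoid zero variance
    return mu, sigma
```

With $r_{\alpha}$ in hand, the discriminator only needs to separate samples of $q_{\phi}(z|x)$ from samples of $r_{\alpha}(z|x)$, and the correction term $E_{q_{\phi}(z|x)}[\log r_{\alpha}(z|x) - \log p(z)]$ can be evaluated directly, since both densities are known in closed form.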

This model has connections to variational autoencoders, adversarial autoencoders, f-GANs, and BiGANs. This new adversarial training method for VAEs allows us to use a flexible inference network that can approximate the true posterior distribution over the latent variables.
