[SIGIR18] Tutorial on GANs for IR (part2)

My reflection on the GANs for IR tutorial [1], presented by Professor Weinan Zhang.

 

Generative Adversarial Networks (GANs) [2]

Although we don’t know p(x), we can build a discriminator to judge the generated data. If the generated data looks real, the discriminator will accept it. If it does not look real, it will reject it. This is the main idea of GANs.

A GAN has two components: a generator G and a discriminator D. The generator takes a noise vector z sampled from a noise prior p(z) and outputs a generated data point \hat x:

\hat x = G(z; \phi)

where z \sim p(z). The discriminator is a function D(x; \theta) that outputs the probability of x being real. In terms of implementation, both the generator and the discriminator are neural networks.
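
To make the two components concrete, here is a minimal sketch (my own illustration, not code from the tutorial) of a toy generator and discriminator as small PyTorch MLPs; the layer sizes, the 1-D data dimension, and the Gaussian noise prior are arbitrary choices for illustration:

import torch
import torch.nn as nn

noise_dim, data_dim = 8, 1              # hypothetical sizes for illustration

G = nn.Sequential(                      # generator: z -> \hat x
    nn.Linear(noise_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)
D = nn.Sequential(                      # discriminator: x -> probability that x is real
    nn.Linear(data_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

z = torch.randn(16, noise_dim)          # z ~ p(z), here a standard Gaussian prior
x_hat = G(z)                            # generated data \hat x = G(z; \phi)
p_real = D(x_hat)                       # D's estimate that \hat x came from p_true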

A good discriminator D should be able to tell whether the input data is drawn from the true distribution or is artificially constructed.

For the real data, the likelihood of x being detected as real by D should be high:

\max_{\theta} E_{x \sim p_{\text{true}}}\big[f( D(x; \theta))\big]

For the generated data, the likelihood that D detects \hat x as real should be low (or, equivalently, the likelihood that D detects it as generated should be high):

\max_{\theta} E_{\hat x \sim \hat p(x; \phi)}\big[ f( 1 - D(\hat x; \theta)) \big]

The function f is typically the log function, since the output of D can be modeled as the parameter of a Bernoulli distribution.
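
For concreteness, substituting f = \log and \hat x = G(z; \phi) recovers the familiar form of the discriminator objective from the original GAN paper [2]:

\max_{\theta} E_{x \sim p_{\text{true}}}\big[\log D(x; \theta)\big] + E_{z \sim p(z)}\big[\log\big(1 - D(G(z; \phi); \theta)\big)\big]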

Meanwhile, the generator G should try to generate data that D cannot detect as fake. This means G needs to maximize the likelihood that D makes a wrong detection, i.e., minimize the term above with respect to the generator's parameters:

\min_{\phi} E_{\hat x \sim \hat p(x; \phi)}\big[ f( 1 - D(\hat x; \theta)) \big]

Hence, the objective function of GANs is a bi-level, minimax optimization:

\min_{\phi}\max_{\theta} E_{x \sim p_{\text{true}}}\big[f( D(x; \theta))\big]  + E_{\hat x \sim \hat p(x; \phi)}\big[ f( 1 - D(\hat x; \theta)) \big]

Typically, we train the discriminator and the generator iteratively, alternating updates between the two. However, training GANs is tricky and can be very unstable.
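
As an illustration of the alternating updates, below is a minimal sketch of one possible training loop (again my own toy example, not the tutorial's code): the "real" data are drawn from a 1-D Gaussian, both networks are tiny MLPs, and the generator uses the common non-saturating variant that maximizes \log D(G(z)) instead of minimizing \log(1 - D(G(z))); the learning rates, batch size, and number of steps are arbitrary:

import torch
import torch.nn as nn

noise_dim, data_dim, batch = 8, 1, 64   # hypothetical sizes
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCELoss()
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

for step in range(2000):
    # Discriminator step: push D(x_real) toward 1 and D(x_fake) toward 0.
    x_real = 4.0 + torch.randn(batch, data_dim)          # toy samples from p_true = N(4, 1)
    x_fake = G(torch.randn(batch, noise_dim)).detach()   # freeze G while updating D
    loss_D = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: push D(G(z)) toward 1, i.e. fool the discriminator.
    loss_G = bce(D(G(torch.randn(batch, noise_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()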

With the optimal discriminator D, the objective function (payoff) is lower-bounded by -\log 4, and the bound is attained exactly when the generator distribution \hat p matches the true distribution p_{\text{true}}.
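
To see where -\log 4 comes from: for a fixed generator, the optimal discriminator is

D^*(x) = \frac{p_{\text{true}}(x)}{p_{\text{true}}(x) + \hat p(x)}

and when the generator matches the data perfectly, \hat p = p_{\text{true}}, so D^*(x) = 1/2 everywhere and the payoff becomes

E_{x \sim p_{\text{true}}}\big[\log \tfrac{1}{2}\big] + E_{\hat x \sim \hat p}\big[\log \tfrac{1}{2}\big] = -\log 4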

References:

[1] http://wnzhang.net/tutorials/sigir2018/docs/sigir18-irgan-full-tutorial.pdf

[2] Goodfellow, I., et al. 2014. Generative adversarial nets. In NIPS 2014.

 

[SIGIR18] Tutorial on GANs for IR (part1)

My reflection on the GANs for IR tutorial [1], presented by Professor Weinan Zhang.

Maximum Likelihood Estimation (MLE)

MLE is a basic algorithm for learning a distribution that fits the data:

\max_{\theta} \frac{1}{|D|}\sum_{x \in D} \log q_{\theta}(x)

This expression can be viewed as a Monte Carlo estimate, because the training data we have are sampled from the true distribution:

\max_{\theta} E_{x \sim p(x)}\big[ \log q_{\theta}(x) \big]
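
As a toy illustration of this objective (my own example, not from the tutorial), suppose the true distribution is a Gaussian and q_{\theta} is a Gaussian with a learnable mean and standard deviation; the average log-likelihood over the samples is highest at the MLE solution:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)      # x ~ p(x), the "true" distribution

def avg_log_likelihood(mu, sigma, x):
    # (1/|D|) * sum_x log q_theta(x) for a Gaussian q_theta = N(mu, sigma^2)
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

mu_hat, sigma_hat = data.mean(), data.std()           # closed-form MLE for a Gaussian
print(avg_log_likelihood(mu_hat, sigma_hat, data))    # highest average log-likelihood
print(avg_log_likelihood(0.0, 1.0, data))             # a worse theta scores lower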

In generative modeling, what we really care about is approximating the true distribution. So we need to make sure that the learned distribution q_{\theta}(x) fits the data sampled from the true distribution as well as possible.

If I knew the true distribution p(x), I could just measure the KL divergence between q_{\theta}(x) and p(x). But wait! If I knew p(x), why would I bother trying to approximate it? If I had a golden hen that lays a golden egg every time I yell “Lay”, why would I even spend time learning how to obtain or build a golden egg myself?

The unfortunate truth is that we don’t know the true distribution p(x). That is why we need to approximate it. To judge how good our approximation is, MLE relies on the assumption that a higher log-likelihood implies that our learned distribution q_{\theta}(x) is getting closer to the true distribution p(x).

Hence, MLE literally optimizes:

\max_{\theta} E_{x \sim p(x)}\big[ \log q_{\theta}(x) \big]
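
This assumption can actually be made precise: since the entropy of p does not depend on \theta,

E_{x \sim p(x)}\big[ \log q_{\theta}(x) \big] = -KL\big(p(x) \,\|\, q_{\theta}(x)\big) - H\big(p(x)\big)

so maximizing the expected log-likelihood is exactly minimizing the (forward) KL divergence from p to q_{\theta}.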

This forces the model to fit the data samples. But let us consider the following objective function:

\max_{\theta} E_{x \sim q_{\theta}(x)} \big[ \log p(x) \big]

This implies that we want the generated data to sit in a high-density area of the true distribution. If that is the case, the generated data looks very real. This formulation makes more sense for a generative model. But again, we don’t know the true distribution p(x), so we need a clever method to teach the generative model. One of the breakthrough methods is Generative Adversarial Networks (GANs) [2].
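
Before moving on to GANs, note that the contrast between the two objectives can also be stated via KL divergences: the objective E_{x \sim q_{\theta}}\big[\log p(x)\big] satisfies

E_{x \sim q_{\theta}(x)}\big[ \log p(x) \big] = -KL\big(q_{\theta}(x) \,\|\, p(x)\big) - H\big(q_{\theta}(x)\big)

so, up to the entropy of q_{\theta}, it corresponds to the reverse KL divergence, which heavily penalizes generated samples that fall in low-density regions of p(x); MLE, by contrast, corresponds to the forward KL.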

The next post will summarize GANs [2].

References:

[1] http://wnzhang.net/tutorials/sigir2018/docs/sigir18-irgan-full-tutorial.pdf

[2] Goodfellow, I., et al. 2014. Generative adversarial nets. In NIPS 2014.