[SIGIR18] Tutorial on GANs for IR (part1)

My reflection on the GANs for IR tutorial [1], presented by Professor Weinan Zhang.

Maximum Likelihood Estimation (MLE)

MLE is a basic algorithm for learning a distribution that fits the data:

\max_{\theta} \frac{1}{|D|}\sum_{x \in D} \log q_{\theta}(x)

This expression can be viewed as a Monte Carlo estimate of an expectation, because the training data we have are sampled from the true distribution:

\max_{\theta} E_{x \sim p(x)}\big[ \log q_{\theta}(x) \big]
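As a concrete sketch of this idea: if we assume the model family q_{\theta} is Gaussian, the MLE of \theta = (\mu, \sigma) has a closed form, namely the sample mean and sample standard deviation. The "true" distribution, the sample size, and its parameters below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" distribution p(x): a Gaussian with mean 2.0, std 0.5.
# In practice we never see p(x) itself, only samples D drawn from it.
true_mean, true_std = 2.0, 0.5
data = rng.normal(true_mean, true_std, size=10_000)  # the dataset D ~ p(x)

# For a Gaussian q_theta, the parameters maximizing
# (1/|D|) * sum_{x in D} log q_theta(x)
# are exactly the sample mean and (biased) sample standard deviation.
mle_mean = data.mean()
mle_std = data.std()

print(mle_mean, mle_std)  # close to 2.0 and 0.5
```

With enough samples, the Monte Carlo estimate of E_{x \sim p(x)}[\log q_{\theta}(x)] is accurate, and the fitted parameters recover the true ones.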

In generative modeling, what we really care about is approximating the true distribution. So we need to make sure the learned distribution q_{\theta}(x) fits the data sampled from the true distribution as closely as possible.

If I knew the true distribution p(x), I could just measure the KL divergence between q_{\theta} and p(x). But wait! If I knew p(x), why would I bother trying to approximate it? If I had a golden hen that laid a golden egg every time I yelled “Lay”, why would I spend time learning how to obtain or build a golden egg myself?

The unfortunate truth is that we don’t know the true distribution p(x). That is why we need to approximate it. To measure how good our approximation is, MLE assumes that a higher log-likelihood implies that the learned distribution q_{\theta}(x) is closer to the true distribution p(x).

Hence, MLE literally optimizes:

\max_{\theta} E_{x \sim p(x)}\big[ \log q_{\theta}(x) \big]
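This assumption is actually well founded: maximizing the expected log-likelihood is equivalent to minimizing the KL divergence from q_{\theta} to p, since

KL\big(p(x) \,\|\, q_{\theta}(x)\big) = E_{x \sim p(x)}\big[\log p(x)\big] - E_{x \sim p(x)}\big[\log q_{\theta}(x)\big]

and the first term does not depend on \theta. So the MLE objective drives q_{\theta} toward p in the KL(p \| q_{\theta}) sense.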

This forces the model to fit the data samples. But let us consider the following objective function:

\max_{\theta} E_{x \sim q_{\theta}(x)} \big[ \log p(x) \big]

This objective says we want the generated data to sit in high-density regions of the true distribution. If that is the case, the generated data looks very real. This formula makes more sense for a generative model. But again, we don’t know the true distribution p(x), so we need a clever method to teach the generative model. One of the breakthrough methods is GANs – Generative Adversarial Networks [2].
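The difference between the two objectives can be seen numerically on a toy discrete example (the distributions below are invented for illustration): a model q that concentrates all its mass on just one high-density mode of p scores well under the "generated data looks real" objective E_{x \sim q}[\log p(x)], but poorly under the MLE objective E_{x \sim p}[\log q(x)], which penalizes q for missing p's other mode.

```python
import numpy as np

# Toy distributions over 4 discrete outcomes.
p = np.array([0.45, 0.05, 0.05, 0.45])  # "true" distribution: two modes
q = np.array([0.88, 0.04, 0.04, 0.04])  # model: covers only the first mode

# MLE objective: E_{x ~ p}[log q(x)] -- q must put mass wherever p does.
mle_obj = np.sum(p * np.log(q))

# Reversed objective: E_{x ~ q}[log p(x)] -- q's samples must land where
# p has high density, but q is free to ignore some of p's modes.
rev_obj = np.sum(q * np.log(p))

print(mle_obj, rev_obj)  # the reversed objective scores this q much higher
```

This mode-seeking behavior of the reversed direction is part of why it is a natural fit for generators: samples should look real, even if the model does not cover every mode.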

Next post will summarize GANs [2].

Reference:

[1] http://wnzhang.net/tutorials/sigir2018/docs/sigir18-irgan-full-tutorial.pdf

[2] Goodfellow, I., et al. 2014. Generative adversarial nets. In NIPS 2014.