A generative model estimates $p(\mathbf{x})$ or $p(\mathbf{x}, y)$, which is different from a discriminative model that estimates the conditional probability $p(y \mid \mathbf{x})$ directly. The autoregressive model is one of the three popular approaches in deep generative modeling, besides GANs and VAEs.

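It models $p(\mathbf{x})$ by factoring the joint distribution of a $D$-dimensional observation $\mathbf{x}$ into a product of conditional distributions, one per dimension, using the chain rule of probability:

$$p(\mathbf{x}) = \prod_{d=1}^{D} p(x_d \mid \mathbf{x}_{<d})$$

These conditionals are not independent; each one depends on all preceding dimensions. However, modeling each conditional directly is intractable, because each one needs its own set of parameters: a full table for $p(x_d \mid \mathbf{x}_{<d})$ over binary variables already needs $2^{d-1}$ entries.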
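The chain rule itself is exact; the difficulty lies in representing each conditional. The following is a minimal sketch, assuming a made-up joint table over three binary variables (the table and the helper names `marginal`, `conditional` and `chain_rule` are hypothetical). It checks that the product of conditionals reproduces the joint and counts the free parameters a fully tabular parameterization would need:

```python
import itertools

# Hypothetical joint distribution over three binary variables (x1, x2, x3),
# stored as a full table; the probabilities are made up and sum to 1.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.15, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(prefix):
    """p(x_1..x_k = prefix), summing the table over the remaining variables."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)

def conditional(x, d):
    """p(x_d | x_<d) for 0-based position d, as a ratio of marginals."""
    return marginal(x[: d + 1]) / marginal(x[:d])

def chain_rule(x):
    """Product of the autoregressive conditionals p(x_d | x_<d)."""
    prob = 1.0
    for d in range(len(x)):
        prob *= conditional(x, d)
    return prob

# The factorization reproduces every entry of the joint table.
for x in itertools.product([0, 1], repeat=3):
    assert abs(chain_rule(x) - joint[x]) < 1e-12

# A tabular conditional p(x_d | x_<d) needs one free parameter per configuration
# of x_<d, i.e. 2**(d-1) for 1-based d, so D variables need 2**D - 1 in total.
D = 3
n_params = sum(2 ** (d - 1) for d in range(1, D + 1))
print(n_params)  # 7
```

The count doubles with every added dimension, which is why a direct tabular implementation does not scale.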

NADE [1] proposes a scalable approximation that shares weight parameters across the conditionals. Sharing parameters reduces the number of free parameters and also acts as a regularizer, because the shared weights must accommodate all of the conditionals at once.
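To make the saving concrete, here is a rough parameter count. It assumes NADE's parameters are a hidden-layer matrix $\mathbf{W}$ ($H \times D$), a hidden bias $\mathbf{c}$, an output matrix $\mathbf{V}$ ($D \times H$) and an output bias $\mathbf{b}$. It compares them against a hypothetical untied baseline in which every conditional has its own one-hidden-layer network. The function names and the sizes $D = 784$, $H = 500$ are illustrative assumptions only:

```python
def nade_params(D, H):
    """NADE: shared W (H x D) and c (H), plus V (D x H) and b (D) for the outputs."""
    return H * D + H + D * H + D

def untied_params(D, H):
    """Hypothetical untied baseline: conditional d (1-based) gets its own
    W_d (H x (d-1)), c_d (H), output weights v_d (H) and bias b_d (1)."""
    return sum(H * (d - 1) + H + H + 1 for d in range(1, D + 1))

D, H = 784, 500  # illustrative sizes, e.g. binarized 28x28 images
print(nade_params(D, H))    # 785284
print(untied_params(D, H))  # 154252784
```

With shared weights the count grows as $O(HD)$ rather than $O(HD^2)$.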

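For technical details, NADE models the following distribution:

$$p(\mathbf{x}) = \prod_{d=1}^{D} p(x_{o_d} \mid \mathbf{x}_{o_{<d}})$$

where $d$ indexes positions in the permutation $\mathbf{o}$ of the dimensions, $o_d$ is the $d$-th element of $\mathbf{o}$, and $\mathbf{x}_{o_{<d}}$ denotes the dimensions that come before $o_d$ in the ordering. For example, if $\mathbf{x} = (x_1, x_2, x_3, x_4)$ and $\mathbf{o} = (2, 3, 4, 1)$, then $p(\mathbf{x}) = p(x_2)\, p(x_3 \mid x_2)\, p(x_4 \mid x_2, x_3)\, p(x_1 \mid x_2, x_3, x_4)$. Writing the model over a permutation of the observation is the more general notation; the identity ordering $\mathbf{o} = (1, \dots, D)$ recovers the plain chain-rule factorization. Given the preceding dimensions $\mathbf{x}_{o_{<d}}$, the hidden variables for the $d$-th conditional are computed as:

$$\mathbf{h}_d = \sigma\left(\mathbf{W}_{:, o_{<d}}\, \mathbf{x}_{o_{<d}} + \mathbf{c}\right)$$

where $\sigma$ is the logistic sigmoid, $\mathbf{W} \in \mathbb{R}^{H \times D}$ is shared by all conditionals, $H$ is the number of hidden units, and $\mathbf{c} \in \mathbb{R}^{H}$ is a bias.

We can then generate the observation $x_{o_d}$ (a binary random variable) from a Bernoulli distribution whose parameter is given by a sigmoid:

$$p(x_{o_d} = 1 \mid \mathbf{x}_{o_{<d}}) = \sigma\left(\mathbf{V}_{o_d,:}\, \mathbf{h}_d + b_{o_d}\right)$$

with $\mathbf{V} \in \mathbb{R}^{D \times H}$ and $\mathbf{b} \in \mathbb{R}^{D}$.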

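The conditional model described above translates directly into code. The following is a minimal NumPy sketch, assuming randomly initialized parameters and 0-based indices, so the ordering $\mathbf{o} = (2, 3, 4, 1)$ becomes `[1, 2, 3, 0]`. The function name `nade_log_prob_naive` is hypothetical. It recomputes every $\mathbf{h}_d$ from scratch and, as a sanity check, sums $p(\mathbf{x})$ over all $2^D$ binary vectors:

```python
import itertools

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_prob_naive(x, o, W, c, V, b):
    """log p(x) for a binary vector x under NADE with ordering o (0-based).

    W: (H, D), c: (H,), V: (D, H), b: (D,). Each conditional recomputes
    h_d = sigmoid(W[:, o_<d] @ x[o_<d] + c) from scratch: O(H D^2) in total.
    """
    o = np.asarray(o, dtype=int)
    log_p = 0.0
    for d in range(len(o)):
        prev = o[:d]  # indices of the preceding dimensions x_{o_<d}
        h = sigmoid(W[:, prev] @ x[prev] + c)
        p1 = sigmoid(V[o[d]] @ h + b[o[d]])  # p(x_{o_d} = 1 | x_{o_<d})
        log_p += np.log(p1) if x[o[d]] == 1 else np.log(1.0 - p1)
    return log_p

rng = np.random.default_rng(0)
D, H = 4, 3
W, c = rng.normal(size=(H, D)), rng.normal(size=H)
V, b = rng.normal(size=(D, H)), rng.normal(size=D)
o = [1, 2, 3, 0]  # the ordering o = (2, 3, 4, 1) from the text, 0-based

total = sum(np.exp(nade_log_prob_naive(np.array(x), o, W, c, V, b))
            for x in itertools.product([0, 1], repeat=D))
print(f"{total:.6f}")  # 1.000000
```

Because each conditional is a proper Bernoulli distribution, the probabilities sum to one for any ordering and any parameter values, so NADE needs no partition function.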
NADE's architecture is similar to that of a restricted Boltzmann machine (RBM). In fact, [1] shows that NADE can be derived from a mean-field approximation of the RBM's autoregressive conditionals (see [1] for details).

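Another important property of NADE is that computing the joint distribution $p(\mathbf{x})$ takes time linear in the dimension $D$ of the observation. Evaluating each hidden layer from scratch would cost $O(HD)$ per conditional (with $H$ hidden units) and $O(HD^2)$ in total, but the hidden pre-activation $\mathbf{a}_d = \mathbf{W}_{:, o_{<d}}\, \mathbf{x}_{o_{<d}} + \mathbf{c}$, with $\mathbf{h}_d = \sigma(\mathbf{a}_d)$, can be expressed recursively.

Define the base case as:

$$\mathbf{a}_1 = \mathbf{c}$$

And the recurrence relation as:

$$\mathbf{a}_{d+1} = \mathbf{a}_d + \mathbf{W}_{:, o_d}\, x_{o_d}$$

Each step adds a single column of $\mathbf{W}$ at $O(H)$ cost, so computing $p(\mathbf{x})$ takes $O(HD)$ operations, which is linear in $D$.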
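The recurrence can be checked numerically. The following sketch makes the same assumptions as a plain NumPy reading of the equations: 0-based ordering and random parameters, with `nade_log_prob` as a hypothetical name. It carries $\mathbf{a}_d$ forward one column at a time and compares the result with a direct evaluation of every $\mathbf{a}_d$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_prob(x, o, W, c, V, b):
    """log p(x) in O(H D) using a_1 = c and a_{d+1} = a_d + W[:, o_d] * x_{o_d}."""
    o = np.asarray(o, dtype=int)
    a = c.copy()  # base case: a_1 = c (copy so c is not modified)
    log_p = 0.0
    for od in o:
        h = sigmoid(a)
        p1 = sigmoid(V[od] @ h + b[od])  # p(x_{o_d} = 1 | x_{o_<d})
        log_p += np.log(p1) if x[od] == 1 else np.log1p(-p1)
        a += W[:, od] * x[od]  # recurrence: one column of W, O(H) per step
    return log_p

rng = np.random.default_rng(1)
D, H = 6, 4
W, c = rng.normal(size=(H, D)), rng.normal(size=H)
V, b = rng.normal(size=(D, H)), rng.normal(size=D)
x = rng.integers(0, 2, size=D)
o = rng.permutation(D)

# Direct evaluation of every a_d for comparison (O(H D^2) overall).
direct = 0.0
for d in range(D):
    h = sigmoid(W[:, o[:d]] @ x[o[:d]] + c)
    p1 = sigmoid(V[o[d]] @ h + b[o[d]])
    direct += np.log(p1) if x[o[d]] == 1 else np.log1p(-p1)

print(np.isclose(nade_log_prob(x, o, W, c, V, b), direct))  # True
```

Both versions compute the same quantity; the recursive one avoids re-multiplying the growing prefix $\mathbf{x}_{o_{<d}}$ at every step.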

There are many extensions of the autoregressive model; one of them, CF-NADE, is currently the state of the art in collaborative filtering (CF). NADE models binary/discrete random variables directly, which the VAE could not easily do at the time of writing, so it can be useful for any problem that requires a discrete random variable.
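Because every conditional is a Bernoulli distribution, NADE can generate discrete samples exactly by ancestral sampling. It draws $x_{o_1}$, feeds it back through the recurrence for $\mathbf{a}_d$, draws $x_{o_2}$, and so on. The following is a minimal sketch, assuming random placeholder parameters rather than a trained model; `nade_sample` is a hypothetical name:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_sample(o, W, c, V, b, rng):
    """Draw one binary vector by ancestral sampling along the ordering o."""
    D = V.shape[0]
    x = np.zeros(D, dtype=int)
    a = c.copy()  # a_1 = c
    for od in o:
        p1 = sigmoid(V[od] @ sigmoid(a) + b[od])  # p(x_{o_d} = 1 | x_{o_<d})
        x[od] = int(rng.random() < p1)  # Bernoulli draw
        a += W[:, od] * x[od]  # update the pre-activation with the new sample
    return x

rng = np.random.default_rng(0)
D, H = 8, 5
W, c = rng.normal(size=(H, D)), rng.normal(size=H)
V, b = rng.normal(size=(D, H)), rng.normal(size=D)
o = rng.permutation(D)

samples = np.stack([nade_sample(o, W, c, V, b, rng) for _ in range(5)])
print(samples.shape)  # (5, 8)
```

Sampling is sequential in $D$, since each dimension depends on the ones drawn before it, but each step reuses the same $O(H)$ update as likelihood evaluation.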

**References:**

[1] Uria, Benigno, et al. “Neural Autoregressive Distribution Estimation.” Journal of Machine Learning Research 17.205 (2016): 1-37.