This paper proposes a new method for image generation by progressively improve the reconstructed image.

The previous image generation models generate the entire image by learning a sampling function (GANs), distribution over a latent vector (VAE), or generate one pixel at a time (PixelRNN, PixelCNN). Although the generated images from these models are in a good quality, these models are forced to learn a complicated and high-dimensional distribution. For example, to generate a car image, the models need to approximate the distribution of all possible cars. This is a difficult task.

###### (Note: I took this Figure from the original paper)

## Incremental Update

Progressive refinement breaks down the complex distribution into a chain of conditional distribution:

Therefore, estimating a conditional distribution is much easier. The conditional probability is modeled by the standard LSTM.

## Latent Variable

Use VAE framework helps us project the input image which has a high dimension into a low-dimensional space. Working on the smaller latent space is much easier than the original image space.

## Attention Mechanism

The progressive refinement through LSTM has simplified the complex distribution through time, then the attention mechanism simplifies the spatial data into a smaller patch. The encoder and decoder now only needs to deal with a small fraction of the image instead of the image as a whole. This idea again reduces the input space by focusing on the important part of the image only.

## Read and Write Operations

This part can be intimated to read at the first glance due to the use of the Gaussian filters. There are many nice blogs that described Read and Write operations with attention mechanism in detail. The main idea is the Read operation crops the input image. The Write operation draws a patch to the canvas matrix.

**Closing**

This is a must-read paper. The combine of progress refinement through time with attention mechanism is a nice idea to simplify the complex image distribution. This is one of the early paper that combine RNN with attention to handle the spatial data such image. I think this is an amazing accomplishment.

**References:**

[1] Gregor, Karol, et al. “DRAW: A recurrent neural network for image generation.” arXiv preprint arXiv:1502.04623 (2015).