Day 11: 2D one-hot representation

In my last post, my CGAN architecture did not work: when trained, its generator learned nothing (complete mode collapse). After a few days of research and reading a few tips online, I’ve found a CGAN architecture that works!

Before I describe the specific CGAN architecture, here are the results:

cgan_correct.png

MNIST digits generated by the CGAN. It is easy to interpret each row as a stroke width or stroke style.

CGAN_fashion.png

Fashion items generated by the CGAN. It is harder to interpret each row.

The discriminator and generator losses look much better:

DCGAN_MNIST_loss.png

Loss on MNIST dataset

CGAN_fashion_loss.png

Loss on Fashion-MNIST dataset

The most important component of a CGAN is how we combine the class label with the latent vector (for the generator) or with the image (for the discriminator) into one input. This choice of architecture either makes or breaks the model.

Generator

It takes two inputs: a latent vector and a class label.

  • The latent vector is a random vector drawn from a normal distribution with mean 0 and unit variance.
  • The class label is a one-hot vector.
  • We concatenate these two vectors and merge them with a few feedforward layers.
  • Finally, we pass the merged vector into deconvolutional layers, as sketched below.
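Here is a minimal PyTorch sketch of this generator (PyTorch is just my choice here; the 100-dim latent vector, the 7×7 starting feature map, and the layer widths are illustrative choices for 28×28 images, not prescriptive):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Concatenate latent vector and one-hot label, merge with a
    feedforward layer, then upsample with transposed convolutions."""
    def __init__(self, z_dim=100, n_classes=10):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(z_dim + n_classes, 128 * 7 * 7),
            nn.BatchNorm1d(128 * 7 * 7),
            nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),    # 14x14 -> 28x28
            nn.Tanh(),
        )

    def forward(self, z, onehot_label):
        x = torch.cat([z, onehot_label], dim=1)   # merge latent and label
        x = self.fc(x).view(-1, 128, 7, 7)        # reshape for deconv stack
        return self.deconv(x)
```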

Discriminator

It takes two inputs: a generated or real image and a class label.

  • The generated or real image is a 2D matrix.
  • The class label is converted to a 2D one-hot representation. This component was missing from my previous CGAN implementation. I will describe it a bit later.
  • We stack the image’s single color channel with the 10 channels of the 2D one-hot representation, giving an 11-channel input to the convolutional layers, as sketched below.
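A matching minimal PyTorch sketch of the discriminator, assuming 28×28 single-channel images and the 2D one-hot label maps described in the next section (the filter counts and kernel sizes are illustrative):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """The image's 1 channel is stacked with 10 one-hot label
    channels, giving an 11-channel input to the conv layers."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1 + n_classes, 64, 4, stride=2, padding=1),  # 28 -> 14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),            # 14 -> 7
            nn.LeakyReLU(0.2),
        )
        self.fc = nn.Linear(128 * 7 * 7, 1)

    def forward(self, image, label_maps):
        # label_maps: (batch, 10, 28, 28) 2D one-hot maps (next section)
        x = torch.cat([image, label_maps], dim=1)  # (batch, 11, 28, 28)
        x = self.conv(x).flatten(1)
        return self.fc(x)  # raw logit; pair with BCEWithLogitsLoss
```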

2D one-hot representation

The idea is simple. We represent each class label as 10 matrices (i.e., a tensor of size 10 × image width × image height). For a class label i, the i-th matrix is all ones and the rest are all zeros.
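In code, building this representation takes only a few lines. A PyTorch sketch (the helper name label_to_2d_onehot and the 28×28 default size are my own illustrative choices):

```python
import torch

def label_to_2d_onehot(labels, n_classes=10, height=28, width=28):
    """Expand integer class labels into (batch, n_classes, H, W) maps:
    channel i is all ones for label i, all zeros otherwise."""
    maps = torch.zeros(labels.size(0), n_classes, height, width)
    maps[torch.arange(labels.size(0)), labels] = 1.0
    return maps

# Stack with a batch of 1-channel images along the channel axis:
# images: (batch, 1, 28, 28) -> discriminator input: (batch, 11, 28, 28)
# disc_input = torch.cat([images, label_to_2d_onehot(labels)], dim=1)
```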

This representation is simple and combines nicely with the actual 2D image. I will explore this choice of representation more deeply and how it might affect the discriminator.

References:

[1] https://arxiv.org/abs/1411.1784


Day 10: Mode Collapse in my CGAN

I tried to implement a conditional GAN [1]. At first, it seemed to be a straightforward extension of GANs, but I ran into mode collapse and it was a mess.

Here are samples of the digit 5 generated by the conditional GAN:

Epoch 10:

digit_5_10.png

Starting off, it looks okay.

Epoch 50:

digit_5_50.png

Wait, the model is ignoring the class label.

Epoch 100:

digit_5_100.png

So the generator is giving up now?

Epoch 150:

digit_5_150.png

Yup, it is complete mode collapse. The generator just gave up.

This is not good. I need to find the right architecture or the appropriate hyperparameters for the CGAN.

Ideally, what I want to achieve should look like this:

CGAN_expected.png

This is not my image!

I expect to put more effort into training the CGAN model and picking the right architecture and hyperparameters. Stay tuned.

References:

[1] https://arxiv.org/abs/1411.1784