Day 11: 2D one-hot representation

In my last post, my CGAN architecture did not work: when trained, its generator learned nothing (complete mode collapse). After a few days of research and reading a few tips online, I’ve found a CGAN architecture that works!

Before I describe the specific CGAN architecture, here are the results:

cgan_correct.png

MNIST digits generated by the CGAN. It is easy to interpret each row as a stroke width or stroke style.

CGAN_fashion.png

Fashion items generated by the CGAN. It is harder to interpret each row.

The discriminator and generator losses look much better:

DCGAN_MNIST_loss.png

Loss on MNIST dataset

CGAN_fashion_loss.png

Loss on Fashion-MNIST dataset

The most important component of a CGAN is how we combine the class label with the latent vector (for the generator) or with the image (for the discriminator) into one input. This choice of architecture either makes or breaks the model.

Generator

It takes two inputs: a latent vector and a class label.

  • The latent vector is a random vector drawn from a normal distribution with mean 0 and unit variance.
  • The class label is a one-hot vector.
  • We concatenate these two vectors and merge them with a few feedforward layers.
  • Finally, we pass the merged vector into deconvolutional layers, as sketched below.
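Here is a minimal PyTorch sketch of this generator (PyTorch is just my choice here; the 100-dim latent vector, the 7×7 starting feature map, and the layer widths are illustrative choices for 28×28 images, not prescriptive):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Concatenate latent vector and one-hot label, merge with a
    feedforward layer, then upsample with transposed convolutions."""
    def __init__(self, z_dim=100, n_classes=10):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(z_dim + n_classes, 128 * 7 * 7),
            nn.BatchNorm1d(128 * 7 * 7),
            nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),    # 14x14 -> 28x28
            nn.Tanh(),
        )

    def forward(self, z, onehot_label):
        x = torch.cat([z, onehot_label], dim=1)   # merge latent and label
        x = self.fc(x).view(-1, 128, 7, 7)        # reshape for deconv stack
        return self.deconv(x)
```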

Discriminator

It takes two inputs: a generated or real image and a class label.

  • The generated or real image is a 2D matrix.
  • The class label is converted to a 2D one-hot representation. This component was missing from my previous CGAN implementation. I will describe it a bit later.
  • We stack the image’s single color channel with the 10 channels of the 2D one-hot representation, giving an 11-channel input to the convolutional layers, as sketched below.
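A matching minimal PyTorch sketch of the discriminator, assuming 28×28 single-channel images and the 2D one-hot label maps described in the next section (the filter counts and kernel sizes are illustrative):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """The image's 1 channel is stacked with 10 one-hot label
    channels, giving an 11-channel input to the conv layers."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1 + n_classes, 64, 4, stride=2, padding=1),  # 28 -> 14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),            # 14 -> 7
            nn.LeakyReLU(0.2),
        )
        self.fc = nn.Linear(128 * 7 * 7, 1)

    def forward(self, image, label_maps):
        # label_maps: (batch, 10, 28, 28) 2D one-hot maps (next section)
        x = torch.cat([image, label_maps], dim=1)  # (batch, 11, 28, 28)
        x = self.conv(x).flatten(1)
        return self.fc(x)  # raw logit; pair with BCEWithLogitsLoss
```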

2D one-hot representation

The idea is simple. We represent each class label as 10 matrices (i.e., a tensor of size 10 × image width × image height). For a class label i, the i-th matrix is all ones and the rest are all zeros.
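In code, building this representation takes only a few lines. A PyTorch sketch (the helper name label_to_2d_onehot and the 28×28 default size are my own illustrative choices):

```python
import torch

def label_to_2d_onehot(labels, n_classes=10, height=28, width=28):
    """Expand integer class labels into (batch, n_classes, H, W) maps:
    channel i is all ones for label i, all zeros otherwise."""
    maps = torch.zeros(labels.size(0), n_classes, height, width)
    maps[torch.arange(labels.size(0)), labels] = 1.0
    return maps

# Stack with a batch of 1-channel images along the channel axis:
# images: (batch, 1, 28, 28) -> discriminator input: (batch, 11, 28, 28)
# disc_input = torch.cat([images, label_to_2d_onehot(labels)], dim=1)
```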

This representation is simple and combines nicely with the actual 2D image. I will explore this choice of representation more deeply and how it might affect the discriminator.

References:

[1] https://arxiv.org/abs/1411.1784


Day 10: Mode Collapse in my CGAN

I tried to implement a conditional GAN [1]. At first, it seemed to be a straightforward extension of GANs, but I ran into mode collapse and it was a mess.

Here are samples of the digit 5 generated by the conditional GAN:

Epoch 10:

digit_5_10.png

Starting off, it looks okay.

Epoch 50:

digit_5_50.png

Wait, the model is ignoring the class label.

Epoch 100:

digit_5_100.png

So the generator is giving up now?

Epoch 150:

digit_5_150.png

Yup, it is complete mode collapse. The generator just gave up.

This is not good. I need to find the right architecture or the appropriate hyperparameters for the CGAN.

Ideally, what I want to achieve should look like this:

CGAN_expected.png

This is not my image!

I expect to put more effort into training the CGAN model and picking the right architecture and hyperparameters. Stay tuned.

References:

[1] https://arxiv.org/abs/1411.1784