The goal of this project is to build and explore different generative models that can create images from the MNIST handwritten digit dataset. I implemented three types of models: Generative Adversarial Networks (GANs), Autoregressive Models, and Variational Autoencoders (VAEs). By implementing these models, I aim to understand how each one learns the data distribution and how well it can generate new digit images.
In this part of the project, I implemented an unconditional Generative Adversarial Network (GAN) to generate MNIST digit images from random noise. A GAN has two main components: a generator $G(z)$ and a discriminator $D(x)$. The generator maps noise vectors to images, and the discriminator tries to classify its inputs as real or fake. Both networks are trained with binary cross-entropy loss: the discriminator learns to output 1 for real images and 0 for generated ones, while the generator tries to make the discriminator output 1 for its samples.
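To make the objective concrete, the two losses can be written as in the sketch below, which assumes the discriminator outputs raw logits; `gan_losses` is a hypothetical helper name for illustration, not taken from my actual code.

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, x, z):
    """BCE losses for one batch of real images x and noise vectors z."""
    real_logits = D(x)
    fake_logits = D(G(z).detach())  # detach: the discriminator loss must not update G
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    # Non-saturating generator loss: push D(G(z)) toward the "real" label 1
    gen_logits = D(G(z))
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```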
The initial generator was built using fully connected layers with Leaky ReLU activations and a Tanh output. The structure is
$$
\text{Noise} \xrightarrow[]{L+R} 64 \xrightarrow[]{L+R} 128 \xrightarrow[]{L+R} 256 \xrightarrow[]{L+T} \text{Image},
$$
where L is a linear layer, R is a Leaky ReLU activation, and T is a Tanh layer.
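A minimal PyTorch sketch of this generator is shown below; the latent dimension of 100 and the LeakyReLU slope of 0.2 are common defaults assumed for illustration, not values fixed by the diagram.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Fully connected generator: Linear + LeakyReLU blocks with a Tanh output."""
    def __init__(self, latent_dim=100, img_dim=28 * 28, hidden_sizes=(64, 128, 256)):
        super().__init__()
        layers, in_features = [], latent_dim
        for h in hidden_sizes:
            layers += [nn.Linear(in_features, h), nn.LeakyReLU(0.2)]
            in_features = h
        # Tanh bounds pixels to [-1, 1], matching images normalized to that range
        layers += [nn.Linear(in_features, img_dim), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)
```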
The discriminator used a mirrored fully connected design, with dropout added for regularization:
$$
\text{Image} \xrightarrow[]{L+R+D} 256 \xrightarrow[]{L+R+D} 128 \xrightarrow[]{L+R+D} 64 \xrightarrow[]{L} \text{ClassLogit},
$$
where D is dropout with probability 0.5.
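The corresponding discriminator sketch, again assuming a LeakyReLU slope of 0.2; the final layer emits a raw logit so that the sigmoid lives inside the BCE-with-logits loss.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Mirrored fully connected discriminator with dropout after each hidden block."""
    def __init__(self, img_dim=28 * 28, hidden_sizes=(256, 128, 64), p_drop=0.5):
        super().__init__()
        layers, in_features = [], img_dim
        for h in hidden_sizes:
            layers += [nn.Linear(in_features, h), nn.LeakyReLU(0.2), nn.Dropout(p_drop)]
            in_features = h
        layers.append(nn.Linear(in_features, 1))  # class logit; no sigmoid here
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```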
I trained this first model for 100 epochs, using a learning rate of 2e-4 and a discriminator update frequency of 1, meaning the discriminator and generator were each updated once per training step.
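Schematically, the training loop looks like the sketch below, which reuses the `Generator`, `Discriminator`, and `gan_losses` sketches above; the Adam betas and the `loader` handling are assumptions, not settings copied from my training script.

```python
import torch

G, D = Generator(), Discriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))  # betas assumed
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_steps = 1  # discriminator update frequency

for epoch in range(100):
    for x, _ in loader:  # `loader`: an MNIST DataLoader defined elsewhere
        x = x.view(x.size(0), -1)  # flatten 28x28 images for the MLP
        for _ in range(d_steps):
            z = torch.randn(x.size(0), 100)
            d_loss, _ = gan_losses(D, G, x, z)
            opt_D.zero_grad(); d_loss.backward(); opt_D.step()
        z = torch.randn(x.size(0), 100)
        _, g_loss = gan_losses(D, G, x, z)
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```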


Because the discriminator seemed to weaken too quickly, I modified the training setup. In the second trial, I increased the discriminator update frequency to 3, lowered its learning rate to 1e-4, and doubled the width of each hidden layer in the generator. This model was trained for 200 epochs and showed more stable behavior, although the generated images were similar in quality to those from the first trial.
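In terms of the earlier sketch, the second trial amounts to the following changes (the doubled widths 128/256/512 follow from doubling 64/128/256).

```python
d_steps = 3  # update the discriminator three times per generator step
G = Generator(hidden_sizes=(128, 256, 512))  # doubled hidden widths
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))  # re-create for the new G
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))  # lowered D learning rate
```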


In the third trial, I added an extra hidden layer to the generator to increase its capacity:
$$
\text{Noise} \xrightarrow[]{L+R} 64 \xrightarrow[]{L+R} 128 \xrightarrow[]{L+R} 256 \xrightarrow[]{L+R} 512 \xrightarrow[]{L+T} \text{Image}.
$$
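Under the parameterized generator sketch above, this third-trial architecture is just one more entry in the width list:

```python
G = Generator(hidden_sizes=(64, 128, 256, 512))  # extra 512-unit hidden layer
```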