Wasserstein GAN (WGAN) replaces the Jensen-Shannon divergence of the original GAN with the Earth Mover's (Wasserstein-1) distance, which measures the minimum cost of transporting probability mass from one distribution to another. The critic (WGAN's replacement for the discriminator) estimates this distance; in the original WGAN it is kept approximately Lipschitz by clipping its weights to a small range. Unlike the JS divergence, the W-distance provides meaningful gradients even when the two distributions have non-overlapping support, which addresses the vanishing-gradient problem. The training objective: the critic maximizes E[C(real)] − E[C(fake)], while the generator maximizes E[C(fake)] (equivalently, minimizes −E[C(fake)]). This visualization shows the generator distribution (yellow dots) converging toward the real data distribution (cyan rings) over training steps.
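The critic objective and weight clipping can be sketched in a few lines. This is a minimal NumPy illustration, not the visualization's actual code: the data (two hypothetical 1-D Gaussians) and the linear critic C(x) = w·x + b are assumptions chosen so the clipped critic's value gap approaches the true W1 distance, which for two unit-variance Gaussians shifted by 3 is 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (hypothetical): real ~ N(3, 1), fake ~ N(0, 1)
real = rng.normal(3.0, 1.0, size=10_000)
fake = rng.normal(0.0, 1.0, size=10_000)

# Linear critic C(x) = w*x + b; clipping |w| <= c makes it c-Lipschitz.
w, b, c, lr = 0.0, 0.0, 1.0, 0.1

for _ in range(100):
    # Critic step: maximize E[C(real)] - E[C(fake)] by gradient ascent.
    # For a linear critic, d/dw = mean(real) - mean(fake) and d/db = 0.
    grad_w = real.mean() - fake.mean()
    w += lr * grad_w
    w = float(np.clip(w, -c, c))  # weight clipping enforces the constraint

# The clipped critic's value gap lower-bounds W1; with c = 1 it should
# approach |mean(real) - mean(fake)|, i.e. about 3.
estimate = w * real.mean() - w * fake.mean()
print(estimate)
```

In the full algorithm the critic is a small network trained for several steps per generator step, and the generator then ascends E[C(fake)] through the critic; the clipping step is the same, applied to every weight tensor after each critic update.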