Mode collapse, Vanishing gradient

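Very good explanation of why vanishing gradients happen. The generator's loss hits a flat region when the discriminator learns faster than the generator (it has the easier job): once the discriminator confidently rejects fakes, its output saturates and almost no gradient flows back to the generator.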
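A rough numeric sketch of the flat region, assuming the standard minimax generator loss log(1 - D(G(z))) with a sigmoid output on the discriminator (the notes do not name the loss, so this is an assumption). The function names are made up for illustration. The gradient with respect to the discriminator's logit is -sigmoid(logit), which shrinks toward 0 as the discriminator gets more confident that a sample is fake, while a Wasserstein-style loss (-score) keeps a constant gradient.

```python
import math


def sigmoid(a):
    """Logistic function, used here as the discriminator's output layer."""
    return 1.0 / (1.0 + math.exp(-a))


def minimax_generator_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)) for one fake sample.

    Analytically this is -sigmoid(logit).
    """
    return -sigmoid(logit)


def wasserstein_generator_grad(score):
    """d/d(score) of -score for one fake sample: always -1."""
    return -1.0


# More negative logit = discriminator more confident the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    print(
        f"logit={logit:6.1f}  "
        f"minimax grad={minimax_generator_grad(logit):.6f}  "
        f"wasserstein grad={wasserstein_generator_grad(logit):.1f}"
    )
```

The minimax gradient magnitude drops by orders of magnitude as the logit goes from 0 to -10, which is the flat region; the Wasserstein gradient does not depend on the score at all.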


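Earth mover's distance (EMD). Wasserstein loss (W-loss). The critic must satisfy the 1-Lipschitz continuity condition: the norm of its gradient is at most 1 everywhere.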
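A toy sketch tying the three terms together, in plain Python on 1-D samples. It assumes equal-sized sample sets, where the earth mover's distance reduces to the mean absolute difference of the sorted values. The critic f(x) = -x is hand-picked for illustration (not learned) and is 1-Lipschitz because its slope has magnitude 1. All names here are hypothetical.

```python
def earth_movers_distance_1d(xs, ys):
    """EMD between two equal-sized 1-D samples: match sorted points pairwise."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-sized samples")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: mean score on fakes minus mean score on reals."""
    mean_fake = sum(fake_scores) / len(fake_scores)
    mean_real = sum(real_scores) / len(real_scores)
    return mean_fake - mean_real


def critic(x):
    """Hand-picked 1-Lipschitz critic: |critic(a) - critic(b)| == |a - b|."""
    return -x


real = [0.0, 1.0, 2.0]
fake = [5.0, 6.0, 7.0]

emd = earth_movers_distance_1d(real, fake)
loss = critic_loss([critic(x) for x in real], [critic(x) for x in fake])

print("EMD:", emd)
print("negated critic loss:", -loss)
```

In this toy case the negated critic loss equals the EMD, because this particular critic happens to be the best 1-Lipschitz function for these samples. Without the Lipschitz constraint the critic could scale its scores without bound and the loss would stop measuring a distance.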
