Stochastic Gradient Descent — Loss Landscape

Navigate a 2D loss landscape with SGD, Momentum, and Adam optimizers

[Live readout: training step, current loss, and optimizer position]
About: The loss landscape is a mixture of Gaussian "valleys" in 2D parameter space. Three optimizers are compared: vanilla SGD, which perturbs each gradient with random noise to mimic stochastic mini-batch sampling; Momentum, which accumulates velocity so the iterate can coast past shallow local minima; and Adam, which adapts a per-parameter learning rate from running moment estimates of the gradient. The lower panel plots loss against training step. Sketches of the landscape and the three update rules follow below.
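The demo's source isn't shown here, but a minimal NumPy sketch of such a landscape might look like the following. The valley centers, depths, and widths (`CENTERS`, `DEPTHS`, `WIDTHS`) are hypothetical placeholders, not the demo's actual values: `loss` subtracts a sum of Gaussian bumps from a constant so each bump carves out a valley, and `grad` is its analytic gradient.

```python
import numpy as np

# Hypothetical valley layout; illustrative values only, not the demo's.
CENTERS = np.array([[-1.0, -0.5], [1.2, 0.8], [0.2, -1.3]])
DEPTHS = np.array([1.0, 0.6, 0.8])    # valley depths (mixture weights)
WIDTHS = np.array([0.5, 0.3, 0.4])    # Gaussian standard deviations

def loss(p):
    """Loss at 2D point p: a constant minus a sum of Gaussian bumps,
    so each Gaussian becomes a valley in the landscape."""
    d2 = np.sum((CENTERS - p) ** 2, axis=1)   # squared distances to centers
    return 1.0 - np.sum(DEPTHS * np.exp(-d2 / (2 * WIDTHS ** 2)))

def grad(p):
    """Analytic gradient of loss(p) with respect to p."""
    diff = p - CENTERS                        # shape (num_valleys, 2)
    d2 = np.sum(diff ** 2, axis=1)
    w = DEPTHS * np.exp(-d2 / (2 * WIDTHS ** 2)) / WIDTHS ** 2
    return np.sum(w[:, None] * diff, axis=0)
```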
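Under the same assumptions, and reusing `grad` from the sketch above, the three update rules could look like this. The step sizes and decay rates (`lr`, `beta`, `b1`, `b2`) are standard textbook defaults, not necessarily the demo's settings.

```python
RNG = np.random.default_rng(0)

def sgd_step(p, lr=0.05, noise=0.1):
    # Vanilla SGD: follow the gradient plus Gaussian noise standing in
    # for mini-batch sampling noise.
    g = grad(p) + noise * RNG.standard_normal(2)
    return p - lr * g

def momentum_step(p, v, lr=0.05, beta=0.9):
    # Momentum: accumulate a velocity that can carry the iterate
    # through shallow local minima.
    v = beta * v - lr * grad(p)
    return p + v, v

def adam_step(p, m, s, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes from bias-corrected first and
    # second moment estimates of the gradient.
    g = grad(p)
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Loss vs. training step, as in the demo's lower panel (Adam shown).
p, m, s = np.array([1.8, 1.8]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    p, m, s = adam_step(p, m, s, t)
    if t % 50 == 0:
        print(f"step {t:3d}  loss {loss(p):.4f}  position {p}")
```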