Stochastic Gradient Descent — Loss Landscape

Navigate a 2D loss landscape with SGD, Momentum, and Adam optimizers

[Live readout: training step, current loss, and optimizer position]
About: The loss landscape is a mixture of Gaussian "valleys" in 2D parameter space. Three optimizers are compared: vanilla SGD, which perturbs each gradient with random noise to mimic stochastic mini-batch sampling; Momentum, which accumulates velocity so the iterate can coast past shallow local minima; and Adam, which adapts a per-parameter learning rate from running moment estimates of the gradient. The lower panel plots loss against training step. Sketches of the landscape and the three update rules follow below.
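The demo's source isn't shown here, but a minimal NumPy sketch of such a landscape might look like the following. The valley centers, depths, and widths (`CENTERS`, `DEPTHS`, `WIDTHS`) are hypothetical placeholders, not the demo's actual values: `loss` subtracts a sum of Gaussian bumps from a constant so each bump carves out a valley, and `grad` is its analytic gradient.

```python
import numpy as np

# Hypothetical valley layout; illustrative values only, not the demo's.
CENTERS = np.array([[-1.0, -0.5], [1.2, 0.8], [0.2, -1.3]])
DEPTHS = np.array([1.0, 0.6, 0.8])    # valley depths (mixture weights)
WIDTHS = np.array([0.5, 0.3, 0.4])    # Gaussian standard deviations

def loss(p):
    """Loss at 2D point p: a constant minus a sum of Gaussian bumps,
    so each Gaussian becomes a valley in the landscape."""
    d2 = np.sum((CENTERS - p) ** 2, axis=1)   # squared distances to centers
    return 1.0 - np.sum(DEPTHS * np.exp(-d2 / (2 * WIDTHS ** 2)))

def grad(p):
    """Analytic gradient of loss(p) with respect to p."""
    diff = p - CENTERS                        # shape (num_valleys, 2)
    d2 = np.sum(diff ** 2, axis=1)
    w = DEPTHS * np.exp(-d2 / (2 * WIDTHS ** 2)) / WIDTHS ** 2
    return np.sum(w[:, None] * diff, axis=0)
```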
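Under the same assumptions, and reusing `grad` from the sketch above, the three update rules could look like this. The step sizes and decay rates (`lr`, `beta`, `b1`, `b2`) are standard textbook defaults, not necessarily the demo's settings.

```python
RNG = np.random.default_rng(0)

def sgd_step(p, lr=0.05, noise=0.1):
    # Vanilla SGD: follow the gradient plus Gaussian noise standing in
    # for mini-batch sampling noise.
    g = grad(p) + noise * RNG.standard_normal(2)
    return p - lr * g

def momentum_step(p, v, lr=0.05, beta=0.9):
    # Momentum: accumulate a velocity that can carry the iterate
    # through shallow local minima.
    v = beta * v - lr * grad(p)
    return p + v, v

def adam_step(p, m, s, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes from bias-corrected first and
    # second moment estimates of the gradient.
    g = grad(p)
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Loss vs. training step, as in the demo's lower panel (Adam shown).
p, m, s = np.array([1.8, 1.8]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    p, m, s = adam_step(p, m, s, t)
    if t % 50 == 0:
        print(f"step {t:3d}  loss {loss(p):.4f}  position {p}")
```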