SGD Noise — Loss Landscape Navigation & Flat Minima

Mini-batch noise in Stochastic Gradient Descent (SGD) isn't just a nuisance: it is widely thought to help generalization by driving the optimizer toward flat minima with wider basins (Hochreiter & Schmidhuber 1997, Keskar et al. 2017). The noise scale η ≈ lr/batch_size (learning rate divided by batch size) controls exploration: a larger learning rate or a smaller batch gives noisier steps. Sharp minima tend to generalize poorly because tiny perturbations in weight space cause large jumps in loss. This lab visualizes 2D loss landscape navigation with tunable gradient noise.
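To make the idea concrete, here is a minimal sketch of noisy gradient descent on a toy 2D landscape. It is not the lab's actual implementation. The landscape (two equally deep quadratic bowls, one sharp and one flat), the curvature constants, the per-sample noise level `sigma`, and all function names are assumptions chosen for illustration. Only the ratio lr/batch_size comes from the text above.

```python
import math
import random

# Hypothetical toy landscape: two quadratic bowls, both with minimum loss 0.
SHARP_CENTER, SHARP_CURVATURE = (-1.0, 0.0), 25.0  # narrow basin (assumed values)
FLAT_CENTER, FLAT_CURVATURE = (1.0, 0.0), 1.0      # wide basin (assumed values)


def _bowl(x, y, center, curvature):
    """Isotropic quadratic bowl centred at `center`."""
    return 0.5 * curvature * ((x - center[0]) ** 2 + (y - center[1]) ** 2)


def loss(x, y):
    """Lower envelope of the two bowls: whichever bowl is lower at (x, y)."""
    return min(_bowl(x, y, SHARP_CENTER, SHARP_CURVATURE),
               _bowl(x, y, FLAT_CENTER, FLAT_CURVATURE))


def grad(x, y):
    """Gradient of the bowl that is active (lower) at (x, y)."""
    sharp = _bowl(x, y, SHARP_CENTER, SHARP_CURVATURE)
    flat = _bowl(x, y, FLAT_CENTER, FLAT_CURVATURE)
    if sharp <= flat:
        (cx, cy), k = SHARP_CENTER, SHARP_CURVATURE
    else:
        (cx, cy), k = FLAT_CENTER, FLAT_CURVATURE
    return k * (x - cx), k * (y - cy)


def noise_scale(lr, batch_size):
    """Noise scale as described in the text: learning rate over batch size."""
    return lr / batch_size


def run_sgd(start, lr, batch_size, sigma, steps, seed=0):
    """Gradient descent with Gaussian gradient noise.

    Assumption: each per-sample gradient carries noise of standard
    deviation `sigma`, so averaging over a mini-batch leaves noise of
    standard deviation sigma / sqrt(batch_size).
    """
    rng = random.Random(seed)
    x, y = start
    noise_std = sigma / math.sqrt(batch_size)
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * (gx + rng.gauss(0.0, noise_std))
        y -= lr * (gy + rng.gauss(0.0, noise_std))
    return x, y


# The same small perturbation costs far more loss at the sharp minimum.
print("sharp minimum, perturbed:", loss(-0.9, 0.0))
print("flat minimum, perturbed: ", loss(1.1, 0.0))

# Start inside the sharp basin and compare quiet and noisy runs.
start = (-0.9, 0.1)
print("noise scale:", noise_scale(0.02, 32))
print("no noise:   ", run_sgd(start, lr=0.02, batch_size=32, sigma=0.0, steps=200))
print("heavy noise:", run_sgd(start, lr=0.02, batch_size=1, sigma=100.0, steps=2000))
```

Without noise, the run that starts in the sharp basin simply settles at the sharp minimum. Whether a noisy run escapes to the flat basin depends on `sigma`, the batch size, the number of steps, and the seed, so it is worth trying several settings rather than reading much into a single trajectory.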
