SGD Noise — Loss Landscape Navigation & Flat Minima

Stochastic gradient descent (SGD) noise, which comes from estimating the gradient on mini-batches, isn't just a nuisance: it can help generalization by steering the optimizer toward flat minima with wider basins (Hochreiter & Schmidhuber 1997, Keskar et al. 2017). The amount of exploration is set by the noise scale, which grows roughly with the ratio of learning rate to batch size, η/B. Sharp minima tend to generalize poorly because tiny perturbations in weight space cause large jumps in loss. This lab visualizes 2D loss landscape navigation with tunable gradient noise.
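To make the perturbation argument concrete, here is a minimal sketch that compares the loss jump in a sharp basin and a flat one. The quadratic basins, the curvature values (100 and 1), and the function names are illustrative assumptions, not part of the lab.

```python
def quadratic_loss(w, curvature):
    """Loss of a 1D quadratic basin centred at w = 0 (an assumed toy model)."""
    return 0.5 * curvature * w * w


def loss_jump(curvature, delta):
    """Loss increase when the weight is moved by delta away from the minimum."""
    return quadratic_loss(delta, curvature) - quadratic_loss(0.0, curvature)


# Same small weight perturbation, two hypothetical basins.
sharp_jump = loss_jump(curvature=100.0, delta=0.1)  # narrow, high-curvature basin
flat_jump = loss_jump(curvature=1.0, delta=0.1)     # wide, low-curvature basin

print(f"sharp basin: {sharp_jump:.3f}, flat basin: {flat_jump:.3f}")
```

In this toy model the jump scales linearly with curvature, so the same perturbation costs 100 times more loss in the sharp basin than in the flat one.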

Learning rate η: 0.020
Noise scale σ: 0.10
Batch size (larger = less noise): 32
Landscape sharpness: 1.0
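The page does not say how these controls enter the update, so the following is only a sketch under stated assumptions: a toy landscape made of one narrow and one wide Gaussian well, and Gaussian gradient noise whose standard deviation is σ divided by the square root of the batch size. The well positions, widths, and all function names are hypothetical.

```python
import math
import random

# Assumed toy landscape: a narrow well on the left, a wide well on the right.
SHARP_CENTER = (-1.0, 0.0)
FLAT_CENTER = (1.0, 0.0)


def loss_and_grad(x, y, sharpness=1.0):
    """Return (loss, dL/dx, dL/dy) for a sum of two negative Gaussian wells."""
    loss, gx, gy = 0.0, 0.0, 0.0
    wells = ((SHARP_CENTER, 8.0 * sharpness), (FLAT_CENTER, 0.5 * sharpness))
    for (cx, cy), width in wells:
        dx, dy = x - cx, y - cy
        well = math.exp(-width * (dx * dx + dy * dy))
        loss -= well
        gx += 2.0 * width * dx * well
        gy += 2.0 * width * dy * well
    return loss, gx, gy


def noisy_descent(start, lr=0.02, sigma=0.10, batch_size=32,
                  sharpness=1.0, steps=500, seed=0):
    """Gradient descent with additive Gaussian gradient noise.

    Assumption: noise std = sigma / sqrt(batch_size), so larger batches
    give less noise, as the batch-size control suggests.
    """
    rng = random.Random(seed)
    x, y = start
    noise_std = sigma / math.sqrt(batch_size)
    for _ in range(steps):
        _, gx, gy = loss_and_grad(x, y, sharpness)
        x -= lr * (gx + rng.gauss(0.0, noise_std))
        y -= lr * (gy + rng.gauss(0.0, noise_std))
    return x, y, loss_and_grad(x, y, sharpness)[0]


x, y, final_loss = noisy_descent(start=(-0.8, 0.1))
print(f"Position: ({x:.3f}, {y:.3f})  Loss: {final_loss:.3f}")
```

Raising `sigma` or `lr`, or lowering `batch_size`, increases the random displacement per step in this sketch, which is the knob that would let the iterate leave the narrow well. Whether it actually does depends on the landscape and the seed.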
