Stochastic Gradient Descent estimates the full gradient from a minibatch, so each update carries sampling noise whose covariance grows as the batch size shrinks (roughly scaling with η/B, learning rate over batch size). In a continuous-time view this noise plays the role of temperature in a Langevin equation, letting the iterate hop over barriers and escape sharp local minima. Increasing the noise (batch size ↓, or learning rate ↑) is therefore conjectured to improve generalization by biasing SGD toward "flat" minima, which tend to be less sensitive to parameter perturbations.
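
A minimal sketch of the Langevin picture on a toy objective (not minibatch SGD itself; the gradient noise is injected explicitly as a Gaussian term with scale set by a `temperature` parameter, which is the standard Langevin discretization): a tilted double-well potential has a shallow local minimum and a deeper global one. Deterministic gradient descent started in the shallow well stays trapped, while the noisy version can cross the barrier. The function names and constants here are illustrative choices, not from the source.

```python
import numpy as np

def grad(x):
    # f(x) = (x**2 - 1)**2 + 0.3*x : a tilted double well.
    # Deep global minimum near x ≈ -1.0, shallow local minimum near x ≈ 0.96.
    return 4 * x**3 - 4 * x + 0.3

def descend(x0, lr=0.01, temperature=0.0, steps=20_000, seed=0):
    """Gradient descent with optional Langevin noise: each step adds
    sqrt(2 * lr * temperature) * N(0, 1), the Euler-Maruyama discretization
    of the overdamped Langevin equation at the given temperature."""
    rng = np.random.default_rng(seed)
    x = x0
    noise_scale = np.sqrt(2 * lr * temperature)
    trajectory = [x]
    for _ in range(steps):
        x = x - lr * grad(x) + noise_scale * rng.standard_normal()
        trajectory.append(x)
    return np.array(trajectory)

# Start both runs inside the shallow local minimum.
plain = descend(0.95, temperature=0.0)   # converges to the local min, x ≈ 0.96
noisy = descend(0.95, temperature=0.5)   # noise lets it cross the barrier

print("plain GD final x:", plain[-1])
print("noisy GD ever reached the deep well:", bool((noisy < -0.5).any()))
```

The escape time grows exponentially in (barrier height) / temperature, which is why shrinking the noise (larger batches, smaller learning rate) makes SGD behave more like deterministic descent and stick to whichever basin it starts in.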