SGD Loss Landscape & Saddle Points

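Non-convex loss landscapes L(θ₁,θ₂) have saddle points, flat valleys, and local minima. Plain gradient descent stalls near a saddle point because the gradient vanishes there; the gradient noise in SGD pushes the iterate off the saddle so it can escape. Momentum and adaptive methods cross flat regions in fewer steps.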

Learning rate η = 0.02

Noise σ = 0.05

Momentum β = 0.0

Landscape

Step: 0 | Loss: — | θ₁=0.00, θ₂=0.00

Left: contour map (warm = high loss, cool = low loss), with the trajectory in white. Right: loss over training steps. SGD noise helps the iterate avoid getting stuck on plateaus; momentum carries it past saddle points faster.
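
The behaviour described here can be reproduced with a minimal sketch. It is not the demo's own code: the landscape L(θ₁,θ₂) = θ₁² − θ₂² + ¼θ₂⁴ is an assumed stand-in (saddle at the origin, minima at θ₂ = ±√2), and the names `loss`, `grad`, `run`, and `first_step_below` are hypothetical. The update is heavy-ball momentum with Gaussian noise of scale σ added to the gradient, using the default η, σ, and β listed above and starting from θ₁ = θ₂ = 0.

```python
import numpy as np

def loss(theta):
    # Assumed landscape: saddle at the origin, minima at theta2 = ±sqrt(2).
    t1, t2 = theta
    return t1**2 - t2**2 + 0.25 * t2**4

def grad(theta):
    # Analytic gradient of the assumed landscape.
    t1, t2 = theta
    return np.array([2.0 * t1, -2.0 * t2 + t2**3])

def run(eta=0.02, sigma=0.05, beta=0.0, steps=500, seed=0):
    """Noisy gradient descent with heavy-ball momentum, started at the saddle."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)       # theta1 = theta2 = 0, exactly on the saddle
    velocity = np.zeros(2)
    losses = []
    for _ in range(steps):
        # Model SGD noise as isotropic Gaussian noise on the gradient.
        noisy_grad = grad(theta) + sigma * rng.standard_normal(2)
        velocity = beta * velocity - eta * noisy_grad
        theta = theta + velocity
        losses.append(loss(theta))
    return theta, losses

def first_step_below(losses, threshold):
    # Index of the first step whose loss is below threshold, or None.
    for i, value in enumerate(losses):
        if value < threshold:
            return i
    return None

if __name__ == "__main__":
    for label, kwargs in [
        ("GD, no noise", dict(sigma=0.0)),
        ("SGD", dict(sigma=0.05)),
        ("SGD + momentum", dict(sigma=0.05, beta=0.9)),
    ]:
        theta, losses = run(**kwargs)
        print(label, "final loss:", losses[-1],
              "escaped at step:", first_step_below(losses, -0.5))
```

With σ = 0 the iterate never leaves the saddle, because the gradient at the origin is exactly zero. With σ > 0 the noise seeds a small displacement along θ₂ that the negative curvature then amplifies, and a nonzero β speeds up that amplification.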