Watch gradient descent navigate a rugged loss landscape with noise and momentum
[Demo controls: sliders with defaults 0.01 and 0.20, loss-function selector (Rosenbrock); live readouts for Step, Loss, and Best.]
Stochastic gradient descent (SGD) minimizes a loss function by repeatedly stepping in the direction of steepest descent (the negative gradient), with added noise simulating the variance of mini-batch gradients.
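A minimal NumPy sketch of one noisy descent step on the demo's default Rosenbrock landscape. The function names are illustrative, and the smaller η = 0.001 and noise scale 0.02 are conservative choices for plain gradient descent, not values taken from the demo; the sliders above control the analogous quantities.

```python
import numpy as np

def rosenbrock(theta):
    """Rosenbrock loss f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum at (1, 1)."""
    x, y = theta
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(theta):
    """Analytic gradient of the Rosenbrock function."""
    x, y = theta
    return np.array([
        -2 * (1 - x) - 400 * x * (y - x ** 2),  # df/dx
        200 * (y - x ** 2),                     # df/dy
    ])

def sgd_step(theta, rng, eta=0.001, noise_scale=0.02):
    """One noisy SGD step: theta <- theta - eta * grad(theta) + eps."""
    eps = noise_scale * rng.standard_normal(theta.shape)  # mini-batch-style noise
    return theta - eta * rosenbrock_grad(theta) + eps
```

Tracking the lowest loss seen across such steps gives the "Best" readout, since a noisy iterate may wander away from a good point after visiting it.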
SGD: θ ← θ − η∇L(θ) + ε. Fast but noisy; the noise can help escape shallow minima.
Momentum: v ← βv − η∇L(θ); θ ← θ + v. Accumulated velocity damps oscillations and accelerates movement along consistent descent directions.
Adam: adapts per-parameter step sizes using bias-corrected first- and second-moment estimates m̂ and v̂. Often converges faster in practice.
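For comparison, sketches of the momentum and Adam updates in their standard textbook forms; the demo's internals may differ, and the β, β1, β2 defaults below are the usual literature choices, not values read from the demo.

```python
import numpy as np

def momentum_step(theta, v, grad, eta=0.001, beta=0.9):
    """Momentum: v <- beta*v - eta*grad; theta <- theta + v."""
    v = beta * v - eta * grad
    return theta + v, v

def adam_step(theta, m, v, grad, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter steps from bias-corrected moment estimates m_hat, v_hat."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Example: 200 steps of each, reusing rosenbrock_grad from the sketch above.
theta_m, vel = np.array([-1.5, 1.5]), np.zeros(2)
theta_a, m, v2 = np.array([-1.5, 1.5]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    theta_m, vel = momentum_step(theta_m, vel, rosenbrock_grad(theta_m))
    theta_a, m, v2 = adam_step(theta_a, m, v2, rosenbrock_grad(theta_a), t)
```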
The color map encodes loss value (blue = low, red = high); the trajectory shows the last 200 steps.