Watch gradient descent navigate a rugged loss landscape with noise and momentum
[Demo controls: sliders with defaults 0.01 and 0.20, loss-function selector (Rosenbrock); live readouts for Step, Loss, and Best.]
Stochastic gradient descent (SGD) minimizes a loss function by repeatedly stepping in the direction of steepest descent (the negative gradient), with added noise simulating the variance of mini-batch gradients.
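A minimal NumPy sketch of one noisy descent step on the demo's default Rosenbrock landscape. The function names are illustrative, and the smaller η = 0.001 and noise scale 0.02 are conservative choices for plain gradient descent, not values taken from the demo; the sliders above control the analogous quantities.

```python
import numpy as np

def rosenbrock(theta):
    """Rosenbrock loss f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum at (1, 1)."""
    x, y = theta
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(theta):
    """Analytic gradient of the Rosenbrock function."""
    x, y = theta
    return np.array([
        -2 * (1 - x) - 400 * x * (y - x ** 2),  # df/dx
        200 * (y - x ** 2),                     # df/dy
    ])

def sgd_step(theta, rng, eta=0.001, noise_scale=0.02):
    """One noisy SGD step: theta <- theta - eta * grad(theta) + eps."""
    eps = noise_scale * rng.standard_normal(theta.shape)  # mini-batch-style noise
    return theta - eta * rosenbrock_grad(theta) + eps
```

Tracking the lowest loss seen across such steps gives the "Best" readout, since a noisy iterate may wander away from a good point after visiting it.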
SGD: θ ← θ − η∇L(θ) + ε. Fast but noisy; the noise can help escape shallow minima.
Momentum: v ← βv − η∇L(θ); θ ← θ + v. Accumulated velocity damps oscillations and accelerates movement along consistent descent directions.
Adam: adapts per-parameter step sizes using bias-corrected first- and second-moment estimates m̂ and v̂. Often converges faster in practice.
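For comparison, sketches of the momentum and Adam updates in their standard textbook forms; the demo's internals may differ, and the β, β1, β2 defaults below are the usual literature choices, not values read from the demo.

```python
import numpy as np

def momentum_step(theta, v, grad, eta=0.001, beta=0.9):
    """Momentum: v <- beta*v - eta*grad; theta <- theta + v."""
    v = beta * v - eta * grad
    return theta + v, v

def adam_step(theta, m, v, grad, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter steps from bias-corrected moment estimates m_hat, v_hat."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Example: 200 steps of each, reusing rosenbrock_grad from the sketch above.
theta_m, vel = np.array([-1.5, 1.5]), np.zeros(2)
theta_a, m, v2 = np.array([-1.5, 1.5]), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    theta_m, vel = momentum_step(theta_m, vel, rosenbrock_grad(theta_m))
    theta_a, m, v2 = adam_step(theta_a, m, v2, rosenbrock_grad(theta_a), t)
```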
The color map encodes loss value (blue = low, red = high); the trajectory shows the last 200 steps.