Animate gradient descent on a 2D loss landscape — compare SGD, Momentum, and Adam.
Gradient descent moves parameters downhill on the loss surface. Plain SGD follows the raw gradient; Momentum accumulates an exponentially decaying velocity, which damps oscillation along steep directions; Adam additionally adapts a per-parameter step size using first and second moment estimates of the gradient, which dramatically improves convergence on ill-conditioned surfaces.
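A minimal sketch of the three update rules on a hypothetical ill-conditioned quadratic bowl (the loss, start point, learning rates, and step counts below are illustrative choices, not fixed by the text). `run` records the full trajectory, so each recorded point can later be drawn as one animation frame:

```python
import numpy as np

# Hypothetical ill-conditioned quadratic loss: f(w) = 0.5 * w @ A @ w
A = np.diag([1.0, 25.0])              # condition number 25
grad = lambda w: A @ w                # analytic gradient

def run(update, w0=(-2.0, 2.0), steps=200):
    """Collect an optimizer's trajectory; each point is one animation frame."""
    w = np.array(w0)
    path = [w.copy()]
    state = {}                        # per-optimizer buffers (velocity, moments)
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
        path.append(w.copy())
    return np.array(path)

def sgd(w, g, state, t, lr=0.04):
    return w - lr * g                 # step straight down the gradient

def momentum(w, g, state, t, lr=0.04, beta=0.9):
    v = beta * state.get("v", np.zeros_like(w)) + g   # accumulate velocity
    state["v"] = v
    return w - lr * v

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g      # first moment
    v = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g**2   # second moment
    state["m"], state["v"] = m, v
    mhat = m / (1 - b1**t)            # bias-correct early steps
    vhat = v / (1 - b2**t)
    return w - lr * mhat / (np.sqrt(vhat) + eps)      # per-parameter step size

paths = {name: run(f) for name, f in
         [("SGD", sgd), ("Momentum", momentum), ("Adam", adam)]}
for name, p in paths.items():
    print(name, "final distance from minimum:", np.linalg.norm(p[-1]))
```

To animate, plot contours of `f` and redraw each `paths[name][:k]` prefix per frame (e.g. with `matplotlib.animation.FuncAnimation`). On this surface SGD must use a small step to stay stable in the steep direction, which makes it crawl in the shallow one; Momentum and Adam reach the minimum far faster.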