Animate gradient descent on a 2D loss landscape — compare SGD, Momentum, and Adam.
Gradient descent moves parameters downhill on the loss surface. Plain SGD follows the raw gradient; Momentum accumulates an exponentially decaying velocity, which damps oscillation along steep directions; Adam additionally adapts a per-parameter step size using first and second moment estimates of the gradient, which dramatically improves convergence on ill-conditioned surfaces.
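A minimal sketch of the three update rules on a hypothetical ill-conditioned quadratic bowl (the loss, start point, learning rates, and step counts below are illustrative choices, not fixed by the text). `run` records the full trajectory, so each recorded point can later be drawn as one animation frame:

```python
import numpy as np

# Hypothetical ill-conditioned quadratic loss: f(w) = 0.5 * w @ A @ w
A = np.diag([1.0, 25.0])              # condition number 25
grad = lambda w: A @ w                # analytic gradient

def run(update, w0=(-2.0, 2.0), steps=200):
    """Collect an optimizer's trajectory; each point is one animation frame."""
    w = np.array(w0)
    path = [w.copy()]
    state = {}                        # per-optimizer buffers (velocity, moments)
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
        path.append(w.copy())
    return np.array(path)

def sgd(w, g, state, t, lr=0.04):
    return w - lr * g                 # step straight down the gradient

def momentum(w, g, state, t, lr=0.04, beta=0.9):
    v = beta * state.get("v", np.zeros_like(w)) + g   # accumulate velocity
    state["v"] = v
    return w - lr * v

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g      # first moment
    v = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g**2   # second moment
    state["m"], state["v"] = m, v
    mhat = m / (1 - b1**t)            # bias-correct early steps
    vhat = v / (1 - b2**t)
    return w - lr * mhat / (np.sqrt(vhat) + eps)      # per-parameter step size

paths = {name: run(f) for name, f in
         [("SGD", sgd), ("Momentum", momentum), ("Adam", adam)]}
for name, p in paths.items():
    print(name, "final distance from minimum:", np.linalg.norm(p[-1]))
```

To animate, plot contours of `f` and redraw each `paths[name][:k]` prefix per frame (e.g. with `matplotlib.animation.FuncAnimation`). On this surface SGD must use a small step to stay stable in the steep direction, which makes it crawl in the shallow one; Momentum and Adam reach the minimum far faster.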