Loss Landscape

Non-convex loss surfaces are the norm in deep learning. Plain gradient descent can stall at local minima or saddle points; SGD's gradient noise helps it escape shallow basins, while Adam adapts a per-parameter learning rate from running estimates of the gradient's first and second moments. In high dimensions, critical points are overwhelmingly saddle points rather than local minima.
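To make "per-parameter learning rates" concrete, here is a minimal sketch of one Adam update with the standard default hyperparameters. The toy gradient values are illustrative, not from the source; the point is that a coordinate with a 10x larger gradient still gets a step of nearly the same size, because each coordinate is normalized by its own second-moment estimate.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes via bias-corrected moments."""
    m = b1 * m + (1 - b1) * grad        # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2   # second moment: running mean of squares
    m_hat = m / (1 - b1 ** t)           # bias correction for zero-initialized m
    v_hat = v / (1 - b2 ** t)           # bias correction for zero-initialized v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, 1.0])
m, v = np.zeros(2), np.zeros(2)
# Illustrative gradient: 10x larger in the first coordinate.
grad = np.array([10.0, 1.0])
theta, m, v = adam_step(theta, grad, m, v, t=1)
# Both coordinates move by roughly lr, despite the 10x gradient gap.
```

At t = 1 the bias-corrected ratio reduces to grad / |grad|, so the first step is approximately lr times the sign of each gradient component.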

Surface: a convex quadratic bowl plus a sum of Gaussian bumps, which introduces local minima and saddle points. Trajectories trace each optimizer's path from a shared random initialization.

- SGD path
- Adam path
- GD (no noise)
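The setup above can be sketched in NumPy. This is a minimal reconstruction, not the demo's actual code: the bump count, widths, step size, and initialization are assumed values, and "SGD" is modeled as full-batch gradient descent plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed bump placement: 5 Gaussian bumps with random centers and heights.
centers = rng.uniform(-2, 2, size=(5, 2))
heights = rng.uniform(0.5, 1.5, size=5)
SIGMA2 = 0.2  # assumed bump width (variance-like scale)

def loss(p):
    quad = 0.5 * np.dot(p, p)                          # convex quadratic bowl
    bumps = sum(h * np.exp(-np.sum((p - c) ** 2) / SIGMA2)
                for h, c in zip(heights, centers))     # Gaussian bumps
    return quad + bumps

def grad(p):
    g = p.copy()                                       # gradient of the bowl
    for h, c in zip(heights, centers):
        e = h * np.exp(-np.sum((p - c) ** 2) / SIGMA2)
        g += e * (-2.0 / SIGMA2) * (p - c)             # gradient of each bump
    return g

def run(optimizer, steps=200, lr=0.05):
    p = np.array([1.5, 1.5])                           # shared initialization
    path = [p.copy()]
    m, v = np.zeros(2), np.zeros(2)
    for t in range(1, steps + 1):
        g = grad(p)
        if optimizer == "sgd":                         # noisy gradient estimate
            g = g + rng.normal(0.0, 0.5, size=2)
        elif optimizer == "adam":                      # adaptive per-parameter step
            m = 0.9 * m + 0.1 * g
            v = 0.999 * v + 0.001 * g ** 2
            g = (m / (1 - 0.9 ** t)) / (np.sqrt(v / (1 - 0.999 ** t)) + 1e-8)
        p = p - lr * g
        path.append(p.copy())
    return np.array(path)

paths = {name: run(name) for name in ("gd", "sgd", "adam")}
```

Plotting `loss` on a grid as contours and overlaying each array in `paths` reproduces the three trajectories in the legend; the noiseless GD path is deterministic, while the SGD path varies with the seed.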