Non-convex loss surfaces are the norm in deep learning. SGD can escape shallow minima and saddle points thanks to the noise in its stochastic gradients, while Adam adapts a per-parameter step size from running estimates of the gradient's first and second moments. In high dimensions, saddle points far outnumber local minima.
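To make the contrast concrete, here is a minimal NumPy sketch of the two update rules. The function names and hyperparameters (lr, beta1, beta2, eps) are illustrative assumptions, not values from the original notes.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # Plain SGD: the "escape noise" comes from the stochastic gradient itself.
    return theta - lr * grad

def adam_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: per-parameter step size via bias-corrected moment estimates.
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```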
Surface: a quadratic bowl plus a sum of Gaussian bumps. The plotted trajectories trace each optimizer's path from a random initialization.
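A sketch of how such a surface and a trajectory could be built, under assumed choices: the bump centers, heights, width, step count, noise scale, and the step_fn wrapper signature are all illustrative, not taken from the original.

```python
import numpy as np

rng = np.random.default_rng(0)

# Surface parameters: Gaussian bump centers/heights and a shared width (assumed values).
centers = rng.uniform(-2.0, 2.0, size=(5, 2))
heights = rng.uniform(0.5, 1.5, size=5)
width = 0.4

def loss(p):
    # Quadratic bowl plus a sum of Gaussian bumps.
    bowl = 0.5 * np.dot(p, p)
    bumps = np.sum(heights * np.exp(-np.sum((p - centers) ** 2, axis=1) / (2 * width ** 2)))
    return bowl + bumps

def grad(p):
    # Analytic gradient of the bowl-plus-bumps surface above.
    diff = p - centers
    w = heights * np.exp(-np.sum(diff ** 2, axis=1) / (2 * width ** 2))
    return p - (diff * w[:, None]).sum(axis=0) / width ** 2

def trajectory(step_fn, p0, n_steps=300, noise=0.3):
    # Record an optimizer path from a random init; added Gaussian noise on the
    # gradient stands in for minibatch noise. step_fn maps
    # (params, grad, state) -> (params, state), e.g. a wrapper around the
    # sgd_step/adam_step sketched above.
    path, p, state = [p0.copy()], p0.copy(), None
    for _ in range(n_steps):
        g = grad(p) + noise * rng.normal(size=p.shape)
        p, state = step_fn(p, g, state)
        path.append(p.copy())
    return np.array(path)

# Example: a noisy-SGD path from a random initialization.
p0 = rng.uniform(-2.0, 2.0, size=2)
path = trajectory(lambda p, g, s: (p - 0.05 * g, s), p0)
print(loss(p0), loss(path[-1]))
```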