Natural Gradient on Statistical Manifolds

The Fisher information metric defines the geometry; the natural gradient G⁻¹∇L is the steepest-descent direction under that metric

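The natural gradient (Amari 1998) accounts for the curvature of the statistical manifold via the Fisher information matrix G(θ). For Gaussians N(μ,σ²) in (μ,σ) coordinates, the Fisher metric is ds² = dμ²/σ² + 2dσ²/σ², that is, G = diag(1/σ², 2/σ²). Rescaling μ → μ/√2 turns this into twice the Poincaré upper half-plane metric, so the manifold is hyperbolic with constant curvature −1/2. The natural gradient G⁻¹∇L (orange) is the steepest-descent direction under this metric and typically converges much faster than vanilla gradient descent (blue) when parameters are poorly scaled; for example, when σ is large, the μ-component of a log-likelihood gradient scales like 1/σ². Geodesics on this manifold are vertical lines and semicircles centered on the σ = 0 axis in the rescaled coordinates (half-ellipses in the original (μ,σ) coordinates). They are shortest paths under the Fisher-Rao distance, whose square locally approximates twice the KL divergence; KL itself is not a metric.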
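The comparison described above can be sketched numerically. This is a minimal illustration, not the code behind the plot: the loss (KL divergence from a fixed target Gaussian to the model), the target, the starting point, the learning rate, and all function names are assumptions chosen for the example.

```python
import numpy as np

def kl_loss(theta, target):
    """KL(target || model) for univariate Gaussians; theta = (mu, sigma).
    Assumed loss for this sketch; the source does not name one."""
    mu, sigma = theta
    mu_t, sigma_t = target
    return (np.log(sigma / sigma_t)
            + (sigma_t**2 + (mu_t - mu)**2) / (2 * sigma**2) - 0.5)

def grad(theta, target):
    """Euclidean gradient of kl_loss with respect to (mu, sigma)."""
    mu, sigma = theta
    mu_t, sigma_t = target
    d_mu = (mu - mu_t) / sigma**2
    d_sigma = 1.0 / sigma - (sigma_t**2 + (mu - mu_t)**2) / sigma**3
    return np.array([d_mu, d_sigma])

def fisher(theta):
    """Fisher matrix in (mu, sigma) coordinates: diag(1/sigma^2, 2/sigma^2)."""
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def descend(theta0, target, lr, steps, natural):
    """Plain gradient descent, optionally preconditioned by G^{-1}."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        g = grad(theta, target)
        if natural:
            # Natural gradient: solve G x = grad instead of forming G^{-1}.
            g = np.linalg.solve(fisher(theta), g)
        theta = theta - lr * g
    return theta

# Hypothetical poorly scaled setup: start far from the target with large sigma.
target = (5.0, 0.5)
start = (0.0, 10.0)

theta_vanilla = descend(start, target, lr=0.1, steps=200, natural=False)
theta_natural = descend(start, target, lr=0.1, steps=200, natural=True)

loss_vanilla = kl_loss(theta_vanilla, target)
loss_natural = kl_loss(theta_natural, target)
print("vanilla:", theta_vanilla, "loss:", loss_vanilla)
print("natural:", theta_natural, "loss:", loss_natural)
```

With this setup the vanilla μ-update is damped by 1/σ² and barely moves while σ is large, whereas the preconditioned update for μ reduces to μ ← μ − lr·(μ − μ*), independent of σ. A fixed step along G⁻¹∇L is a first-order scheme in the chosen coordinates, so the iterates do not trace geodesics exactly.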