The Fisher information metric defines the geometry; the natural gradient G⁻¹∇L is the steepest-descent direction under that metric
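The natural gradient (Amari 1998) accounts for the curvature of the statistical manifold via the
Fisher information matrix G(θ). For Gaussians N(μ,σ²) in (μ,σ) coordinates, the Fisher metric is
ds² = dμ²/σ² + 2dσ²/σ², which after rescaling μ by 1/√2 is a constant multiple of the Poincaré
upper half-plane metric (hyperbolic geometry). The natural gradient G⁻¹∇L (orange) typically
converges in far fewer steps than vanilla gradient descent (blue) when parameters are poorly scaled,
because G⁻¹ rescales each step by the local metric.
Geodesics on this manifold are vertical lines of constant μ and semicircles centered on the σ = 0
axis in the rescaled coordinates (half-ellipses in (μ,σ)). They are the shortest paths under the
Fisher-Rao distance, which approximates KL divergence locally: KL ≈ ½ ds² for nearby distributions.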
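
The comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the visualization: the target Gaussian (MU_T, SIGMA_T), the starting point, the learning rate, and the step count are all assumed values chosen to make the scaling problem visible, and the loss is taken to be the KL divergence from the target to the model.

```python
import math

# Hypothetical target Gaussian N(MU_T, SIGMA_T^2); values are assumptions for illustration.
MU_T, SIGMA_T = 5.0, 10.0

def kl_loss(mu, sigma):
    """KL( N(MU_T, SIGMA_T^2) || N(mu, sigma^2) )."""
    return (math.log(sigma / SIGMA_T)
            + (SIGMA_T**2 + (MU_T - mu)**2) / (2 * sigma**2)
            - 0.5)

def grad(mu, sigma):
    """Euclidean gradient of kl_loss with respect to (mu, sigma)."""
    d_mu = (mu - MU_T) / sigma**2
    d_sigma = 1.0 / sigma - (SIGMA_T**2 + (MU_T - mu)**2) / sigma**3
    return d_mu, d_sigma

def natural_grad(mu, sigma):
    """G^-1 times the gradient, with G = diag(1/sigma^2, 2/sigma^2)."""
    d_mu, d_sigma = grad(mu, sigma)
    return sigma**2 * d_mu, 0.5 * sigma**2 * d_sigma

def descend(direction, mu=0.0, sigma=5.0, lr=0.5, steps=50):
    """Run fixed-step descent along the given direction function."""
    for _ in range(steps):
        g_mu, g_sigma = direction(mu, sigma)
        mu, sigma = mu - lr * g_mu, sigma - lr * g_sigma
    return mu, sigma

vanilla = descend(grad)
natural = descend(natural_grad)
print("vanilla (mu, sigma):", vanilla, "loss:", kl_loss(*vanilla))
print("natural (mu, sigma):", natural, "loss:", kl_loss(*natural))
```

With a wide target (large σ), the Euclidean gradient in μ shrinks like 1/σ², so vanilla descent crawls, while multiplying by G⁻¹ cancels that factor and the μ update becomes a plain contraction toward MU_T. The same learning rate is used for both runs only to keep the comparison simple; a tuned vanilla step size would narrow the gap but not remove the dependence on σ.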