Information Geometry & Fisher Metric
Statistical manifold of Gaussians · Geodesics · Natural gradient vs vanilla SGD
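Fisher information metric: g_ij(θ) = E[(∂log p/∂θᵢ)(∂log p/∂θⱼ)], with the expectation taken over x ~ p(x; θ). For Gaussian N(μ,σ²) in coordinates θ = (μ,σ): g = diag(1/σ², 2/σ²), i.e. ds² = (dμ² + 2dσ²)/σ². Rescaling μ → μ/√2 turns this into twice the Poincaré upper half-plane metric.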
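As a sanity check on the closed form, the expectation in the definition can be approximated numerically. The sketch below is illustrative only: the function names, the trapezoid rule, and the grid settings (n, width) are assumptions, not part of the original material.

```python
import math

def gaussian_fisher_numeric(mu, sigma, n=20001, width=12.0):
    """Approximate g_ij = E[s_i s_j] for N(mu, sigma^2) by trapezoid quadrature.

    Coordinates are theta = (mu, sigma); the grid spans mu +/- width * sigma.
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        x = lo + k * h
        z = (x - mu) / sigma
        pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        # Score vector: (d log p / d mu, d log p / d sigma).
        score = (z / sigma, (z * z - 1.0) / sigma)
        weight = h * (0.5 if k in (0, n - 1) else 1.0)
        for i in range(2):
            for j in range(2):
                g[i][j] += weight * pdf * score[i] * score[j]
    return g

def gaussian_fisher_closed_form(sigma):
    """Closed form g = diag(1/sigma^2, 2/sigma^2); it does not depend on mu."""
    return [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]
```

Comparing gaussian_fisher_numeric(0.5, 2.0) with gaussian_fisher_closed_form(2.0) entry by entry should show agreement to well within the quadrature error.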
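Geodesics in the (μ,σ) upper half-plane are vertical lines (constant μ) and half-ellipses centered on the μ-axis; they become the familiar Poincaré semicircles after rescaling μ → μ/√2. The curvature is constant and negative, K = −1/2.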
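The geodesic (Fisher-Rao) distance between two Gaussians then follows from the standard hyperbolic distance formula. The sketch below assumes the metric ds² = (dμ² + 2dσ²)/σ² stated above; the function name is hypothetical.

```python
import math

def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    With u = mu / sqrt(2) the metric is 2 * (du^2 + dsigma^2) / sigma^2,
    i.e. twice the Poincare half-plane metric, so lengths scale by sqrt(2).
    """
    du_sq = (mu1 - mu2) ** 2 / 2.0
    dsigma_sq = (sigma1 - sigma2) ** 2
    arg = 1.0 + (du_sq + dsigma_sq) / (2.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(arg)
```

Along a vertical geodesic (equal means) this reduces to √2·|log(σ₂/σ₁)|, and for a small shift in μ alone it approaches |dμ|/σ, matching the line element.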
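Natural gradient preconditions by the inverse Fisher matrix F⁻¹ (F = g): θ ← θ − ηF⁻¹∇L. This is steepest descent measured in statistical distance rather than Euclidean parameter distance, so to first order in η the update does not depend on the parameterization. For the Gaussian, F⁻¹ = diag(σ², σ²/2): steps grow where the Fisher information is small (large σ) and shrink where it is large (small σ). The curvature here is constant, so the rescaling comes from the metric, not from the manifold being flat or curved.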
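A minimal sketch of the two update rules on a toy problem: fitting (μ,σ) to a target Gaussian by minimizing KL(target ‖ model). The loss choice, function names, starting point, and step size are illustrative assumptions, not taken from the original.

```python
import math

def kl_to_model(mu_t, sigma_t, mu, sigma):
    """Loss L(mu, sigma) = KL(N(mu_t, sigma_t^2) || N(mu, sigma^2))."""
    return (math.log(sigma / sigma_t)
            + (sigma_t**2 + (mu_t - mu) ** 2) / (2.0 * sigma**2) - 0.5)

def loss_grad(mu_t, sigma_t, mu, sigma):
    """Euclidean gradient of the loss with respect to (mu, sigma)."""
    d_mu = (mu - mu_t) / sigma**2
    d_sigma = 1.0 / sigma - (sigma_t**2 + (mu - mu_t) ** 2) / sigma**3
    return d_mu, d_sigma

def descend(mu, sigma, mu_t, sigma_t, eta, steps, natural):
    """Run vanilla or natural gradient descent and return the final (mu, sigma)."""
    for _ in range(steps):
        d_mu, d_sigma = loss_grad(mu_t, sigma_t, mu, sigma)
        if natural:
            # Precondition by F^-1 = diag(sigma^2, sigma^2 / 2).
            d_mu, d_sigma = sigma**2 * d_mu, 0.5 * sigma**2 * d_sigma
        # No positivity guard on sigma: a large eta can push vanilla steps out of the domain.
        mu, sigma = mu - eta * d_mu, sigma - eta * d_sigma
    return mu, sigma
```

Starting from (μ,σ) = (5, 10) with target (0, 1), η = 0.5 and 50 steps, the natural-gradient run ends close to the target while the vanilla run is still far from it, because the Euclidean gradient is tiny at large σ. For this particular loss the natural update keeps σ positive for 0 < η ≤ 1; vanilla descent has no such guarantee.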
KL divergence: KL(N(μ₁,σ₁²)‖N(μ₂,σ₂²)) = log(σ₂/σ₁) + (σ₁²+(μ₁−μ₂)²)/(2σ₂²) − ½.
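The closed form is easy to code directly, and it ties back to the metric: for a small displacement (dμ, dσ), KL(p_θ ‖ p_θ+dθ) ≈ ½ ds². The sketch below uses hypothetical function names and assumes natural logarithms and the (μ,σ) coordinates used above.

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2) ** 2) / (2.0 * sigma2**2) - 0.5)

def fisher_quadratic(sigma, dmu, dsigma):
    """Second-order approximation 0.5 * ds^2 = 0.5 * (dmu^2 + 2 dsigma^2) / sigma^2."""
    return 0.5 * (dmu**2 + 2.0 * dsigma**2) / sigma**2
```

For example, kl_gaussian(0.3, 1.5, 0.3 + 1e-3, 1.5 - 2e-3) and fisher_quadratic(1.5, 1e-3, -2e-3) should agree up to third-order terms in the displacement.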