The Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977) maximizes the log-likelihood of models with latent variables by alternating two steps. For a Gaussian mixture:

E-step: compute the posterior responsibilities r_ik = π_k·N(x_i; μ_k, Σ_k) / Σ_j π_j·N(x_i; μ_j, Σ_j).

M-step: re-estimate the parameters from responsibility-weighted sufficient statistics, e.g. μ_k = Σ_i r_ik·x_i / Σ_i r_ik, with analogous updates for π_k and Σ_k.

Each iteration is guaranteed never to decrease the incomplete-data log-likelihood; the proof combines Jensen's inequality with the non-negativity of KL divergence. EM converges to a local maximum (more precisely, a stationary point) rather than the global one, so initialization matters.

EM generalizes widely: Baum-Welch is EM for HMMs, K-means is the hard-assignment limit of EM for isotropic Gaussian mixtures, and in information geometry the algorithm appears as alternating e- and m-projections.
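The E- and M-steps above can be sketched as follows. This is a minimal NumPy implementation for a diagonal-covariance Gaussian mixture (the helper names `log_gaussian_diag` and `em_gmm`, the random-restart-free initialization, and the small variance floor are illustrative choices, not part of the original text); responsibilities are computed in log space with log-sum-exp for numerical stability.

```python
import numpy as np

def log_gaussian_diag(X, mu, var):
    """Log N(x; mu, diag(var)) for each row of X. Shapes: X (n,d), mu (d,), var (d,)."""
    d = X.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var))
                   + np.sum((X - mu) ** 2 / var, axis=1))

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a K-component diagonal-covariance Gaussian mixture (minimal sketch).
    Returns (pi, mu, var, lls) where lls tracks the incomplete-data log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                       # mixing weights
    mu = X[rng.choice(n, K, replace=False)]        # init means at random data points
    var = np.tile(X.var(axis=0), (K, 1)) + 1e-6    # init variances at data variance
    lls = []
    for _ in range(n_iter):
        # E-step: r_ik proportional to pi_k * N(x_i; mu_k, var_k), in log space
        log_r = np.stack([np.log(pi[k]) + log_gaussian_diag(X, mu[k], var[k])
                          for k in range(K)], axis=1)   # (n, K)
        log_norm = np.logaddexp.reduce(log_r, axis=1)   # log-sum-exp over components
        lls.append(log_norm.sum())                      # incomplete-data log-likelihood
        r = np.exp(log_r - log_norm[:, None])           # normalized responsibilities
        # M-step: weighted sufficient statistics
        Nk = r.sum(axis=0)                              # effective counts per component
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        var = np.stack([(r[:, k, None] * (X - mu[k]) ** 2).sum(axis=0) / Nk[k]
                        for k in range(K)]) + 1e-6      # floor avoids collapse
    return pi, mu, var, lls
```

On well-separated synthetic data, the recorded log-likelihoods are non-decreasing across iterations (up to the tiny perturbation introduced by the variance floor), which is exactly the monotonicity guarantee described above.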