VAE Disentanglement

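Variational autoencoders (VAEs) learn a probabilistic encoder q_φ(z|x) = N(μ_φ(x), diag(σ²_φ(x))) and a decoder p_θ(x|z). They are trained by maximizing the evidence lower bound (ELBO): E_{q_φ(z|x)}[log p_θ(x|z)] − KL[q_φ(z|x) ‖ N(0, I)]. The β-VAE (Higgins et al. 2017) multiplies the KL term by β > 1. This pushes the posterior toward the factorized prior and encourages, but does not guarantee, disentangled latent dimensions, in which each z_i ideally responds to a single independent factor of variation. With β = 1 (the standard VAE), factors tend to be entangled across dimensions. Larger β promotes axis-aligned encodings in which individual dimensions take on interpretable meanings (shape, color, size, rotation), at the cost of reconstruction quality. A latent traversal varies one z_i while holding the other dimensions fixed and decodes each result, which shows how that z_i controls one visual factor.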
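
A minimal sketch in plain Python (standard library only) of the β-weighted objective above. It assumes a diagonal Gaussian encoder that outputs a mean and a log-variance per latent dimension, and it uses the closed-form KL divergence to N(0, I). All function names are hypothetical. `recon_log_likelihood` stands in for a one-sample estimate of E[log p_θ(x|z)] from a decoder that is not shown. The last helper builds the latent vectors for one traversal row.

```python
import math
import random


def kl_diag_gaussian_to_std_normal(mu, log_var):
    """Closed-form KL[N(mu, diag(exp(log_var))) || N(0, I)], summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def beta_vae_loss(recon_log_likelihood, mu, log_var, beta):
    """Loss to minimize: -(E[log p(x|z)] - beta * KL). beta = 1 gives the standard negative ELBO.

    recon_log_likelihood is assumed (hypothetically) to be a one-sample estimate
    of log p_theta(x|z) computed by a decoder that is not part of this sketch.
    """
    return -recon_log_likelihood + beta * kl_diag_gaussian_to_std_normal(mu, log_var)


def latent_traversal(z, dim, values):
    """Copies of z with only z[dim] replaced by each value; decoding them gives one traversal row."""
    rows = []
    for v in values:
        row = list(z)
        row[dim] = v
        rows.append(row)
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)
    # KL here is 0.5 * (1 + 1 - 1 - 0) = 0.5, so the loss is 10 + 4 * 0.5
    print(beta_vae_loss(-10.0, mu, log_var, beta=4.0))  # 12.0
    print(latent_traversal(z, dim=1, values=[-2.0, 0.0, 2.0]))
```

Parameterizing the encoder output as a log-variance keeps σ² positive without constraints. The closed-form KL term, which is what β scales, is exactly zero when the posterior matches the prior (μ = 0, log σ² = 0). In practice the same arithmetic would run inside an autodiff framework so gradients flow through μ and σ.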
