Bayesian Model Selection

Compare polynomial models of different degrees via Bayesian evidence. The evidence naturally penalizes complexity (Occam's razor). Watch how posterior model probabilities shift as data arrives.

Bayesian model selection ranks models by their posterior probabilities, P(M_k|D) ∝ P(D|M_k)·P(M_k), where the evidence (marginal likelihood) P(D|M_k) = ∫P(D|θ,M_k)P(θ|M_k)dθ integrates the likelihood over the prior. For linear-Gaussian models this integral is available in closed form; more generally it can be approximated, e.g. via the Laplace approximation. A cruder large-N approximation yields the BIC: log P(D|M) ≈ log P(D|θ̂) - (k/2)log N, with θ̂ the maximum-likelihood estimate, k the number of free parameters, and N the number of data points. The evidence embodies Occam's razor: it prefers simpler models unless the data demand the extra complexity. Related criteria (AIC, BIC, DIC) differ in how they penalize complexity. Applications include feature selection, hypothesis testing, and neural architecture search.
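The BIC approximation above can be computed directly from a least-squares fit. This is a minimal sketch, assuming Gaussian noise with its variance at the MLE and counting that variance as a free parameter (a common but not universal convention); the synthetic data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative synthetic data: the true function is linear (degree 1).
N = 50
x = np.linspace(-1, 1, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, N)

def bic(x, y, degree):
    """BIC = -2*log L(theta_hat) + k*log N for a polynomial least-squares
    fit with Gaussian noise; lower is better."""
    Phi = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # maximum-likelihood weights
    resid = y - Phi @ w
    sigma2 = resid @ resid / len(y)               # MLE of the noise variance
    k = degree + 2                                # weights plus noise variance
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2.0 * loglik + k * np.log(len(y))

for d in range(5):
    print(f"degree {d}: BIC = {bic(x, y, d):.1f}")
```

The (k/2)·log N term grows with N, so BIC penalizes extra parameters more harshly than AIC (whose penalty is a constant k per parameter) and tends to select sparser models on large datasets.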