Information Entropy & Mixing

Shannon entropy H(P) = −∑ pᵢ log pᵢ measures uncertainty. KL divergence D_KL(P‖Q) = ∑ pᵢ log(pᵢ/qᵢ) measures the information gained when Q is updated to P, or equivalently the expected extra bits needed to encode samples from P with a code built for Q. Watch how entropy changes as two distributions mix; maximum entropy means maximum ignorance. Entropy is concave, so the mixture α·P + (1−α)·Q can only be as uncertain or more uncertain than the average of its parts: H(mix) ≥ α·H(P) + (1−α)·H(Q). A quick numerical check of this bound appears in the sketch below.

[Interactive demo: distributions P and Q with readouts for H(P), H(Q), and H(mix) in bits, plus D_KL(P‖Q), D_KL(Q‖P), and JS divergence.]
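
The demo's readouts can be reproduced in a few lines of NumPy. The following is a minimal sketch, not the demo's own code; the example distributions p and q and the mixing weight α = 0.5 are arbitrary assumptions, and the final assertion verifies the concavity bound stated above.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(P) = -sum p_i log2 p_i, with 0*log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def kl(p, q):
    """KL divergence in bits: D_KL(P‖Q) = sum p_i log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

def js(p, q):
    """Jensen-Shannon divergence: average KL from P and Q to their midpoint M."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.7, 0.2, 0.1])        # example distribution P (assumed)
q = np.array([0.1, 0.3, 0.6])        # example distribution Q (assumed)
alpha = 0.5
mix = alpha * p + (1 - alpha) * q    # the mixture alpha*P + (1-alpha)*Q

print(f"H(P)      = {entropy(p):.3f} bits")
print(f"H(Q)      = {entropy(q):.3f} bits")
print(f"H(mix)    = {entropy(mix):.3f} bits")
print(f"D_KL(P‖Q) = {kl(p, q):.3f} bits")
print(f"D_KL(Q‖P) = {kl(q, p):.3f} bits")
print(f"JS div    = {js(p, q):.3f} bits")

# Concavity of entropy: the mixture's entropy is at least the weighted average.
assert entropy(mix) >= alpha * entropy(p) + (1 - alpha) * entropy(q) - 1e-12
```

Note that D_KL is asymmetric, which is why the demo shows both directions; the Jensen-Shannon divergence symmetrizes it by comparing each distribution to the midpoint mixture.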