Probability Distribution Editor
Entropy vs Probability (Binary Channel)
H(p) = −p·log₂(p) − (1−p)·log₂(1−p) peaks at p=0.5 with H=1 bit
Mutual Information & Channel Capacity
I(X;Y) = H(X) − H(X|Y) — information shared between input and output
Distribution Histogram
Entropy vs Distribution Shape
Shannon's Information Theory (1948)
Claude Shannon defined entropy as H(X) = −Σ pᵢ log₂ pᵢ bits — the average surprise (information) per symbol. A fair coin flip carries 1 bit; a fair six-sided die carries log₂(6) ≈ 2.585 bits.
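The entropy formula above can be sketched directly in plain Python (no external libraries assumed); the `entropy` helper below is illustrative, not from the original:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum(p_i * log2(p_i)), in bits.
    Terms with p_i == 0 contribute nothing (the limit p*log p -> 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))   # fair coin -> 1.0 bit
print(entropy([1/6] * 6))    # fair die -> log2(6), about 2.585 bits
```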
Maximum entropy is achieved by the uniform distribution: H_max = log₂(n) for n symbols. Any departure from uniformity reduces entropy.
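A quick numerical check of the maximum-entropy claim, comparing a uniform distribution against a skewed one of the same size (the specific skewed pmf is an arbitrary example):

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]   # any departure from uniformity

print(entropy(uniform))   # log2(4) = 2.0 bits, the maximum for n = 4
print(entropy(skewed))    # strictly below 2.0 bits
```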
Channel capacity C = max_{p(x)} I(X;Y) — the maximum rate at which information can be reliably transmitted. Shannon's channel coding theorem: transmission at any rate R < C is possible with arbitrarily small error probability, while reliable transmission at R > C is impossible.
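For the binary symmetric channel the maximization has a closed form, C = 1 − H(ε), achieved by a uniform input; a minimal sketch (the function names are illustrative):

```python
import math

def h2(p):
    """Binary entropy function H(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity of a binary symmetric channel with crossover
    probability eps: C = 1 - H(eps)."""
    return 1.0 - h2(eps)

print(bsc_capacity(0.0))   # noiseless channel -> 1 bit per use
print(bsc_capacity(0.5))   # output independent of input -> 0 bits
```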
Mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) = H(X) − H(X|Y) — the reduction in uncertainty about X given knowledge of Y. It is zero for independent variables and equals H(X) for a noiseless (perfect) channel.
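The identity I(X;Y) = H(X) + H(Y) − H(X,Y) can be checked on the two extreme cases mentioned above, starting from a joint pmf given as a 2-D list (a sketch; the helper names are illustrative):

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint pmf (rows = X, cols = Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(pxy)

indep = [[0.25, 0.25], [0.25, 0.25]]   # X, Y independent
perfect = [[0.5, 0.0], [0.0, 0.5]]     # Y = X, a noiseless channel

print(mutual_information(indep))    # -> 0.0
print(mutual_information(perfect))  # -> 1.0, which equals H(X)
```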
The binary entropy function H(p) = −p log₂p − (1−p)log₂(1−p) is concave, symmetric about p=0.5, with maximum 1 bit at p=0.5 and minimum 0 at p=0 or 1.
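The stated properties of H(p) — symmetry about p = 0.5, a 1-bit maximum there, and zero at the endpoints — can be verified numerically (plain Python, no dependencies assumed):

```python
import math

def h2(p):
    """Binary entropy function H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h2(0.1), h2(0.9))   # symmetric: the two values agree (up to rounding)
print(h2(0.5))            # maximum of 1.0 bit
print(h2(0.0), h2(1.0))   # 0.0 at both endpoints
```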