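Shannon (1948): Channel capacity C = max_{p(x)} I(X;Y) = max_{p(x)} [H(Y) - H(Y|X)]. For a BSC with crossover probability p: C = 1 - H(p) bits per channel use, where H(p) = -p log₂ p - (1-p) log₂(1-p) is the binary entropy function. For a real-valued AWGN channel: C = ½ log₂(1 + SNR) bits per channel use. Noisy channel coding theorem: reliable transmission (arbitrarily small error probability) is possible at every rate R < C and impossible at any rate R > C. Mutual information I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) measures how much Y tells us about X: the reduction in uncertainty about X from observing Y.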
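The formulas above can be checked numerically. The following is a minimal Python sketch, not a reference implementation. The function names (binary_entropy, bsc_capacity, awgn_capacity, bsc_mutual_information, bsc_capacity_by_search) are illustrative choices and not from the notes. The sketch assumes that SNR is a linear power ratio (not dB) and that all logarithms are base 2, so results are in bits per channel use. The grid search over the input distribution is only a sanity check that the maximum of I(X;Y) for the BSC is reached at the uniform input.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)


def awgn_capacity(snr: float) -> float:
    """Capacity of a real-valued AWGN channel; snr is a linear ratio, not dB."""
    return 0.5 * math.log2(1.0 + snr)


def bsc_mutual_information(q: float, p: float) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P(X=1) = q and crossover p."""
    p_y1 = q * (1.0 - p) + (1.0 - q) * p  # P(Y = 1)
    return binary_entropy(p_y1) - binary_entropy(p)  # H(Y|X) = H(p) for a BSC


def bsc_capacity_by_search(p: float, steps: int = 1000) -> float:
    """Approximate max over p(x) of I(X;Y) by a grid search over q = P(X=1)."""
    return max(bsc_mutual_information(i / steps, p) for i in range(steps + 1))


if __name__ == "__main__":
    print(bsc_capacity(0.11))            # roughly 0.5 bits per channel use
    print(bsc_capacity_by_search(0.11))  # agrees with the closed form
    print(awgn_capacity(3.0))            # SNR = 3 gives ½ log₂ 4 = 1 bit
```

Two cases are worth noting: a BSC with p = 0.5 has capacity 0, because the output is independent of the input, and an AWGN channel with SNR = 3 carries exactly 1 bit per channel use.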