Information Bottleneck

Compression vs Relevance Trade-off — Tishby, Pereira & Bialek (1999)
β (trade-off)1.00
Channels |X|4
Labels |Y|3
Noise σ0.50
I(X;Y) = bits I(T;X) = bits I(T;Y) = bits Compression = %
The Information Bottleneck: find compressed representation T of X that retains maximal information about Y. Minimize: L = I(T;X) − β·I(T;Y) — trading compression against relevance. β controls the trade-off: β→0 maximizes compression (T ignores Y), β→∞ minimizes distortion (T=X). The IB curve {(I(T;X), I(T;Y))} traces the Pareto frontier — no encoder can exceed it. Applications: neural networks, clustering, rate-distortion theory.