Information Bottleneck

The Information Bottleneck: find compressed representation T of X that retains maximal information about Y. Minimize: L = I(T;X) − β·I(T;Y) — trading compression against relevance. β controls the trade-off: β→0 maximizes compression (T ignores Y), β→∞ minimizes distortion (T=X). The IB curve {(I(T;X), I(T;Y))} traces the Pareto frontier — no encoder can exceed it. Applications: neural networks, clustering, rate-distortion theory.