Information Bottleneck
Compression vs Relevance Trade-off — Tishby, Pereira & Bialek (1999)
I(X;Y) = — bits
I(T;X) = — bits
I(T;Y) = — bits
Compression = —%
The Information Bottleneck: find compressed representation T of X that retains maximal information about Y.
Minimize: L = I(T;X) − β·I(T;Y) — trading compression against relevance.
β controls the trade-off: β→0 maximizes compression (T ignores Y), β→∞ minimizes distortion (T=X).
The IB curve {(I(T;X), I(T;Y))} traces the Pareto frontier — no encoder can exceed it.
Applications: neural networks, clustering, rate-distortion theory.