IB Curve I(T;Y) vs I(T;X)
Information Bottleneck (Tishby et al. 1999): Find a compressed representation T of X that maximally preserves information about Y.
Minimize I(X;T) − β·I(T;Y) over stochastic encoders p(T|X), where β ≥ 0 sets the trade-off. The IB curve traces the Pareto frontier of compression vs. relevance.
Rate-Distortion theory (Shannon 1948) gives the minimum rate, in bits per source symbol, needed for a given reconstruction distortion D: R(D) = min_{p(T|X) : E[d(X,T)] ≤ D} I(X;T).
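As an illustration of the R(D) minimization, here is a minimal sketch of the Blahut-Arimoto iteration for a discrete source. The function name `blahut_arimoto`, the slope parameter `s` (a Lagrange multiplier standing in for the constraint E[d] ≤ D), and the example source are assumptions made for this sketch, not part of the original text.

```python
import math

def blahut_arimoto(p_x, dist, s, n_iter=200):
    """Return one (D, R) point of the rate-distortion curve, R in bits.

    p_x  : source probabilities p(x), all assumed > 0
    dist : distortion matrix, dist[x][t] = d(x, t)
    s    : slope parameter (hypothetical name); larger s means lower distortion
    """
    n_x, n_t = len(dist), len(dist[0])
    q = [1.0 / n_t] * n_t  # output marginal q(t), uniform start
    for _ in range(n_iter):
        # Encoder update: p(t|x) proportional to q(t) * exp(-s * d(x, t))
        cond = []
        for x in range(n_x):
            w = [q[t] * math.exp(-s * dist[x][t]) for t in range(n_t)]
            z = sum(w)
            cond.append([v / z for v in w])
        # Marginal update: q(t) = sum_x p(x) p(t|x)
        q = [sum(p_x[x] * cond[x][t] for x in range(n_x)) for t in range(n_t)]
    # Expected distortion and rate I(X;T) at the final encoder
    D = sum(p_x[x] * cond[x][t] * dist[x][t]
            for x in range(n_x) for t in range(n_t))
    R = sum(p_x[x] * cond[x][t] * math.log2(cond[x][t] / q[t])
            for x in range(n_x) for t in range(n_t) if cond[x][t] > 0)
    return D, R

# Example: uniform binary source with Hamming distortion
D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], s=2.0)
print(f"D = {D:.4f}, R = {R:.4f} bits")
```

For the uniform binary source with Hamming distortion the result can be checked against the closed form R(D) = 1 − H_b(D), where H_b is the binary entropy function.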
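As β→0 the objective favors maximum compression (T carries no information about X); as β→∞ it favors relevance, with I(T;Y) approaching its upper bound I(X;Y). Plotted as I(T;Y) against I(T;X), the IB curve is concave and nondecreasing, and by the data processing inequality it lies on or below both the diagonal I(T;Y) = I(T;X) and the ceiling I(X;Y).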
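The trade-off described above can be explored numerically with the self-consistent IB updates. The following is a minimal sketch, assuming a small discrete joint distribution p(x, y) with p(x) > 0 for every x. The names `mutual_information` and `iterative_ib`, the cluster count `n_t`, and the example joint are hypothetical choices for illustration. The iteration may stop at a local optimum, so the points it returns lie on or below the true IB curve.

```python
import math
import random

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 1e-300:  # skip numerically zero cells
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi

def iterative_ib(p_xy, n_t, beta, n_iter=200, seed=0):
    """Run the IB fixed-point updates; return (I(X;T), I(T;Y)) in bits."""
    rng = random.Random(seed)
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_x = [[p / p_x[x] for p in p_xy[x]] for x in range(n_x)]
    # Random soft initialization of the encoder p(t|x)
    enc = []
    for _ in range(n_x):
        w = [rng.random() + 1e-3 for _ in range(n_t)]
        total = sum(w)
        enc.append([v / total for v in w])
    tiny = 1e-300  # floor that avoids log(0) and division by zero
    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * enc[x][t] for x in range(n_x)) for t in range(n_t)]
        # p(y|t) = sum_x p(x) p(t|x) p(y|x) / p(t)
        p_y_t = [[sum(p_x[x] * enc[x][t] * p_y_x[x][y] for x in range(n_x))
                  / max(p_t[t], tiny) for y in range(n_y)] for t in range(n_t)]
        # p(t|x) proportional to p(t) * exp(-beta * KL(p(y|x) || p(y|t)))
        for x in range(n_x):
            logits = []
            for t in range(n_t):
                kl = sum(q * math.log(q / max(p_y_t[t][y], tiny))
                         for y, q in enumerate(p_y_x[x]) if q > 0)
                logits.append(math.log(max(p_t[t], tiny)) - beta * kl)
            m = max(logits)
            w = [math.exp(l - m) for l in logits]
            total = sum(w)
            enc[x] = [v / total for v in w]
    p_xt = [[p_x[x] * enc[x][t] for t in range(n_t)] for x in range(n_x)]
    p_ty = [[sum(p_xy[x][y] * enc[x][t] for x in range(n_x))
             for y in range(n_y)] for t in range(n_t)]
    return mutual_information(p_xt), mutual_information(p_ty)

# Hypothetical 4x2 joint distribution p(x, y), used only for illustration
p_xy = [[0.20, 0.05], [0.15, 0.10], [0.05, 0.20], [0.10, 0.15]]
print("I(X;Y) =", round(mutual_information(p_xy), 4))
for beta in (0.5, 2.0, 10.0, 50.0):
    ixt, ity = iterative_ib(p_xy, n_t=3, beta=beta)
    print(f"beta={beta:5.1f}  I(X;T)={ixt:.4f}  I(T;Y)={ity:.4f}")
```

Whatever encoder the iteration returns, each point should satisfy I(T;Y) ≤ I(X;T) and I(T;Y) ≤ I(X;Y), since T depends on Y only through X.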