Optimal compression: minimize I(X;T) while preserving I(T;Y) — the information curve
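The information bottleneck (IB) principle (Tishby et al. 1999): find a compressed representation T of X that retains as much information as possible about a relevance variable Y. T is computed from X alone, so the variables form the Markov chain Y → X → T.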
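The IB curve traces Pareto-optimal solutions as the trade-off parameter β varies in the Lagrangian L = I(X;T) − β I(T;Y), minimized over the encoder p(t|x). At β→0: T is maximally compressed, I(X;T)→0, so T carries no information about X and hence none about Y. At β→∞: compression is not penalized and T keeps everything X says about Y, I(T;Y)→I(X;Y); T = X (no compression) is one such solution, though any sufficient statistic of X for Y also attains it.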
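To make the role of β concrete, here is a minimal sketch of the self-consistent (Blahut-Arimoto-style) iteration for discrete variables, assuming the Lagrangian I(X;T) − β I(T;Y) and a joint table p(x,y) with p(x) > 0 for every x. The function names `mutual_information` and `iterative_ib`, the toy joint distribution, and the `eps` smoothing constant are illustrative choices, not part of the original notes. The iteration only finds a local optimum, so the points it returns approximate the IB curve rather than certify it.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information I(A;B) in nats for a joint table p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def iterative_ib(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Run the IB fixed-point updates; return (I(X;T), I(T;Y)) in nats."""
    rng = np.random.default_rng(seed)
    eps = 1e-12                                  # smoothing to avoid log(0)
    p_x = p_xy.sum(axis=1)                       # p(x), assumed strictly positive
    p_y_given_x = p_xy / p_x[:, None]            # p(y|x)

    # Random soft encoder p(t|x); each row sums to 1.
    p_t_given_x = rng.random((len(p_x), n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = p_x @ p_t_given_x
        # p(y|t) = sum_x p(y|x) p(x) p(t|x) / p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= (p_t[:, None] + eps)
        # kl[x, t] = KL( p(y|x) || p(y|t) )
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # Encoder update: p(t|x) proportional to p(t) * exp(-beta * kl[x, t])
        logits = np.log(p_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    p_xt = p_t_given_x * p_x[:, None]            # joint p(x, t)
    p_ty = p_xt.T @ p_y_given_x                  # joint p(t, y), using Y -> X -> T
    return mutual_information(p_xt), mutual_information(p_ty)

# Toy joint: four x values in two groups that predict a binary y differently.
p_xy = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]) / 4.0

for beta in (0.01, 1.0, 2.0, 20.0):
    i_xt, i_ty = iterative_ib(p_xy, n_clusters=2, beta=beta)
    print(f"beta={beta:6.2f}  I(X;T)={i_xt:.4f}  I(T;Y)={i_ty:.4f}")
```

Sweeping β and plotting the returned pairs gives an empirical trace of the curve: small β should yield I(X;T) near zero, while large β should push I(T;Y) toward I(X;Y).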
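Connection to deep learning: each hidden layer T can be placed in the (I(X;T), I(T;Y)) information plane and compared against this curve (Shwartz-Ziv & Tishby 2017). Connection to rate-distortion theory: IB is a rate-distortion problem in which the distortion between x and t is the KL divergence between p(y|x) and p(y|t), so the IB curve plays the role of the rate-distortion function.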
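The rate-distortion reading can be written out explicitly. The following is a short sketch, assuming the Markov chain Y → X → T and the Lagrangian form I(X;T) − β I(T;Y); the symbol d(x,t) for the distortion is introduced here for illustration.

```latex
d(x,t) = D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y \mid t)\big),
\qquad
\mathbb{E}_{p(x,t)}\big[d(x,t)\big] = I(X;Y \mid T) = I(X;Y) - I(T;Y)

\min_{p(t \mid x)} \; I(X;T) + \beta\,\mathbb{E}\big[d(x,t)\big]
\;\;\Longleftrightarrow\;\;
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y)
```

The equivalence holds because I(X;Y) is fixed by the data and does not depend on the encoder p(t|x), so it only shifts the objective by a constant. Unlike classical rate-distortion, the distortion here depends on p(y|t), which is itself determined by the encoder.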