Information Bottleneck — Tishby's Tradeoff

Optimal compression: minimize I(X;T) while preserving I(T;Y) — the information curve

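The information bottleneck (IB) principle (Tishby et al. 1999): find a compressed representation T of X that retains as much information as possible about a target variable Y. T is computed from X alone, so T depends on Y only through X (the Markov chain Y - X - T). The encoder p(t|x) is chosen to solve: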

min_{p(t|x)} I(X;T) − β I(T;Y)
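
To make the objective concrete, here is a minimal sketch that evaluates I(X;T), I(T;Y), and the Lagrangian for a given encoder on a small discrete example. The function names (`mutual_information`, `ib_objective`), the 2x2 joint table, and the choice β = 5 are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in bits) of a 2-D joint probability table."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_a = p_joint.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    p_b = p_joint.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    mask = p_joint > 0                         # skip zero cells: 0 * log 0 = 0
    ratio = p_joint[mask] / (p_a @ p_b)[mask]
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

def ib_objective(p_xy, p_t_given_x, beta):
    """Return (I(X;T), I(T;Y), I(X;T) - beta * I(T;Y)).

    p_xy:        joint table of X and Y, shape (|X|, |Y|)
    p_t_given_x: encoder p(t|x), shape (|X|, |T|), each row sums to 1
    """
    p_xy = np.asarray(p_xy, dtype=float)
    enc = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * enc      # p(x,t) = p(x) p(t|x)
    p_ty = enc.T @ p_xy            # p(t,y) = sum_x p(t|x) p(x,y)
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt, i_ty, i_xt - beta * i_ty

# Hypothetical toy distribution: X and Y are binary and agree 80% of the time.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

copy_encoder = np.eye(2)               # T = X, no compression
constant_encoder = np.ones((2, 1))     # one cluster, T ignores X

print("T = X:      ", ib_objective(p_xy, copy_encoder, beta=5.0))
print("T constant: ", ib_objective(p_xy, constant_encoder, beta=5.0))
```

The two encoders are the extreme cases: copying X keeps all of I(X;Y) at the cost of I(X;T) = H(X), while the constant encoder drives both terms to zero.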

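The IB curve traces the Pareto-optimal pairs (I(X;T), I(T;Y)) as β varies. As β→0, T is maximally compressed and ignores Y. By the data processing inequality, I(T;Y) ≤ I(X;T), so this trivial solution is already optimal for every β ≤ 1. As β→∞, compression carries no weight and T = X (no compression) attains the maximum I(T;Y) = I(X;Y).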
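
One way to trace points along the curve numerically is the iterative algorithm built on the self-consistent equations of Tishby et al. (1999): alternate between updating p(t), p(y|t), and p(t|x) ∝ p(t) exp(−β D_KL(p(y|x) || p(y|t))). The sketch below assumes a discrete joint table with p(x) > 0 for every x. The function names, the toy distribution, the number of clusters, the iteration count, and the β grid are all illustrative choices. The iteration only finds a stationary point, so the result can depend on the random initialization.

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in bits) of a 2-D joint probability table."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0                         # skip zero cells: 0 * log 0 = 0
    ratio = p_joint[mask] / (p_a @ p_b)[mask]
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

def iterative_ib(p_xy, n_clusters, beta, n_iter=500, seed=0):
    """Iterate the IB self-consistent equations; returns the encoder p(t|x)."""
    eps = 1e-12                                # guards log(0) and division by 0
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]          # assumes p(x) > 0 for all x
    # Random soft initial encoder, shape (|X|, |T|), rows sum to 1.
    enc = rng.dirichlet(np.ones(n_clusters), size=p_x.size)
    for _ in range(n_iter):
        p_t = p_x @ enc                                      # p(t)
        p_y_given_t = (enc.T @ p_xy) / (p_t[:, None] + eps)  # p(y|t)
        # kl[x, t] = D_KL(p(y|x) || p(y|t)) in nats
        log_ratio = (np.log(p_y_given_x + eps)[:, None, :]
                     - np.log(p_y_given_t + eps)[None, :, :])
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # p(t|x) proportional to p(t) * exp(-beta * kl), via a stable softmax
        logits = np.log(p_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        enc = np.exp(logits)
        enc /= enc.sum(axis=1, keepdims=True)
    return enc

# Hypothetical toy distribution: X and Y are binary and agree 80% of the time.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)

curve = []  # (beta, I(X;T), I(T;Y)) for each beta in the sweep
for beta in [0.1, 1.0, 3.0, 10.0, 100.0]:
    enc = iterative_ib(p_xy, n_clusters=2, beta=beta)
    i_xt = mutual_information(p_x[:, None] * enc)   # from p(x,t)
    i_ty = mutual_information(enc.T @ p_xy)         # from p(t,y)
    curve.append((beta, i_xt, i_ty))
    print(f"beta={beta:6.1f}  I(X;T)={i_xt:.4f}  I(T;Y)={i_ty:.4f}")
```

Plotting I(T;Y) against I(X;T) for a finer β grid gives an approximation of the information curve for this toy distribution. Small β should collapse the encoder to a single effective cluster, and large β should push it toward a near-deterministic copy of X.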

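Connection to deep learning: each hidden layer T of a network can be analyzed as a point (I(X;T), I(T;Y)) in the information plane and compared with this curve (Shwartz-Ziv & Tishby 2017). Connection to rate-distortion theory: IB is a rate-distortion problem in which the rate is I(X;T) and the distortion is the KL divergence between p(y|x) and p(y|t), so the IB curve plays the role of the rate-distortion curve.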
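
The rate-distortion reading can be spelled out in a few lines. This is a sketch under the Markov chain Y - X - T; the symbol d_IB is a label introduced here for illustration.

```latex
% IB distortion between an input x and a representation value t
d_{\mathrm{IB}}(x,t) = D_{\mathrm{KL}}\big(p(y \mid x) \,\|\, p(y \mid t)\big)

% Its expectation is the relevant information that T loses
\mathbb{E}_{p(x,t)}\big[d_{\mathrm{IB}}(x,t)\big] = I(X;Y \mid T) = I(X;Y) - I(T;Y)

% So the rate-distortion Lagrangian matches the IB objective up to a constant
I(X;T) + \beta\, \mathbb{E}\big[d_{\mathrm{IB}}\big]
  = I(X;T) - \beta\, I(T;Y) + \beta\, I(X;Y)
```

Since β I(X;Y) does not depend on the encoder, both Lagrangians have the same minimizers. One difference from classical rate-distortion is that d_IB depends on p(y|t), which is itself determined by the encoder, so the distortion measure is not fixed in advance.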
