Information Bottleneck — Tishby's Principle

Find a compressed representation T of the input X that keeps as much information as possible about the output Y. The IB curve traces the Pareto front: the maximum achievable I(T;Y) at each level of I(X;T).

I(X;T) and I(T;Y) are measured in bits.
β: 1.0
Minimize over encoders p(t|x): I(X;T) − β·I(T;Y)
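
The objective above can be evaluated directly for a small discrete problem. The following is a minimal sketch, assuming a known joint distribution p(x,y) stored as a NumPy array and an encoder p(t|x) stored as a row-stochastic matrix; the function names and the toy distribution are illustrative, not part of the original page.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information in bits of a joint distribution given as a 2-D array."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal over rows
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal over columns
    mask = p_ab > 0                         # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def ib_objective(p_xy, p_t_given_x, beta):
    """Return (I(X;T), I(T;Y), I(X;T) - beta * I(T;Y)) for an encoder p(t|x)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)  # shape (|X|, |T|)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x   # joint p(x,t)
    p_ty = p_t_given_x.T @ p_xy         # joint p(t,y), using the Markov chain T - X - Y
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt, i_ty, i_xt - beta * i_ty

# Hypothetical toy joint distribution: X and Y are binary and correlated.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
identity_encoder = np.eye(2)                          # T copies X (no compression)
constant_encoder = np.array([[1.0, 0.0], [1.0, 0.0]]) # T ignores X (full compression)

print(ib_objective(p_xy, identity_encoder, beta=1.0))
print(ib_objective(p_xy, constant_encoder, beta=1.0))
```

At β = 1.0 the constant encoder scores exactly 0 while the identity encoder scores higher, because I(T;Y) can never exceed I(X;T); compression only pays for itself once β is larger than 1.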

β → 0: max compression
β → ∞: max relevance

IB curve = optimal trade-off between compression & prediction
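
One way to trace points on the curve numerically is the iterative IB update (a Blahut-Arimoto-style alternating scheme) run at several values of β. The sketch below assumes a small discrete joint distribution with strictly positive entries; the function names, the toy distribution, the cluster count `n_t`, and the iteration budget are all illustrative choices. The iteration only finds a stationary point from a random start, so it is not guaranteed to reach the true optimum.

```python
import numpy as np

def mi_bits(p_ab):
    """Mutual information in bits of a 2-D joint distribution."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def iterative_ib(p_xy, beta, n_t, n_iter=500, seed=0, eps=1e-12):
    """Run the self-consistent IB updates; return (I(X;T), I(T;Y)) in bits."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    q = rng.random((p_xy.shape[0], n_t))      # random initial encoder p(t|x)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_t = np.clip(p_x @ q, eps, None)     # cluster marginal p(t)
        # Decoder p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x / p_t[:, None]
        # kl[x, t] = KL(p(y|x) || p(y|t)) in nats
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # Encoder update: p(t|x) proportional to p(t) * exp(-beta * KL)
        logits = np.log(p_t)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    p_xt = p_x[:, None] * q
    p_ty = q.T @ p_xy
    return mi_bits(p_xt), mi_bits(p_ty)

# Hypothetical toy joint distribution with binary X and Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

for beta in (0.1, 1.0, 5.0, 50.0):
    i_xt, i_ty = iterative_ib(p_xy, beta, n_t=2)
    print(f"beta={beta:5.1f}  I(X;T)={i_xt:.4f} bits  I(T;Y)={i_ty:.4f} bits")
```

Plotting the resulting (I(X;T), I(T;Y)) pairs gives an approximation to the curve: small β should land near the origin (maximum compression), while large β should approach the point where T retains everything X knows about Y.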