Find the optimal compressed representation T of input X that is maximally informative about output Y. The IB curve traces the Pareto front: maximum I(T;Y) for each value of I(X;T).
Controls
I(X;T): — bits
I(T;Y): — bits
β: 1.0
Minimize: I(X;T) − β·I(T;Y)
β → 0: max compression
β → ∞: max relevance
IB curve = optimal trade-off between compression & prediction