Information Bottleneck — Compression vs. Relevance

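The information bottleneck (IB) method looks for a compressed representation T of X that preserves as much information as possible about a relevance variable Y. T is computed from X alone, so T, X and Y form the Markov chain T ← X → Y. The encoder p(t|x) is chosen to minimize the IB Lagrangian L = I(T;X) − β I(T;Y). The parameter β ≥ 0 controls the tradeoff: a small β favors compression (low I(T;X)), and a large β favors relevance (high I(T;Y)).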
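For a discrete joint distribution, the Lagrangian can be minimized with the iterative IB algorithm of Tishby, Pereira and Bialek (1999). It alternates three self-consistent updates: p(t) = Σ_x p(x) p(t|x), then p(y|t) = Σ_x p(y|x) p(t|x) p(x) / p(t), then p(t|x) ∝ p(t) exp(−β D_KL[p(y|x) ‖ p(y|t)]). The block below is a minimal pure-Python sketch of those updates. It assumes p(x,y) is a small nested list with every p(x) > 0. The function name `iterative_ib`, the random soft initialization, and the fixed iteration count are illustrative choices, not part of the original text. A single run only reaches a local optimum.

```python
import math
import random


def iterative_ib(pxy, n_clusters, beta, n_iter=200, seed=0):
    """Sketch of the iterative IB updates for a discrete joint p(x, y).

    pxy[x][y] = p(x, y); entries sum to 1 and every row sum p(x) is > 0.
    Returns the soft encoder as enc[x][t] = p(t|x).
    (Hypothetical helper for illustration, not a library API.)
    """
    rng = random.Random(seed)
    nx, ny = len(pxy), len(pxy[0])
    eps = 1e-12  # guards against log(0) and division by an empty cluster
    px = [sum(row) for row in pxy]
    py_x = [[pxy[x][y] / px[x] for y in range(ny)] for x in range(nx)]

    # Random soft initialization of p(t|x): one normalized row per x.
    enc = []
    for _ in range(nx):
        w = [rng.random() for _ in range(n_clusters)]
        s = sum(w)
        enc.append([v / s for v in w])

    for _ in range(n_iter):
        # Cluster marginal p(t) = sum_x p(x) p(t|x).
        pt = [sum(px[x] * enc[x][t] for x in range(nx)) for t in range(n_clusters)]
        # Decoder p(y|t) = sum_x p(x) p(t|x) p(y|x) / p(t).
        py_t = [[sum(px[x] * enc[x][t] * py_x[x][y] for x in range(nx)) / max(pt[t], eps)
                 for y in range(ny)]
                for t in range(n_clusters)]
        # Encoder p(t|x) proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)]).
        new_enc = []
        for x in range(nx):
            logits = []
            for t in range(n_clusters):
                kl = sum(p * math.log(p / max(py_t[t][y], eps))
                         for y, p in enumerate(py_x[x]) if p > 0)
                logits.append(math.log(max(pt[t], eps)) - beta * kl)
            m = max(logits)  # subtract the max before exp for numerical stability
            w = [math.exp(z - m) for z in logits]
            s = sum(w)
            new_enc.append([v / s for v in w])
        enc = new_enc
    return enc


# Toy joint: x = 0, 1 share p(y|x) = (0.8, 0.2); x = 2, 3 share (0.2, 0.8).
pxy = [[0.2, 0.05], [0.2, 0.05], [0.05, 0.2], [0.05, 0.2]]
for row in iterative_ib(pxy, n_clusters=2, beta=20.0):
    print([round(p, 3) for p in row])
```

On this toy joint, a large β such as 20 should drive p(t|x) toward a hard split that puts x = 0, 1 in one cluster and x = 2, 3 in the other. At β = 0 the exponential term vanishes, so every row of p(t|x) becomes identical to p(t) and T carries no information about X (maximal compression).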

[Interactive figure panels: the IB plane, I(T;X) vs. I(T;Y), with the IB bound curve; the joint distribution p(x,y); and the cluster assignment p(t|x) at the current β.]
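In the IB plane, each encoder p(t|x) is plotted as the point (I(T;X), I(T;Y)). The IB bound curve is the largest I(T;Y) achievable at each level of I(T;X), so every encoder lies on or below it. The sketch below computes those coordinates from p(x,y) and p(t|x), using the same nested-list representation (pxy[x][y] = p(x,y), enc[x][t] = p(t|x)). The helper names `mutual_info` and `ib_plane_point` are hypothetical. The joint p(t,y) is formed as Σ_x p(t|x) p(x,y), which is valid because T depends on Y only through X.

```python
import math


def mutual_info(pab):
    """I(A;B) in bits from a joint probability table pab[a][b]."""
    pa = [sum(row) for row in pab]
    pb = [sum(col) for col in zip(*pab)]
    total = 0.0
    for a, row in enumerate(pab):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * math.log2(p / (pa[a] * pb[b]))
    return total


def ib_plane_point(pxy, enc):
    """Return (I(T;X), I(T;Y)) for joint pxy[x][y] and encoder enc[x][t] = p(t|x)."""
    nx, ny, nt = len(pxy), len(pxy[0]), len(enc[0])
    px = [sum(row) for row in pxy]
    # p(x, t) = p(x) p(t|x)
    pxt = [[px[x] * enc[x][t] for t in range(nt)] for x in range(nx)]
    # p(t, y) = sum_x p(t|x) p(x, y), using the Markov chain T <- X -> Y
    pty = [[sum(enc[x][t] * pxy[x][y] for x in range(nx)) for y in range(ny)]
           for t in range(nt)]
    return mutual_info(pxt), mutual_info(pty)


pxy = [[0.2, 0.05], [0.2, 0.05], [0.05, 0.2], [0.05, 0.2]]
identity = [[1.0 if t == x else 0.0 for t in range(4)] for x in range(4)]
grouped = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(ib_plane_point(pxy, identity))
print(ib_plane_point(pxy, grouped))
```

With this toy joint, the identity encoder sits at I(T;X) = H(X) = 2 bits and I(T;Y) = I(X;Y). The two-cluster grouping keeps the same I(T;Y) with only 1 bit of I(T;X), because x values that share the same p(y|x) can be merged without losing information about Y. A constant encoder (a single cluster) lands at the origin. Running an IB optimizer over a range of β and plotting the resulting points traces out the bound curve.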