Information Bottleneck

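The IB principle: compress X into a representation T while preserving as much information about Y as possible.
min over p(t|x) of I(X;T) − β·I(T;Y), where β ≥ 0 sets the trade-off between compression and relevance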
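
As a rough illustration of the objective above, the sketch below evaluates I(X;T) − β·I(T;Y) for a given joint distribution and a given encoder p(t|x). The function names `mutual_information` and `ib_objective`, the array layout (rows index X, columns index Y or T), and the choice of bits as the unit are assumptions made for this example, not something specified by the source.

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information, in bits, of a 2-D joint distribution array."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_joint > 0                         # treat 0 * log 0 as 0
    ratio = p_joint[mask] / (p_a @ p_b)[mask]
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

def ib_objective(p_xy, p_t_given_x, beta):
    """Return (I(X;T) - beta * I(T;Y), I(X;T), I(T;Y)) for an encoder p(t|x).

    p_xy has shape (|X|, |Y|); p_t_given_x has shape (|X|, |T|), rows sum to 1.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x   # p(x,t) = p(x) p(t|x)
    p_ty = p_t_given_x.T @ p_xy         # p(t,y) = sum_x p(t|x) p(x,y)
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt - beta * i_ty, i_xt, i_ty

# Toy joint distribution in which Y is a copy of X (hypothetical data).
p_xy = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
identity_encoder = np.eye(2)        # T = X: no compression
constant_encoder = np.ones((2, 1))  # a single cluster: maximal compression
print(ib_objective(p_xy, identity_encoder, beta=2.0))
print(ib_objective(p_xy, constant_encoder, beta=2.0))
```

In this toy case the identity encoder keeps one bit about both X and Y, while the constant encoder keeps none. For β > 1 the identity encoder therefore gives the lower objective value.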
I(X;Y): total information
I(X;T): compression
I(T;Y): relevance
IB efficiency
p(X,Y) joint distribution
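Each row is a source symbol X=x, and each column is a target value Y=y. The IB algorithm searches for a representation T that clusters X values by their relevance to Y, so that values of x with similar conditional distributions p(y|x) tend to share a cluster.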
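
The clustering described above can be sketched with the standard self-consistent IB iteration, which alternates between updating p(t), p(y|t), and p(t|x) ∝ p(t)·exp(−β·KL[p(y|x) ‖ p(y|t)]). This is a minimal sketch, not necessarily the procedure this page uses. The function name `iterative_ib`, its parameters, the random initialization, and the example joint distribution are all assumptions made for illustration. The sketch also assumes every row of p(X,Y) has nonzero total probability.

```python
import numpy as np

def iterative_ib(p_xy, n_clusters, beta, n_iter=200, seed=0, eps=1e-12):
    """Soft-cluster X into n_clusters values of T; returns p(t|x), shape (|X|, |T|)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)               # p(x), assumed > 0 for every row
    p_y_given_x = p_xy / p_x[:, None]    # p(y|x)

    # Random soft assignment as a starting point (an arbitrary choice).
    rng = np.random.default_rng(seed)
    p_t_given_x = rng.random((p_xy.shape[0], n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = p_x @ p_t_given_x
        # p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None] + eps
        # KL(p(y|x) || p(y|t)) for every (x, t) pair, in nats
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # p(t|x) proportional to p(t) * exp(-beta * KL), normalized in log space
        logits = np.log(p_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x

# Hypothetical joint distribution: rows 0-1 favor y=0, rows 2-3 favor y=1.
p_xy = np.array([[0.225, 0.025],
                 [0.225, 0.025],
                 [0.025, 0.225],
                 [0.025, 0.225]])
encoder = iterative_ib(p_xy, n_clusters=2, beta=10.0)
print(np.round(encoder, 3))
```

Because the update for p(t|x) depends on x only through p(y|x), rows with identical conditional distributions always receive identical cluster assignments. The iteration converges to a local optimum that can depend on the initialization, so results for other seeds or values of β may differ.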