Information Bottleneck
The IB principle: compress X into a representation T while preserving information about Y, by minimizing over the encoder p(t|x):
min_{p(t|x)} I(X;T) − β·I(T;Y)
Controls (defaults shown):
- β (compression tradeoff): 2.0
- |X| (source alphabet size): 4
- |Y| (target alphabet size): 3
- Noise level: 0.2
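The demo does not show how p(X,Y) is generated from these controls. A minimal sketch, assuming a uniform p(x) and a noisy deterministic map x → x mod |Y| (the function name `make_joint` and this construction are assumptions, not the demo's actual code):

```python
import numpy as np

def make_joint(nx=4, ny=3, noise=0.2):
    """Build a joint p(x,y) matching the demo's defaults: each source symbol x
    maps to target y = x mod ny with probability 1 - noise, and the remaining
    `noise` mass is spread uniformly over the other targets; p(x) is uniform."""
    py_given_x = np.full((nx, ny), noise / (ny - 1))
    py_given_x[np.arange(nx), np.arange(nx) % ny] = 1.0 - noise
    return py_given_x / nx  # p(x,y) = p(x) * p(y|x) with p(x) = 1/nx

pxy = make_joint()
```

Each row of `pxy` sums to 1/|X|, so the whole matrix sums to 1, as a joint distribution must.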
Actions: New Joint Distribution, Run IB Algorithm
Metrics (filled in after the algorithm runs):
- I(X;Y) — total information the source carries about the target
- I(X;T) — compression cost: how much of X the representation T retains
- I(T;Y) — relevance: how much T preserves about Y
- IB efficiency — the fraction of the total information captured, I(T;Y)/I(X;Y)
p(X,Y) joint distribution
Each row corresponds to a source symbol X=x and each column to a target symbol Y=y. The IB algorithm finds a (generally soft) representation T that clusters the values of X by their relevance to Y.
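The page does not include the algorithm itself. A self-contained sketch of the standard iterative IB updates (Blahut-Arimoto style) for a discrete joint distribution, which alternates p(t), p(y|t), and p(t|x) until convergence; the function names and the fixed iteration count are choices for this sketch:

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits from a joint-distribution array p(x,y)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum())

def ib(pxy, n_t, beta, n_iter=300, seed=0, eps=1e-12):
    """Iterative IB for a discrete joint p(x,y).

    Alternates the self-consistent equations
      p(t)    = sum_x p(x) p(t|x)
      p(y|t)  = sum_x p(t|x) p(x,y) / p(t)
      p(t|x) ∝ p(t) * 2^(-beta * KL(p(y|x) || p(y|t)))
    and returns (I(X;T), I(T;Y)).
    """
    rng = np.random.default_rng(seed)
    nx, ny = pxy.shape
    px = pxy.sum(axis=1)                       # marginal p(x)
    py_x = pxy / px[:, None]                   # conditional p(y|x)
    pt_x = rng.random((nx, n_t))
    pt_x /= pt_x.sum(axis=1, keepdims=True)    # random initial encoder p(t|x)
    for _ in range(n_iter):
        pt = np.maximum(px @ pt_x, eps)                       # p(t)
        py_t = (pt_x * px[:, None]).T @ py_x / pt[:, None]    # p(y|t)
        # KL(p(y|x) || p(y|t)) in bits, for every (x, t) pair
        log_ratio = np.log2(np.maximum(py_x, eps)[:, None, :] /
                            np.maximum(py_t, eps)[None, :, :])
        kl = (py_x[:, None, :] * log_ratio).sum(axis=2)       # shape (nx, n_t)
        pt_x = pt[None, :] * np.exp2(-beta * kl)
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    pxt = px[:, None] * pt_x                   # joint p(x,t)
    pty = (pt_x * px[:, None]).T @ py_x        # joint p(t,y)
    return mutual_information(pxt), mutual_information(pty)
```

By the data-processing inequality on the Markov chain T - X - Y, any output satisfies I(T;Y) ≤ I(X;Y) and I(T;Y) ≤ I(X;T); larger β trades more compression (higher I(X;T)) for more relevance (higher I(T;Y)).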