Exploration-Exploitation Tradeoff

ε-Greedy · UCB · Thompson Sampling · Boltzmann — Compared Live

All four strategies run simultaneously on the same k-armed bandit. Regret = the reward from always pulling the best arm − the reward actually achieved.
Panels: ε-Greedy · UCB · Thompson Sampling · Boltzmann
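
As a rough illustration of the comparison described above, the following Python sketch runs each of the four strategies on a bandit with the same arm means and returns its cumulative regret. None of it comes from the page itself: the function name `run_bandit`, the Bernoulli reward model, the parameter defaults (`epsilon`, `c`, `tau`), the UCB1-style bonus, and the choice to sum expected rather than realized reward gaps (which makes the regret curve less noisy) are all assumptions made for this example.

```python
import math
import random


def run_bandit(strategy, means, steps=2000, seed=0, epsilon=0.1, c=2.0, tau=0.1):
    """Run one strategy on a Bernoulli k-armed bandit; return cumulative regret.

    Hypothetical helper: `means` holds each arm's success probability.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    wins = [1] * k        # Beta posterior alpha per arm (uniform prior)
    losses = [1] * k      # Beta posterior beta per arm (uniform prior)
    best = max(means)
    regret = 0.0

    for t in range(1, steps + 1):
        if strategy == "epsilon_greedy":
            # Explore uniformly with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                arm = rng.randrange(k)
            else:
                arm = max(range(k), key=lambda a: values[a])
        elif strategy == "ucb":
            # Pull every arm once, then maximise mean + exploration bonus.
            if t <= k:
                arm = t - 1
            else:
                arm = max(
                    range(k),
                    key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]),
                )
        elif strategy == "thompson":
            # Sample a plausible mean for each arm from its Beta posterior.
            samples = [rng.betavariate(wins[a], losses[a]) for a in range(k)]
            arm = max(range(k), key=lambda a: samples[a])
        elif strategy == "boltzmann":
            # Softmax over estimated values; subtract the max for stability.
            top = max(values)
            weights = [math.exp((v - top) / tau) for v in values]
            arm = rng.choices(range(k), weights=weights)[0]
        else:
            raise ValueError(f"unknown strategy: {strategy}")

        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        wins[arm] += reward
        losses[arm] += 1 - reward
        # Assumption: regret uses the expected reward of the chosen arm.
        regret += best - means[arm]

    return regret


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # illustrative values, not from the page
    for name in ("epsilon_greedy", "ucb", "thompson", "boltzmann"):
        print(name, round(run_bandit(name, arm_means), 1))
```

Each strategy sees the same arm means but draws its own random rewards, so the printed totals vary with `seed`; averaging over several seeds would give a fairer comparison.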