Multi-Armed Bandit — UCB
UCB vs ε-Greedy vs Thompson Sampling
Arms k: 8
ε (greedy): 0.10
UCB c: 2.0
Steps/frame: 10
Start
Reset
New Arm Rewards
Pulls: 0
UCB regret: 0
ε-greedy regret: 0
Thompson regret: 0
Bars show each arm's true mean reward. Dots show each algorithm's current estimate of that mean. The line chart plots cumulative regret over time for all three algorithms.
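The comparison the chart describes can be sketched offline. The following is a minimal Python sketch and not the demo's own code. It assumes Bernoulli (0/1) rewards, true means drawn uniformly from [0, 1], a UCB bonus of sqrt(c · ln t / n) (which gives standard UCB1 at c = 2.0), and a Beta(1, 1) prior for Thompson sampling. The names `run_policy`, `ucb`, `eps_greedy`, `thompson`, and `compare` are hypothetical. Regret is accumulated as the gap between the best arm's true mean and the pulled arm's true mean.

```python
import math
import random


def run_policy(select, true_means, steps, rng):
    """Play one policy and return its cumulative-regret curve."""
    k = len(true_means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # total reward per arm
    best = max(true_means)
    regret, curve = 0.0, []
    for t in range(1, steps + 1):
        arm = select(t, counts, sums, rng)
        # Assumption: Bernoulli reward with success probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]  # expected regret of this pull
        curve.append(regret)
    return curve


def ucb(c):
    """UCB index: sample mean plus sqrt(c * ln t / n) (assumed form of the bonus)."""
    def select(t, counts, sums, rng):
        for arm, n in enumerate(counts):
            if n == 0:
                return arm  # pull every arm once before using the index
        return max(range(len(counts)),
                   key=lambda a: sums[a] / counts[a]
                   + math.sqrt(c * math.log(t) / counts[a]))
    return select


def eps_greedy(epsilon):
    """Explore uniformly with probability epsilon, otherwise pick the best estimate."""
    def select(t, counts, sums, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(counts))
        return max(range(len(counts)),
                   key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
    return select


def thompson():
    """Sample each arm's mean from its Beta posterior and pick the largest sample."""
    def select(t, counts, sums, rng):
        return max(range(len(counts)),
                   key=lambda a: rng.betavariate(1 + sums[a],
                                                 1 + counts[a] - sums[a]))
    return select


def compare(k=8, epsilon=0.10, c=2.0, steps=2000, seed=0):
    """Run all three policies on the same arms; defaults mirror the panel values."""
    true_means = [random.Random(seed).random() for _ in range(k)]
    policies = {"UCB": ucb(c), "eps-greedy": eps_greedy(epsilon),
                "Thompson": thompson()}
    return {name: run_policy(sel, true_means, steps, random.Random(seed + 1))
            for name, sel in policies.items()}


if __name__ == "__main__":
    for name, curve in compare().items():
        print(f"{name}: final cumulative regret = {curve[-1]:.1f}")
```

Each policy gets its own random generator seeded identically, so the runs are reproducible but not coupled pull for pull. Which algorithm ends with the lowest regret depends on the drawn means, the horizon, and the ε and c settings, so the sketch prints the results rather than asserting a ranking.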
Legend: UCB, ε-Greedy, Thompson