Pólya Urn — Reinforcement Learning

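Draw a ball at random, then return it together with α extra balls of the same color. The fraction of red balls converges almost surely to a random limit with a Beta distribution: starting from r red and b blue balls, the limit is Beta(r/α, b/α). The process is path-dependent, since each urn settles on its own limit and early draws weigh most heavily. The draw sequence is nevertheless exchangeable, so the probability of a given history depends only on the color counts, not on their order. Compare with the Dirichlet process limit.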
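
The draw-and-reinforce rule can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function names `polya_urn` and `simulate`, the seed, and the starting urn of one red and one blue ball with α = 1 are all assumptions chosen for the example. With that starting urn the limit is Beta(1, 1), the uniform distribution, so the final red fractions across many independent urns should have mean near 1/2 and variance near 1/12.

```python
import random


def polya_urn(red, blue, alpha, draws, rng):
    """Run one urn for `draws` steps and return the final fraction of red."""
    for _ in range(draws):
        # Draw a ball with probability proportional to the current counts,
        # then reinforce the drawn color with `alpha` extra balls.
        if rng.random() < red / (red + blue):
            red += alpha
        else:
            blue += alpha
    return red / (red + blue)


def simulate(runs=2000, red=1, blue=1, alpha=1, draws=500, seed=0):
    """Run `runs` independent urns (hypothetical defaults) and collect fractions."""
    rng = random.Random(seed)
    return [polya_urn(red, blue, alpha, draws, rng) for _ in range(runs)]


final_fractions = simulate()
mean = sum(final_fractions) / len(final_fractions)
var = sum((f - mean) ** 2 for f in final_fractions) / len(final_fractions)

# For Beta(1, 1) the theoretical mean is 0.5 and the variance is 1/12.
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

Each urn ends at a different fraction even though all of them start identically, which is the path dependence described above. The spread of those fractions across urns is what the Beta limit describes.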
