Pólya Urn — Reinforcement Learning

Draw a ball, return it with α extra balls of the same color. P(red) converges to a Beta-distributed random variable — path-dependent, order-matters history. Compare to Dirichlet process limit.

Reinforcement α2

Colors K3

Draws/step5

Trajectories5

Total draws: 0 Urn 1 fractions: —