[Interactive figure: current urn composition, and proportion of red over time]
Reinforcement learning's simplest ancestor — paths that reinforce themselves
Start with r red and b blue balls. Draw one uniformly at random, return it, and add k more of the same color. The proportion of red converges almost surely to a random limit whose distribution is Beta(r/k, b/k) — but each run settles at a different limit. This is "path dependence": early draws strongly influence the long-run outcome. Key result: when r = b = k, the limiting proportion is Beta(1, 1), i.e. uniform on [0, 1].
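The urn scheme above can be sketched in a few lines; a minimal simulation (function name `polya_urn` and the fixed seeds are illustrative choices, not from the source) showing that separate runs settle near different limits:

```python
import random

def polya_urn(r, b, k, draws, rng):
    """Simulate a Polya urn: draw a ball, return it, add k more of its
    color. Returns the proportion of red after the given number of draws."""
    red, blue = r, b
    for _ in range(draws):
        # Probability of drawing red is its current proportion in the urn.
        if rng.random() < red / (red + blue):
            red += k
        else:
            blue += k
    return red / (red + blue)

# With r = b = k = 1 the limiting proportion is uniform on [0, 1]:
# each seeded run converges to its own random limit.
limits = [polya_urn(1, 1, 1, 5000, random.Random(seed)) for seed in range(8)]
print([round(x, 3) for x in limits])
```

Running this with more draws shows each trajectory flattening out at its own limit, which is the path dependence the text describes.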