Pólya Urn Process

Reinforcement learning — path dependence and Beta distribution convergence

[Figure: two panels, "Multiple runs: fraction of red" and "Beta(a,b) distribution: final fractions". Readouts: current red, current blue, red fraction, expected value E[X], variance Var[X].]
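The Pólya urn (Eggenberger & Pólya 1923): start with a red and b blue balls; at each step draw a ball uniformly at random, observe its color, and return it together with k additional balls of the same color. The fraction of red balls X_n is a martingale, so its expectation stays at the initial fraction a/(a+b) for every n, yet individual runs diverge widely. As n → ∞, X_n converges almost surely to a random limit X that is Beta(a/k, b/k)-distributed, which is Beta(a, b) for k=1; at finite n the number of red draws follows a beta-binomial distribution. This captures preferential attachment and "rich get richer" dynamics in networks, language, and market share. Path dependence means the limit is itself random: early draws largely determine where a run settles, so the long-run outcome cannot be predicted from the initial composition alone.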
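
The process described above can be sketched as a short simulation. This is a minimal illustration in Python using only the standard library, not the page's own implementation: the function names polya_urn_fractions and beta_mean_var and the parameter values in the demo are assumptions chosen for the example. It compares the empirical mean and variance of the final red fraction over many runs with the initial fraction a/(a+b) and the variance of the limiting Beta(a/k, b/k) distribution.

```python
import random
from statistics import mean, pvariance


def polya_urn_fractions(a, b, k, steps, rng):
    """Return the red fraction after each of `steps` draws (index 0 is the start)."""
    red, blue = a, b
    fractions = [red / (red + blue)]
    for _ in range(steps):
        # Draw a ball uniformly at random: red with probability red / (red + blue).
        if rng.random() < red / (red + blue):
            red += k   # return the ball plus k more red balls
        else:
            blue += k  # return the ball plus k more blue balls
        fractions.append(red / (red + blue))
    return fractions


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))


if __name__ == "__main__":
    # Illustrative values (assumed, not taken from the page).
    a, b, k, steps, runs = 2, 2, 1, 500, 2000
    rng = random.Random(0)
    finals = [polya_urn_fractions(a, b, k, steps, rng)[-1] for _ in range(runs)]
    limit_mean, limit_var = beta_mean_var(a / k, b / k)
    print(f"empirical mean {mean(finals):.3f} vs a/(a+b) = {limit_mean:.3f}")
    print(f"empirical var  {pvariance(finals):.4f} vs Beta variance {limit_var:.4f}")
```

Because X_n is a martingale, the empirical mean should stay near a/(a+b) for any number of steps, while the empirical variance only approaches the Beta variance as the number of steps grows. Plotting several full paths returned by polya_urn_fractions shows each run settling at a different level.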