Pólya Urn Process
A reinforcement process — path dependence and convergence to a Beta-distributed limit
[Interactive figure: Multiple runs — fraction of red]
[Interactive figure: Beta(a,b) distribution — final fractions]
The Pólya urn (Eggenberger & Pólya, 1923): start with a red and b blue balls; draw a ball uniformly at random, observe its color, and return it together with k extra balls of the same color. The fraction of red balls is a martingale — its expectation stays at the initial fraction a/(a+b) for all time — yet individual runs diverge wildly. For k=1, the fraction converges almost surely to a random limit X distributed Beta(a, b), with E[X] = a/(a+b) and Var[X] = ab/((a+b)²(a+b+1)). This captures preferential attachment and "rich get richer" dynamics in networks, language, and market share: early draws are heavily amplified, so the long-run outcome depends on the realized history, not just the starting composition.
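A minimal simulation sketch of the dynamics above (function name, seed, and run counts are illustrative choices, not part of the original): run many independent urns, check that the mean final fraction stays near a/(a+b), and that for a=b=1, k=1 the final fractions spread out toward Beta(1,1), i.e. Uniform(0,1).

```python
import random

def polya_urn(a=1, b=1, k=1, steps=2000, rng=None):
    """Simulate one Pólya urn run; return the final red fraction.

    Each step: draw a ball with probability proportional to current
    counts, then add k more balls of the drawn color.
    """
    rng = rng or random.Random()
    red, blue = a, b
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += k
        else:
            blue += k
    return red / (red + blue)

rng = random.Random(42)
finals = [polya_urn(a=1, b=1, k=1, steps=2000, rng=rng) for _ in range(500)]

# Martingale property: the *average* over runs stays near a/(a+b) = 0.5,
# even though each individual run can end up anywhere in (0, 1).
mean = sum(finals) / len(finals)
print(f"mean final fraction: {mean:.3f}  (theory: 0.500)")

# For a=b=1, k=1 the limit is Beta(1,1) = Uniform(0,1), whose
# variance is 1/12 ≈ 0.083 — far from a point mass.
var = sum((x - mean) ** 2 for x in finals) / len(finals)
print(f"variance of final fractions: {var:.3f}  (theory: 0.083)")
```

Histogramming `finals` for other (a, b) choices shows the Beta(a, b) shapes directly; the simulation itself never references the Beta density, which is what makes the convergence striking.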