Pólya Urn Model & Reinforcement Learning

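In the Pólya urn, each draw picks a ball at random, notes its color c, and returns it together with α extra balls of the same color. This creates positive feedback: a color that has been drawn often becomes more likely to be drawn again. As the number of draws n → ∞, the fraction of each color converges to a limit, but that limit is itself random and follows a Dirichlet distribution, so which color dominates differs from run to run. With multiple colors, the urn reproduces the predictive rule of Bayesian updating under a Dirichlet prior, and the same rich-get-richer mechanism drives preferential attachment in networks, where it generates power-law degree distributions.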
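
The draw-and-reinforce rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `simulate_polya_urn` and `color_fractions`, the starting counts, and the seed are all assumptions chosen for the example.

```python
import random

def simulate_polya_urn(initial_counts, alpha, n_draws, rng):
    """Run one Pólya urn and return the final number of balls per color."""
    counts = list(initial_counts)
    colors = list(range(len(counts)))
    for _ in range(n_draws):
        # Draw a color with probability proportional to its current count.
        c = rng.choices(colors, weights=counts, k=1)[0]
        # The drawn ball goes back together with alpha extra balls of its color.
        counts[c] += alpha
    return counts

def color_fractions(counts):
    """Convert ball counts into fractions that sum to 1."""
    total = sum(counts)
    return [x / total for x in counts]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the example is repeatable
    for run in range(5):
        final = simulate_polya_urn([1, 1, 1], alpha=1, n_draws=10_000, rng=rng)
        print(run, [round(f, 3) for f in color_fractions(final)])
```

Each run starts from the same urn, yet the printed fractions generally differ between runs, which is the random-limit behavior the paragraph describes.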

Key results:
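• Limit: (f₁,…,f_K) → Dir(n₀/α,…,n₀/α) for K colors, with n₀ the initial number of balls of each color; this is Dir(n₀,…,n₀) when α = 1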
• Within a single run, the fluctuations of fᵢ shrink to 0 as draws → ∞: each fraction settles to a limit
• But which fᵢ wins is random!
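• Exchangeability: the probability of a color sequence depends only on how often each color appears, so by de Finetti's theorem the draws are a mixture of i.i.d. sequences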
• PA connection: P(new node attaches to node i) ∝ kᵢ + α
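
The exchangeability result in the list above can be checked exactly for short sequences. The sketch below multiplies the draw probabilities step by step using exact fractions; the helper name `sequence_probability` and the example urn are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def sequence_probability(seq, initial_counts, alpha):
    """Exact probability that the urn produces the color sequence `seq`."""
    counts = list(initial_counts)
    prob = Fraction(1)
    for c in seq:
        # Probability of drawing color c from the current urn.
        prob *= Fraction(counts[c], sum(counts))
        # Reinforce the drawn color.
        counts[c] += alpha
    return prob

if __name__ == "__main__":
    seq = (0, 0, 1, 2, 0)
    # Every reordering of the same draws should have the same probability.
    probs = {sequence_probability(p, [1, 1, 1], 2) for p in set(permutations(seq))}
    print(probs)
```

If the urn is exchangeable, the printed set holds a single value, because reordering the draws only permutes the numerators of the product while the denominators stay the same.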

Power law: degree distribution P(k) ∝ k^(−2−α) for large k, taking kᵢ as the in-degree of node i with one link added per new node
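
The attachment rule can be simulated to inspect the degree distribution empirically. The following is a rough sketch under stated assumptions: kᵢ is taken to be the in-degree, every new node adds exactly one link, growth starts from a single isolated node, and α > 0. None of these details is fixed by the text above, and the function name `grow_network` is hypothetical.

```python
import random
from collections import Counter

def grow_network(n_nodes, alpha, rng):
    """Grow a directed network where each new node sends one link to an
    existing node chosen with probability proportional to in-degree + alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive so the first node can be chosen")
    in_degree = [0]   # start from a single node with no links
    targets = []      # one entry per link: the node that link points to
    for _ in range(1, n_nodes):
        n_existing = len(in_degree)
        n_links = len(targets)
        # The weights k_i + alpha sum to n_links + alpha * n_existing, so the
        # choice splits into a degree-proportional part and a uniform part.
        if rng.random() < n_links / (n_links + alpha * n_existing):
            chosen = rng.choice(targets)         # proportional to in-degree
        else:
            chosen = rng.randrange(n_existing)   # uniform over existing nodes
        in_degree[chosen] += 1
        targets.append(chosen)
        in_degree.append(0)                      # the new node joins with no in-links
    return in_degree

if __name__ == "__main__":
    degrees = grow_network(50_000, alpha=1.0, rng=random.Random(0))
    histogram = Counter(degrees)
    for k in (1, 2, 4, 8, 16, 32):
        print(k, histogram.get(k, 0) / len(degrees))
```

Plotting the printed frequencies against k on log-log axes gives a rough visual check of the power-law tail; a finite simulation will only approximate the asymptotic exponent.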