Pólya Urn & Reinforcement: Power-Law Emergence

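The Pólya urn: draw a ball, then return it with α extra balls of the same color, so every draw makes that color more likely to be drawn again (rich-get-richer). The Chinese Restaurant Process (parameter θ) adds new colors: after n draws, the next ball is a new color with probability θ/(n+θ). On its own this scheme adds colors slowly, roughly θ log n of them after n draws; adding a discount parameter d in (0, 1) gives the two-parameter Pitman-Yor process, whose color frequencies follow a power law (Zipf). Early randomness locks in permanently: the color shares converge to a limit that differs from run to run, so different runs have wildly different winners.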
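
The two mechanisms above can be sketched as a small simulation. This is a minimal illustration, not a reference implementation: the function names `polya_urn` and `pitman_yor_crp`, the two-color starting urn, and the `discount` argument are assumptions introduced here. With `discount=0` the second function uses exactly the θ/(n+θ) new-color rule from the text; a `discount` between 0 and 1 is the usual two-parameter Pitman-Yor extension.

```python
import random


def polya_urn(steps, alpha=1, seed=None):
    """Two-color Polya urn, assumed to start with one ball of each color."""
    rng = random.Random(seed)
    counts = [1, 1]
    for _ in range(steps):
        total = counts[0] + counts[1]
        # Draw a ball with probability proportional to its color's count.
        color = 0 if rng.random() * total < counts[0] else 1
        # Return it along with alpha extra balls of the same color.
        counts[color] += alpha
    return counts


def pitman_yor_crp(steps, theta=1.0, discount=0.0, seed=None):
    """Sample color counts; discount=0 gives the plain Chinese Restaurant Process."""
    rng = random.Random(seed)
    counts = []  # counts[i] = number of balls of color i
    for n in range(steps):
        k = len(counts)
        # New color with probability (theta + discount*k) / (n + theta).
        p_new = (theta + discount * k) / (n + theta)
        if rng.random() < p_new:
            counts.append(1)
            continue
        # Otherwise pick existing color i with weight counts[i] - discount.
        r = rng.random() * (n - discount * k)
        acc = 0.0
        for i, c in enumerate(counts):
            acc += c - discount
            if r < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return sorted(counts, reverse=True)


if __name__ == "__main__":
    # Different seeds stand in for different runs of the same process.
    for seed in range(3):
        print("urn:", polya_urn(1000, alpha=1, seed=seed))
    print("top colors:", pitman_yor_crp(5000, theta=1.0, discount=0.5, seed=0)[:10])
```

Running `polya_urn` with several seeds is one way to see the lock-in effect: each run settles on its own split between the two colors. Sorting the counts from `pitman_yor_crp` and plotting them against rank on log-log axes is a common way to check for a roughly straight (Zipf-like) line when `discount` is positive.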

Pólya Urn & Reinforcement Learning