Lottery Ticket Hypothesis

Frankle & Carbin (2019): A randomly initialized dense network contains sparse subnetworks ("winning tickets") that, when reset to their original initialization, can train in isolation to match the full network's accuracy.

Algorithm (IMP — iterative magnitude pruning):
1. Initialize weights θ₀ randomly
2. Train to θ_T (full network)
3. Prune the p% of weights with the smallest magnitude |θ_T| → binary mask m
4. Reset the surviving weights to their values in θ₀ (the winning ticket is the pair (m, θ₀))
5. Retrain the sparse subnetwork — it matches or beats the dense network's accuracy
In the iterative variant, steps 2–4 repeat for n rounds, pruning p^(1/n)% of the surviving weights per round; this finds tickets at higher sparsity than one-shot pruning.
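The steps above can be sketched end to end. This is a minimal NumPy illustration on a toy linear regression, not the paper's setup (which uses deep networks trained with SGD); the function names, hyperparameters, and toy data here are all illustrative assumptions.

```python
import numpy as np

def train(theta0, mask, X, y, lr=0.1, steps=200):
    """Plain gradient descent on MSE loss; pruned weights are held at zero."""
    w = theta0 * mask
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask  # keep masked weights at exactly zero
    return w

def imp(X, y, prune_frac=0.2, rounds=3, seed=0):
    """Iterative magnitude pruning: each round trains, prunes the smallest
    surviving weights by |theta_T|, and restarts from the original init."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(size=X.shape[1])        # step 1: random init theta_0
    mask = np.ones_like(theta0)
    for _ in range(rounds):
        theta_T = train(theta0, mask, X, y)     # step 2: train the masked net
        alive = np.flatnonzero(mask)
        k = int(len(alive) * prune_frac)        # step 3: prune smallest |theta_T|
        drop = alive[np.argsort(np.abs(theta_T[alive]))[:k]]
        mask[drop] = 0.0
        # step 4: implicit reset — the next round trains from theta0 again
    ticket = train(theta0, mask, X, y)          # step 5: retrain (m, theta_0)
    return ticket, mask

# Toy data: y depends on only 3 of 20 features, so most weights prune cleanly.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true
ticket, mask = imp(X, y)
print(f"sparsity: {1 - mask.mean():.2f}, "
      f"mse: {np.mean((X @ ticket - y) ** 2):.4f}")
```

With `prune_frac=0.2` and 3 rounds, each round removes 20% of the weights that survived the previous round, mirroring the per-round p^(1/n)% schedule rather than one-shot pruning.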
Key finding: The ticket's original initialization matters. Reinitializing the same mask with fresh random weights breaks the lottery — such random sparse baselines train more slowly and reach lower accuracy.