Neural Scaling Laws

Language-model loss falls as a power law in model parameters N, dataset tokens D, and training compute C. Kaplan et al. (2020) and Hoffmann et al. (2022, "Chinchilla") charted these laws empirically for language models.

Kaplan et al. (2020): L(N) ≈ (N_c/N)^α_N, L(D) ≈ (D_c/D)^α_D, power laws over many orders of magnitude.
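Because the Kaplan form is a pure power law, it is a straight line on log-log axes, so its exponent and scale constant can be recovered by linear regression. A minimal sketch with synthetic data — the constants `alpha_true` and `Nc_true` are illustrative placeholders, not the paper's fitted values:

```python
import numpy as np

# Synthetic losses following L(N) = (N_c / N)**alpha with illustrative
# constants (NOT Kaplan's fitted values).
alpha_true, Nc_true = 0.076, 8.8e13
N = np.logspace(6, 10, 20)              # model sizes from 1M to 10B params
L = (Nc_true / N) ** alpha_true

# On log-log axes: log L = alpha*log(N_c) - alpha*log(N),
# a straight line in log N with slope -alpha.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha_fit = -slope
Nc_fit = np.exp(intercept / alpha_fit)

print(f"alpha ≈ {alpha_fit:.3f}, N_c ≈ {Nc_fit:.2e}")
```

The same log-log regression applies to L(D); in practice one fits only the large-scale tail, where the power-law regime holds.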
Chinchilla (Hoffmann 2022): L(N,D) ≈ E + A/N^α + B/D^β. Optimal allocation: N ∝ C^0.5, D ∝ C^0.5 — equal scaling.
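Combining the equal C^0.5 scaling with the standard training-cost approximation C ≈ 6·N·D gives a closed-form compute-optimal allocation. A sketch assuming the commonly cited ~20 tokens-per-parameter rule of thumb (the exact ratio depends on the fitted A, B, α, β):

```python
import math

def chinchilla_optimal(C_flops, tokens_per_param=20.0):
    """Compute-optimal (N, D) under C ≈ 6*N*D with N, D ∝ C^0.5.

    tokens_per_param=20 is the widely quoted Chinchilla rule of thumb;
    treating it as a fixed constant is a simplifying assumption.
    """
    # C = 6*N*D and D = r*N  =>  C = 6*r*N^2  =>  N = sqrt(C / (6*r))
    N = math.sqrt(C_flops / (6.0 * tokens_per_param))
    D = tokens_per_param * N
    return N, D

# Roughly Chinchilla's own budget (~70B params, ~1.4T tokens => ~5.9e23 FLOPs)
N, D = chinchilla_optimal(5.88e23)
print(f"N ≈ {N:.2e} params, D ≈ {D:.2e} tokens")
```

Doubling compute under this rule multiplies both N and D by √2, in contrast to Kaplan et al.'s earlier recommendation to grow N much faster than D.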
Emergent abilities: Some capabilities appear sharply (non-power-law) at critical scales; it is debated whether these are artifacts of discontinuous evaluation metrics or genuine phase transitions.