Jacot, Gabriel & Hongler (2018): In the infinite-width limit, neural networks trained by gradient descent are equivalent to kernel regression with the Neural Tangent Kernel (NTK).
Key result: Θ(x,x') = ∑_l K_l(x,x'), where K_l is the kernel contribution of the l-th layer. At finite width the NTK is random at initialization; in the infinite-width limit it converges to a deterministic Θ∞ and stays constant during training.

Training dynamics (gradient flow on squared loss): ∂f/∂t = −Θ(f − y). With zero-initialized predictions f(0) = 0, this solves to f(t) = (I − e^{−Θt})y, so as t → ∞ the predictions converge to those of kernel regression with Θ∞.

NTK vs NNGP: the NTK governs training by gradient descent; the NNGP (Neural Network Gaussian Process) kernel governs exact Bayesian inference in the same infinite-width limit.
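The linear dynamics above can be checked numerically. A minimal sketch: below, the network is replaced by a linear-in-parameters random-feature model f(x) = φ(x)·θ, a toy stand-in whose tangent kernel Θ = φφᵀ is exactly constant during training (mimicking the frozen kernel of the infinite-width limit). All names, sizes, and the tanh feature map here are illustrative choices, not from the paper. With θ initialized to zero (so f(0) = 0), discretized gradient flow should track the closed-form solution f(t) = (I − e^{−Θt})y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: linear-in-parameters model f(x) = phi(x) @ theta, whose
# tangent kernel Theta = phi @ phi.T is constant in time by construction.
n, d, p = 8, 3, 64                 # samples, input dim, feature dim (arbitrary)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W = rng.normal(size=(d, p)) / np.sqrt(d)   # fixed random features
phi = np.tanh(X @ W)                       # feature map phi(x)
theta = np.zeros(p)                        # zero init => f(0) = 0

K = phi @ phi.T                            # empirical tangent kernel Gram matrix

# Closed-form gradient-flow prediction: f(t) = (I - e^{-K t}) y,
# computed via the eigendecomposition of the symmetric PSD matrix K.
t = 5.0
evals, evecs = np.linalg.eigh(K)
f_pred = (np.eye(n) - evecs @ np.diag(np.exp(-evals * t)) @ evecs.T) @ y

# Actual training: Euler-discretized gradient flow on 0.5 * ||f - y||^2.
eta = 1e-3
for _ in range(int(t / eta)):
    f = phi @ theta
    theta -= eta * phi.T @ (f - y)   # d theta/dt = -phi^T (f - y)

# The trained predictions should closely match the kernel prediction.
print(np.max(np.abs(phi @ theta - f_pred)))
```

The gap between the two comes only from the Euler discretization; shrinking `eta` shrinks it further. For a real finite-width network, the same check works with the empirical NTK Θ(x,x') = ∇_θ f(x)·∇_θ f(x'), which is only approximately constant at finite width.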