Jacot, Gabriel & Hongler (2018): In the infinite-width limit, neural networks trained by gradient descent are equivalent to kernel regression with the Neural Tangent Kernel (NTK).
Key result: Θ(x,x') = ∑_l K_l(x,x'), where K_l is the kernel contribution of the l-th layer. At finite width the NTK is random at initialization; in the infinite-width limit it converges to a deterministic Θ∞ and stays constant during training.

Training dynamics (gradient flow on squared loss): ∂f/∂t = −Θ(f − y). With zero-initialized predictions f(0) = 0, this solves to f(t) = (I − e^{−Θt})y, so as t → ∞ the predictions converge to those of kernel regression with Θ∞.

NTK vs NNGP: the NTK governs training by gradient descent; the NNGP (Neural Network Gaussian Process) kernel governs exact Bayesian inference in the same infinite-width limit.
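The linear dynamics above can be checked numerically. A minimal sketch: below, the network is replaced by a linear-in-parameters random-feature model f(x) = φ(x)·θ, a toy stand-in whose tangent kernel Θ = φφᵀ is exactly constant during training (mimicking the frozen kernel of the infinite-width limit). All names, sizes, and the tanh feature map here are illustrative choices, not from the paper. With θ initialized to zero (so f(0) = 0), discretized gradient flow should track the closed-form solution f(t) = (I − e^{−Θt})y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: linear-in-parameters model f(x) = phi(x) @ theta, whose
# tangent kernel Theta = phi @ phi.T is constant in time by construction.
n, d, p = 8, 3, 64                 # samples, input dim, feature dim (arbitrary)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W = rng.normal(size=(d, p)) / np.sqrt(d)   # fixed random features
phi = np.tanh(X @ W)                       # feature map phi(x)
theta = np.zeros(p)                        # zero init => f(0) = 0

K = phi @ phi.T                            # empirical tangent kernel Gram matrix

# Closed-form gradient-flow prediction: f(t) = (I - e^{-K t}) y,
# computed via the eigendecomposition of the symmetric PSD matrix K.
t = 5.0
evals, evecs = np.linalg.eigh(K)
f_pred = (np.eye(n) - evecs @ np.diag(np.exp(-evals * t)) @ evecs.T) @ y

# Actual training: Euler-discretized gradient flow on 0.5 * ||f - y||^2.
eta = 1e-3
for _ in range(int(t / eta)):
    f = phi @ theta
    theta -= eta * phi.T @ (f - y)   # d theta/dt = -phi^T (f - y)

# The trained predictions should closely match the kernel prediction.
print(np.max(np.abs(phi @ theta - f_pred)))
```

The gap between the two comes only from the Euler discretization; shrinking `eta` shrinks it further. For a real finite-width network, the same check works with the empirical NTK Θ(x,x') = ∇_θ f(x)·∇_θ f(x'), which is only approximately constant at finite width.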