Neural ODE — Continuous Depth Networks

dh/dt = f(h,t,θ), continuous residual network, normalizing flows

Phase Space — ODE Trajectories

Each point follows dh/dt = f_θ(h, t). A Neural ODE is the limit of an infinitely deep residual network with weights tied across layers; the adjoint method computes gradients in O(1) memory.
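A minimal sketch of a phase-space trajectory: each state follows dh/dt = f_θ(h, t), integrated here with a plain forward-Euler loop (the vector field `f_theta`, the weight matrix `W`, and the step count are illustrative assumptions, not the paper's architecture).

```python
import numpy as np

def f_theta(h, t, W):
    # Hypothetical learned vector field: a single tanh layer.
    return np.tanh(W @ h)

def odeint_euler(h0, W, t0=0.0, t1=1.0, n_steps=100):
    # Forward-Euler integration of dh/dt = f_theta(h, t).
    # Each step h += dt * f(h, t) plays the role of one residual block.
    h, dt = h0.copy(), (t1 - t0) / n_steps
    for i in range(n_steps):
        h = h + dt * f_theta(h, t0 + i * dt, W)
    return h

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((2, 2))
h0 = np.array([1.0, -1.0])
hT = odeint_euler(h0, W)
```

Refining the step size changes the endpoint only slightly, which is the sense in which the trajectory approximates a continuous flow.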

Distribution Transformation — Normalizing Flow

Continuous normalizing flow (CNF): the log-density evolves as d log p/dt = -div(f) = -tr(∂f/∂z). Maps a simple base distribution (e.g. a standard Gaussian) to a complex target; invertible by integrating the ODE backwards in time.
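The density update can be sketched by integrating the state and its log-density jointly. A linear field f(z) = A z is assumed purely for illustration, since then div(f) = trace(A) and the exact answer is known in closed form.

```python
import numpy as np

def f(z, A):
    # Hypothetical linear flow field dz/dt = A z, so div(f) = trace(A).
    return A @ z

def cnf_euler(z0, A, T=1.0, n_steps=1000):
    # Jointly integrate the CNF equations with forward Euler:
    #   dz/dt     = f(z)
    #   dlogp/dt  = -div(f) = -trace(A)
    z, dlogp = z0.copy(), 0.0
    dt = T / n_steps
    for _ in range(n_steps):
        dlogp -= dt * np.trace(A)
        z = z + dt * f(z, A)
    return z, dlogp

A = np.array([[0.3, 0.0], [0.0, -0.1]])
z0 = np.array([1.0, 2.0])
zT, dlogp = cnf_euler(z0, A)
# For this linear field the exact log-density change is -trace(A)*T = -0.2.
```

Running the same integration with A negated (or integrating backwards in time) recovers z0, which is the invertibility the note refers to.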

Depth = Time: Residual vs Continuous

Discrete ResNet: h_{t+1} = h_t + f(h_t), i.e. one forward-Euler step of the ODE with step size 1.
Neural ODE: an exact (adaptive-step) ODE solver with error control, O(1) memory via the adjoint method, at the cost of a slower forward pass.
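The depth = time correspondence can be made concrete: a weight-tied ResNet with depth d and residual scale 1/d is exactly forward Euler on [0, 1], and deepening the net drives it toward the continuous solution. The tanh block and weights below are illustrative assumptions.

```python
import numpy as np

def block(h, W):
    # Shared (weight-tied) residual block f(h); hypothetical tanh layer.
    return np.tanh(W @ h)

def resnet(h, W, depth):
    # Discrete ResNet: h_{t+1} = h_t + (1/depth) * f(h_t),
    # i.e. forward Euler on [0, 1] with step size 1/depth.
    for _ in range(depth):
        h = h + (1.0 / depth) * block(h, W)
    return h

rng = np.random.default_rng(1)
W = 0.5 * rng.standard_normal((2, 2))
h0 = np.array([0.5, -0.5])
coarse = resnet(h0, W, depth=4)       # shallow net: large Euler error
fine = resnet(h0, W, depth=4096)      # deep net: near the ODE solution
err = np.linalg.norm(coarse - fine)
```

The gap `err` shrinks roughly like 1/depth, which is the usual first-order Euler error.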

Adjoint Method — Memory vs Accuracy

Adjoint sensitivity: solve a second ODE backwards in time to obtain gradients. Memory is O(1), independent of the number of integration steps. Trade-off: activations are recomputed during the backward solve instead of stored. This is what makes training effectively infinite-depth networks feasible.
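A minimal adjoint sketch, assuming linear dynamics dh/dt = A h for illustration: the adjoint a(t) = dL/dh(t) obeys da/dt = -Aᵀ a, so integrating it backwards from a(T) = dL/dh(T) yields dL/dh(0) without storing the forward trajectory. A finite-difference check confirms the gradient.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.2]])   # hypothetical dynamics matrix
T, n = 1.0, 2000
dt = T / n

def forward(h0):
    # Forward Euler solve of dh/dt = A h; only the endpoint is kept.
    h = h0.copy()
    for _ in range(n):
        h = h + dt * (A @ h)
    return h

def loss(h0):
    hT = forward(h0)
    return 0.5 * float(hT @ hT)            # L = 0.5 * ||h(T)||^2

def adjoint_grad(h0):
    # a(T) = dL/dh(T) = h(T); then step da/dt = -A^T a backwards in time.
    a = forward(h0)
    for _ in range(n):
        a = a + dt * (A.T @ a)
    return a                               # a(0) = dL/dh(0)

def fd_grad(h0, eps=1e-6):
    # Central finite differences, used only to check the adjoint gradient.
    g = np.zeros_like(h0)
    for i in range(len(h0)):
        e = np.zeros_like(h0); e[i] = eps
        g[i] = (loss(h0 + e) - loss(h0 - e)) / (2 * eps)
    return g

h0 = np.array([1.0, 0.5])
# adjoint_grad(h0) and fd_grad(h0) should closely agree.
```

Note the memory trade the text describes: `adjoint_grad` reruns the dynamics (here trivially, via the backward loop) instead of caching all n intermediate states, so memory stays constant in the number of steps.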