Each hidden state evolves as dh/dt = f_θ(h, t). A Neural ODE is the continuous-depth limit of a residual network with tied weights; the adjoint method computes its gradients in O(1) memory.
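A minimal sketch of the residual-net equivalence, assuming a toy one-layer tanh dynamics function (the names `f`, `euler_solve`, and the weight matrix are illustrative, not from the source):

```python
import numpy as np

def f(h, t, W):
    # Hypothetical dynamics f_theta(h, t): a single tanh layer.
    return np.tanh(W @ h)

def euler_solve(h0, W, t0=0.0, t1=1.0, steps=100):
    # Fixed-step Euler: h_{k+1} = h_k + dt * f(h_k, t_k).
    # With dt = 1 and one step per layer this is exactly a
    # weight-tied residual block: h + f(h).
    h, dt = h0.copy(), (t1 - t0) / steps
    for k in range(steps):
        h = h + dt * f(h, t0 + k * dt, W)
    return h

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(3, 3))
h1 = euler_solve(np.ones(3), W)
```

Shrinking dt while growing the step count leaves the output nearly unchanged; that stability under refinement is the continuous-depth limit.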
Distribution Transformation — Normalizing Flow
Continuous normalizing flow (CNF): the log-density evolves as d log p/dt = -div(f) = -tr(∂f/∂h). Maps a simple base distribution (e.g. Gaussian) to a complex target; the map is inverted by integrating the ODE backwards in time.
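A sketch of jointly integrating the state and the log-density change, reusing a toy tanh dynamics function (all names here are illustrative assumptions):

```python
import numpy as np

def f(h, W):
    # Hypothetical CNF dynamics: a single tanh layer.
    return np.tanh(W @ h)

def jac_trace(h, W):
    # tr(df/dh) for f(h) = tanh(W h): sum_i (1 - tanh(Wh)_i^2) * W_ii.
    s = 1.0 - np.tanh(W @ h) ** 2
    return float(np.sum(s * np.diag(W)))

def cnf_forward(h0, W, t1=1.0, steps=200):
    # Euler-integrate dh/dt = f(h) and d log p/dt = -tr(df/dh) jointly.
    h, delta_logp, dt = h0.copy(), 0.0, t1 / steps
    for _ in range(steps):
        delta_logp -= jac_trace(h, W) * dt
        h = h + dt * f(h, W)
    return h, delta_logp
```

The accumulated delta_logp gives log p(h(t1)) = log p_base(h0) + delta_logp; running the same loop with dt negated inverts the map.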
Depth = Time: Residual vs Continuous
Discrete ResNet: h_{t+1} = h_t + f(h_t) — the forward Euler step (dt = 1) of the underlying ODE.
Neural ODE: a black-box ODE solver with adaptive step size; O(1) memory via the adjoint method, at the cost of a slower forward pass.
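The solver trade-off can be seen by comparing the fixed-step Euler scheme (the ResNet update) with a higher-order solver on the same dynamics (a toy example; the dynamics function is chosen arbitrarily for illustration):

```python
import numpy as np

def f(h):
    # Arbitrary smooth dynamics for the comparison.
    return np.tanh(0.5 * h)

def euler(h, steps, T=1.0):
    # ResNet-style update: one function evaluation per step.
    dt = T / steps
    for _ in range(steps):
        h = h + dt * f(h)
    return h

def rk4(h, steps, T=1.0):
    # Classical Runge-Kutta: four evaluations per step, 4th-order accurate.
    dt = T / steps
    for _ in range(steps):
        k1 = f(h)
        k2 = f(h + 0.5 * dt * k1)
        k3 = f(h + 0.5 * dt * k2)
        k4 = f(h + dt * k3)
        h = h + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return h

h0 = np.array([1.0, -2.0])
reference = rk4(h0, 1000)  # near-exact solution
```

At an equal step budget RK4 lands far closer to the reference than Euler, but each step costs 4x the function evaluations — the slower forward pass in exchange for accuracy.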
Adjoint Method — Memory vs Accuracy
Adjoint sensitivity: solve an augmented ODE backwards in time to obtain gradients. Memory is O(1), independent of the number of integration steps. Trade-off: activations are recomputed during the backward pass instead of stored. Enables effectively infinite-depth networks.
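A scalar sketch of the adjoint backward pass, assuming linear dynamics dh/dt = θh so the true gradient dL/dθ = h₀·e^θ is known in closed form (the function and variable names are illustrative):

```python
import numpy as np

def adjoint_grad(h0, theta, T=1.0, steps=2000):
    dt = T / steps
    # Forward pass stores only the final state: O(1) memory.
    h = h0
    for _ in range(steps):
        h = h + dt * theta * h
    # Backward pass: re-integrate h in reverse while accumulating
    # the adjoint a(t) = dL/dh(t) and the parameter gradient.
    # Loss L = h(T), so a(T) = 1; da/dt = -a * df/dh = -a * theta.
    a, grad = 1.0, 0.0
    for _ in range(steps):
        grad += dt * a * h       # dL/dtheta accumulates a * df/dtheta
        a += dt * a * theta      # adjoint ODE, stepped backwards in time
        h -= dt * theta * h      # recompute the state trajectory
    return grad
```

The state trajectory is recomputed rather than stored — exactly the memory-for-compute trade; for stiff or long integrations the recomputed trajectory can drift from the forward one, which is the accuracy side of the trade.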