Neural networks learn structured representations by updating their hidden-layer weights during training — unlike kernel methods (NTK), which freeze features at initialization. This is the "feature learning" vs "lazy training" distinction.
Feature learning regime (finite width, large η): first-layer weights rotate to align with task-relevant directions, so the representation itself changes — gradients flowing through the hidden layers reshape the hidden-unit basis.

Lazy training / NTK regime (very wide network, small η): weights barely move from initialization; the network behaves as a fixed kernel machine, and no representation change occurs.

Key insight (Yang & Hu 2021, μP): the transition between regimes is controlled by how the learning rate scales with width. Under the "maximal update parameterization" (μP), feature learning persists even in the infinite-width limit.
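The contrast can be seen numerically. A minimal sketch (toy setup, not from the source): a two-layer ReLU network trained by full-batch gradient descent on a small regression task, with NTK-style 1/√width output scaling. We measure how far the first-layer weights drift from initialization — the drift shrinks as width grows (lazy regime), while narrow networks move their weights appreciably (feature learning).

```python
import numpy as np

def relative_weight_movement(width, steps=200, lr=0.1, seed=0):
    """Relative Frobenius-norm change of first-layer weights after training.

    Toy two-layer ReLU network with NTK output scaling 1/sqrt(width);
    all names and hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((32, 4))       # 32 samples, 4 input dims
    y = np.sin(X[:, 0:1])                  # simple smooth target
    W1 = rng.standard_normal((4, width))   # first layer: the "features"
    w2 = rng.standard_normal((width, 1))   # linear readout
    W1_init = W1.copy()
    scale = 1.0 / np.sqrt(width)           # NTK-style output scaling
    n = len(X)
    for _ in range(steps):
        h = np.maximum(X @ W1, 0.0)        # ReLU hidden activations
        pred = scale * (h @ w2)
        err = pred - y                     # gradient of 0.5 * MSE w.r.t. pred
        g2 = scale * h.T @ err / n
        g1 = X.T @ ((err @ w2.T) * (h > 0) * scale) / n
        W1 -= lr * g1
        w2 -= lr * g2
    return np.linalg.norm(W1 - W1_init) / np.linalg.norm(W1_init)

for width in (16, 256, 4096):
    print(f"width={width:5d}  relative W1 movement={relative_weight_movement(width):.4f}")
```

As width increases under this parameterization, the printed movement decreases toward zero — the network approaches a fixed kernel machine. μP changes the width-scaling of the initialization and per-layer learning rates precisely so that this drift stays order-one at any width.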