In-Context Learning — Implicit Gradient Descent

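Akyürek et al. 2022: on linear-regression tasks, transformers can implement gradient descent (and closed-form ridge regression) within a single forward pass, shown by explicit construction; trained in-context learners behave like these classical estimators.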
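A minimal sketch of why "gradient descent in the forward pass" is plausible, assuming the simplest setting: linear regression, one gradient step from w = 0 on mean squared error, and unnormalized linear attention (no softmax). This is a simplified illustration, not the specific multi-layer construction from Akyürek et al.; the function names and toy data are hypothetical. The point is that the GD-updated prediction (lr/n) * sum_i y_i (x_i . x_q) has exactly the form of attention with keys x_i, values y_i, and query x_q.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def gd_one_step_prediction(xs: List[Vector], ys: List[float],
                           x_query: Vector, lr: float) -> float:
    """Predict y at x_query after one GD step from w = 0.

    Loss: L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2.
    At w = 0 its gradient is -(1/n) * sum_i y_i * x_i.
    """
    n, d = len(xs), len(x_query)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        for j in range(d):
            grad[j] -= y * x[j] / n
    w = [-lr * g for g in grad]  # w_1 = w_0 - lr * grad, with w_0 = 0
    return dot(w, x_query)


def linear_attention_prediction(xs: List[Vector], ys: List[float],
                                x_query: Vector, lr: float) -> float:
    """Unnormalized linear attention: keys x_i, values y_i, query x_query.

    The scale lr / n plays the role of fixed query/key projection weights
    (a hypothetical parameterization chosen to match the GD step).
    """
    n = len(xs)
    scores = [dot(x_query, x) for x in xs]  # raw dot-product scores, no softmax
    return (lr / n) * sum(s * y for s, y in zip(scores, ys))


xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
ys = [1.0, 2.0, 3.0]
x_query = [0.5, 1.0]

gd = gd_one_step_prediction(xs, ys, x_query, lr=0.1)
att = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(gd, att)  # both approximately 0.3
```

Both functions compute the same quantity up to floating-point rounding. Richer correspondences (multiple steps, softmax attention, ridge regression) need more layers and are where the actual constructions in the literature do their work.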