Visualize how attention heads focus on different token pairs — hover over cells to inspect.
Attention heads in transformers learn specialized roles: some focus on adjacent tokens (local attention), while others track syntactic dependencies or global context. Temperature controls the sharpness of the softmax distribution: the attention scores are divided by the temperature before the softmax, so a lower temperature produces more focused (peaked) attention, while a higher temperature spreads it more evenly across tokens.
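As a minimal sketch of the temperature mechanism, the snippet below computes single-head scaled dot-product attention weights with a temperature knob; the function name, shapes, and the `temperature` parameter are illustrative, not tied to any particular library.

```python
import numpy as np

def attention_weights(Q, K, temperature=1.0):
    """Temperature-scaled dot-product attention weights.

    Q, K: (seq_len, d_k) arrays. Lower temperature sharpens the
    softmax toward a few cells; higher temperature flattens it
    toward a uniform row.
    """
    scores = Q @ K.T / (np.sqrt(Q.shape[-1]) * temperature)
    # Subtract the row max before exponentiating for numerical stability.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Example: 4 tokens with 8-dim queries/keys; compare sharp vs. diffuse rows.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
print(attention_weights(Q, K, temperature=0.5))  # more peaked rows
print(attention_weights(Q, K, temperature=2.0))  # flatter rows
```

Each row of the returned matrix sums to 1 and corresponds to one query token's distribution over all key tokens, which is exactly what each cell in the heatmap above encodes.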