Visualize how attention heads focus on different token pairs — hover over cells to inspect.
Attention heads in transformers learn specialized roles: some focus on adjacent tokens (local attention), while others track syntactic dependencies or global context. Temperature controls the sharpness of the softmax distribution: the attention scores are divided by the temperature before the softmax, so a lower temperature produces more focused (peaked) attention, while a higher temperature spreads it more evenly across tokens.
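As a minimal sketch of the temperature mechanism, the snippet below computes single-head scaled dot-product attention weights with a temperature knob; the function name, shapes, and the `temperature` parameter are illustrative, not tied to any particular library.

```python
import numpy as np

def attention_weights(Q, K, temperature=1.0):
    """Temperature-scaled dot-product attention weights.

    Q, K: (seq_len, d_k) arrays. Lower temperature sharpens the
    softmax toward a few cells; higher temperature flattens it
    toward a uniform row.
    """
    scores = Q @ K.T / (np.sqrt(Q.shape[-1]) * temperature)
    # Subtract the row max before exponentiating for numerical stability.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Example: 4 tokens with 8-dim queries/keys; compare sharp vs. diffuse rows.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
print(attention_weights(Q, K, temperature=0.5))  # more peaked rows
print(attention_weights(Q, K, temperature=2.0))  # flatter rows
```

Each row of the returned matrix sums to 1 and corresponds to one query token's distribution over all key tokens, which is exactly what each cell in the heatmap above encodes.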