Transformer — Scaled Dot-Product Attention
Attention(Q,K,V) = softmax(QKᵀ / √dₖ) · V
Settings
Sequence length:
6
Key dimension dₖ:
8
Temperature scale:
1/√dₖ (standard)
1/1 (no scaling)
1/(dₖ/2) (hot)
Randomize Q,K,V
Sparse pattern
Uniform attention
Q
— query vectors
K
— key vectors
V
— value vectors
Scores = QKᵀ
Scaled = scores / √dₖ
Weights = softmax(scaled)
Output = weights · V
Click a token row to highlight its attention pattern.
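The four pipeline steps above (scores, scaling, softmax, weighted sum) can be sketched in NumPy. Shapes follow the demo's defaults (6 tokens, dₖ = 8); the function and variable names are illustrative, not part of the demo:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, scale=None):
    """Attention(Q, K, V) = softmax(QK^T * scale) @ V."""
    d_k = K.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(d_k)   # standard 1/sqrt(d_k) temperature

    scores = Q @ K.T                 # Scores = QK^T, shape (n, n)
    scaled = scores * scale          # Scaled = scores / sqrt(d_k)

    # Row-wise softmax (subtract the row max for numerical stability)
    exp = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)   # Weights = softmax(scaled)

    return weights @ V               # Output = weights . V

rng = np.random.default_rng(0)
n, d_k = 6, 8                        # demo defaults: sequence length 6, key dim 8
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_k))
out = scaled_dot_product_attention(Q, K, V)
```

Passing `scale=1.0` reproduces the "no scaling" setting, and `scale=2.0 / d_k` the "hot" setting: a smaller scale flattens the softmax toward uniform attention, a larger one sharpens it. Because each softmax row sums to 1, every output row is a convex combination of the value vectors.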