Attention Field
Transformer self-attention: Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V — query/key/value geometry
Sentence
Sentence 1
Sentence 2
Sentence 3
Click a token to select the query
Parameters
d_model:
8
Temperature (1/√dₖ):
1.0
Num heads:
1
Randomize W_Q, W_K, W_V
View Mode
Attention Map
Q/K Space
Output
Theory
Q = X·W_Q, K = X·W_K, V = X·W_V
Score(qᵢ,kⱼ) = qᵢ·kⱼ / √dₖ
αᵢⱼ = softmaxⱼ(Score(qᵢ,kⱼ))
Output: zᵢ = Σⱼ αᵢⱼ·vⱼ
The dot product qᵢ·kⱼ measures alignment: how strongly query i "attends to" key j. Larger scores become larger softmax weights αᵢⱼ.
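The four steps above can be sketched in NumPy for a single head. This is a minimal illustrative sketch: the 4-token input, the random projection matrices, and the choice d_k = d_model are assumptions, not the demo's internals.

```python
import numpy as np

def scaled_dot_product_attention(X, W_Q, W_K, W_V):
    """One attention head over token embeddings X of shape (n_tokens, d_model)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V            # Q = X·W_Q, K = X·W_K, V = X·W_V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # Score(q_i, k_j) = q_i·k_j / √d_k
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=-1, keepdims=True)     # α_ij: softmax over j, rows sum to 1
    return alpha @ V, alpha                        # z_i = Σ_j α_ij · v_j

rng = np.random.default_rng(0)
d_model = 8                                        # matches the demo's default d_model
X = rng.normal(size=(4, d_model))                  # 4 tokens, e.g. one short sentence
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Z, alpha = scaled_dot_product_attention(X, W_Q, W_K, W_V)
```

Row i of `alpha` is the attention map for query token i; "Randomize W_Q, W_K, W_V" corresponds to redrawing the three projection matrices.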