Attention Field

Transformer self-attention: Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V — query/key/value geometry


Theory

Q = X·W_Q, K = X·W_K, V = X·W_V

Score(qᵢ,kⱼ) = qᵢ·kⱼ / √dₖ

αᵢⱼ = softmaxⱼ(Score(qᵢ,kⱼ)) — normalized over all keys j for a fixed query i

Output: zᵢ = Σⱼ αᵢⱼ·vⱼ

The dot product qᵢ·kⱼ measures alignment: how strongly query i "attends to" key j. Scaling by √dₖ keeps the scores from growing with dimension, which would otherwise saturate the softmax.
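The four steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration with made-up toy dimensions, not the demo's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration (hypothetical, not from the demo)
n_tokens, d_model, d_k = 4, 8, 8

X = rng.normal(size=(n_tokens, d_model))   # token embeddings
W_Q = rng.normal(size=(d_model, d_k))      # learned projections
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

# Q = X·W_Q, K = X·W_K, V = X·W_V
Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Score(q_i, k_j) = q_i·k_j / sqrt(d_k)
scores = Q @ K.T / np.sqrt(d_k)

# alpha_ij = softmax over j; each query's row of weights sums to 1
alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
alpha /= alpha.sum(axis=-1, keepdims=True)

# z_i = sum_j alpha_ij · v_j  (weighted mix of value vectors)
Z = alpha @ V
```

Each row of `alpha` is the attention distribution for one query token, and `Z` stacks the resulting output vectors zᵢ.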