Self-Attention Visualizer
Edit tokens • Adjust Q/K/V weights • See attention heatmap and output
Tokens
the cat sat on the mat
Compute Attention
Temperature (scales the 1/√d_k factor)
1.0
Attention Heatmap
Rows = query tokens • Cols = key tokens • Brighter = higher attention
Output Representation
Each token's attended output (weighted sum of values)
Q/K/V Weight Matrices
Head 1
Head 2
Query (Q) weights • click to perturb
Key (K) weights
Value (V) weights
Randomize Weights
Reset
d_model=4, d_k=4 • Attention(Q,K,V) = softmax(QKᵀ/√d_k)V
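The computation behind the heatmap and output panels can be sketched in a few lines of NumPy, assuming d_model = d_k = 4 as stated above. The random embeddings, weight values, and the `temperature` variable (standing in for the Temp slider) are illustrative placeholders, not the app's actual values:

```python
import numpy as np

# Minimal sketch of scaled dot-product attention, one head.
# Embeddings and weights are random placeholders (assumption).
d_model, d_k = 4, 4
rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
X = rng.normal(size=(len(tokens), d_model))   # one embedding row per token

W_q = rng.normal(size=(d_model, d_k))         # Q/K/V projection weights
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

temperature = 1.0                             # the "Temp" slider
scores = (Q @ K.T) / (np.sqrt(d_k) * temperature)

# Row-wise softmax: rows = query tokens, cols = key tokens (the heatmap)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Attended output per token: a weighted sum of the value vectors
output = weights @ V
```

Each row of `weights` sums to 1, which is why every heatmap row can be read as a probability distribution over the key tokens; raising the temperature flattens those rows, lowering it sharpens them.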