Self-Attention Visualizer
Edit tokens • Adjust Q/K/V weights • See attention heatmap and output
Tokens
the cat sat on the mat
Compute Attention
Temperature (scales the 1/√d_k factor)
1.0
Attention Heatmap
Rows = query tokens • Cols = key tokens • Brighter = higher attention
Output Representation
Each token's attended output (weighted sum of values)
Q/K/V Weight Matrices
Head 1
Head 2
Query (Q) weights • click to perturb
Key (K) weights
Value (V) weights
Randomize Weights
Reset
d_model=4, d_k=4 • Attention(Q,K,V) = softmax(QKᵀ/√d_k)V
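The computation behind the heatmap and output panels can be sketched in a few lines of NumPy, assuming d_model = d_k = 4 as stated above. The random embeddings, weight values, and the `temperature` variable (standing in for the Temp slider) are illustrative placeholders, not the app's actual values:

```python
import numpy as np

# Minimal sketch of scaled dot-product attention, one head.
# Embeddings and weights are random placeholders (assumption).
d_model, d_k = 4, 4
rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
X = rng.normal(size=(len(tokens), d_model))   # one embedding row per token

W_q = rng.normal(size=(d_model, d_k))         # Q/K/V projection weights
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

temperature = 1.0                             # the "Temp" slider
scores = (Q @ K.T) / (np.sqrt(d_k) * temperature)

# Row-wise softmax: rows = query tokens, cols = key tokens (the heatmap)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Attended output per token: a weighted sum of the value vectors
output = weights @ V
```

Each row of `weights` sums to 1, which is why every heatmap row can be read as a probability distribution over the key tokens; raising the temperature flattens those rows, lowering it sharpens them.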