Step through self-attention: Q/K/V projections, dot-product attention, and multi-head outputs.
Self-attention lets each token gather context from every other token in the sequence. The input is linearly projected into three matrices: Queries ask "what am I looking for?", Keys answer "what do I contain?", and Values carry the information that is actually passed along. Attention weights come from the dot products of queries with keys, scaled by √d_k and softmaxed so each token's weights sum to 1; the output is the weighted average of the values, softmax(QKᵀ/√d_k)V. Multi-head attention runs several such computations in parallel on lower-dimensional projections, each head free to learn a different relationship pattern, then concatenates the heads and applies an output projection.
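The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the projection matrices `Wq`, `Wk`, `Wv`, `Wo` are randomly initialized here (in a trained model they are learned), and biases, masking, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Multi-head self-attention over a sequence X of shape (seq_len, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Step 1: project the input into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv  # each (seq_len, d_model)

    # Step 2: split each projection into heads -> (num_heads, seq_len, d_head)
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)

    # Step 3: scaled dot-product attention, independently per head
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ Vh                                   # (heads, seq, d_head)

    # Step 4: concatenate the heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy example: 4 tokens, model width 8, 2 heads (all sizes are arbitrary)
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 4, 2
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (4, 8) — one contextualized vector per input token
```

Note that the output has the same shape as the input, which is what lets attention blocks be stacked with residual connections.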