Step through self-attention: Q/K/V projections, dot-product attention, and multi-head outputs.
Self-attention lets each token gather context from every other token in the sequence. The input is linearly projected into three matrices: Queries ask "what am I looking for?", Keys answer "what do I contain?", and Values carry the information that is actually passed along. Attention weights come from the dot products of queries with keys, scaled by √d_k and softmaxed so each token's weights sum to 1; the output is the weighted average of the values, softmax(QKᵀ/√d_k)V. Multi-head attention runs several such computations in parallel on lower-dimensional projections, each head free to learn a different relationship pattern, then concatenates the heads and applies an output projection.
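The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the projection matrices `Wq`, `Wk`, `Wv`, `Wo` are randomly initialized here (in a trained model they are learned), and biases, masking, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Multi-head self-attention over a sequence X of shape (seq_len, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Step 1: project the input into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv  # each (seq_len, d_model)

    # Step 2: split each projection into heads -> (num_heads, seq_len, d_head)
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)

    # Step 3: scaled dot-product attention, independently per head
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ Vh                                   # (heads, seq, d_head)

    # Step 4: concatenate the heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy example: 4 tokens, model width 8, 2 heads (all sizes are arbitrary)
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 4, 2
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (4, 8) — one contextualized vector per input token
```

Note that the output has the same shape as the input, which is what lets attention blocks be stacked with residual connections.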