Linear Attention vs Softmax Attention
[Interactive demo: Max N = 512, feature dim d = 64]
Linear: sim(qᵢ,kⱼ) = φ(qᵢ)ᵀφ(kⱼ) — replacing the softmax with a kernel feature map φ of dimension m makes the attention product associative, so (φ(Q)φ(K)ᵀ)V can be regrouped as φ(Q)(φ(K)ᵀV), turning the O(N²·d) cost into O(N·m·d)
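A minimal sketch of the associativity trick, using the demo's N = 512 and d = 64. The feature map here is elu(x)+1 (an assumption borrowed from common linear-attention formulations; any positive φ works); the key point is that φ(K)ᵀV is an m×d matrix, so the N×N attention matrix is never materialized:

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a simple positive feature map (an assumption, not
    # the only choice) so the normalizer stays positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # O(N*m*d): regroup (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V).
    Qf, Kf = phi(Q), phi(K)          # (N, m) feature-mapped queries/keys
    KV = Kf.T @ V                    # (m, d) summary, independent of N^2
    Z = Kf.sum(axis=0)               # (m,) normalizer term
    return (Qf @ KV) / (Qf @ Z)[:, None]

rng = np.random.default_rng(0)
N, d = 512, 64
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Computing the same quantity left-to-right, φ(Q)φ(K)ᵀ first, gives an identical result but costs O(N²·d); the regrouping changes only the order of operations, not the output.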