Differentiable memory access with soft attention over an external tape
A Neural Turing Machine (Graves et al., 2014) augments a neural network with differentiable read/write heads over an external memory tape. Each head addresses memory with soft attention: a normalized weighting over locations, e.g. a Gaussian-style focus w(i) = softmax_i(−(i − k)² / 2σ²) centered on a key position k. (The original NTM uses content-based addressing, a softmax over scaled cosine similarity between the key and each memory row, combined with shift and sharpening operations.) Because the weighting is a softmax, every location receives nonzero weight, so gradients flow to all of memory and the whole system trains end-to-end. Through purely gradient-based training, the network learns to copy, sort, and recall sequences.
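A minimal NumPy sketch of this addressing scheme, using the Gaussian-distance variant described above rather than the NTM's full cosine-similarity content addressing; the function names (`gaussian_attention`, `read`, `write`) are illustrative, not from the paper:

```python
import numpy as np

def gaussian_attention(num_slots, k, sigma):
    # Soft address: softmax of negative squared distance to key position k.
    # Every slot gets nonzero weight, so gradients reach all of memory.
    positions = np.arange(num_slots, dtype=float)
    logits = -((positions - k) ** 2) / (2 * sigma ** 2)
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def read(memory, w):
    # Differentiable read: attention-weighted sum of memory rows.
    return w @ memory

def write(memory, w, erase, add):
    # NTM-style write: erase then add, each blended by the attention weights.
    return memory * (1 - np.outer(w, erase)) + np.outer(w, add)
```

A read with a sharp `sigma` approximates a hard lookup at slot `k`, while a large `sigma` spreads the read across neighbors; training can adjust `k` and `sigma` by gradient descent precisely because the weighting is smooth.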