Differentiable memory access with soft attention over an external tape
A Neural Turing Machine (Graves et al., 2014) augments a neural network with differentiable read/write heads over an external memory tape. Each head addresses memory with soft attention: a normalized weighting over locations, e.g. a Gaussian-style focus w(i) = softmax_i(−(i − k)² / 2σ²) centered on a key position k. (The original NTM uses content-based addressing, a softmax over scaled cosine similarity between the key and each memory row, combined with shift and sharpening operations.) Because the weighting is a softmax, every location receives nonzero weight, so gradients flow to all of memory and the whole system trains end-to-end. Through purely gradient-based training, the network learns to copy, sort, and recall sequences.
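A minimal NumPy sketch of this addressing scheme, using the Gaussian-distance variant described above rather than the NTM's full cosine-similarity content addressing; the function names (`gaussian_attention`, `read`, `write`) are illustrative, not from the paper:

```python
import numpy as np

def gaussian_attention(num_slots, k, sigma):
    # Soft address: softmax of negative squared distance to key position k.
    # Every slot gets nonzero weight, so gradients reach all of memory.
    positions = np.arange(num_slots, dtype=float)
    logits = -((positions - k) ** 2) / (2 * sigma ** 2)
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def read(memory, w):
    # Differentiable read: attention-weighted sum of memory rows.
    return w @ memory

def write(memory, w, erase, add):
    # NTM-style write: erase then add, each blended by the attention weights.
    return memory * (1 - np.outer(w, erase)) + np.outer(w, add)
```

A read with a sharp `sigma` approximates a hard lookup at slot `k`, while a large `sigma` spreads the read across neighbors; training can adjust `k` and `sigma` by gradient descent precisely because the weighting is smooth.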