2-2-1 MLP learning XOR via backpropagation — watch weights and decision boundary evolve
[Interactive demo panels: Decision Boundary, Network Architecture, and Loss Curve, with a live Loss/Epoch readout]
XOR is not linearly separable: no single perceptron can compute it, a limitation highlighted by Minsky and Papert in 1969 that famously stalled neural-network research through the 1970s. Adding a hidden layer with nonlinear activations lets the network carve a curved decision boundary that separates the XOR classes. Backpropagation computes ∂Loss/∂w via the chain rule, propagating error gradients backward from the output layer. Activation choice matters: the sigmoid σ(x) = 1/(1+e⁻ˣ) saturates, so its gradient σ′(x) = σ(x)(1−σ(x)) vanishes for large |x|; tanh is zero-centered, which tends to speed learning; ReLU(x) = max(0, x) avoids saturation for positive inputs.
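The training loop behind the demo can be sketched as follows. This is a minimal NumPy version of a 2-2-1 network (tanh hidden layer, sigmoid output, MSE loss, full-batch gradient descent); the hyperparameters and variable names are my own assumptions, not the demo's exact configuration, and with only two hidden units convergence can depend on the random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR dataset: four input pairs and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh_grad(a):
    # Derivative of tanh expressed via its activation: 1 - tanh(z)^2.
    return 1.0 - a ** 2

# 2-2-1 architecture: 2 inputs -> 2 hidden units -> 1 output.
W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros((1, 2))
W2 = rng.normal(0, 1, (2, 1)); b2 = np.zeros((1, 1))

lr = 0.5  # assumed learning rate for this sketch
for epoch in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)       # hidden activations (zero-centered)
    out = sigmoid(h @ W2 + b2)     # output in (0, 1)
    loss = np.mean((out - y) ** 2) # MSE loss

    # Backward pass: chain rule, layer by layer.
    d_out = (out - y) * out * (1 - out)   # ∝ ∂Loss/∂z at the output
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_h = (d_out @ W2.T) * tanh_grad(h)   # gradient pushed through tanh
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)

    # Gradient-descent update.
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print("final loss:", loss)
print("predictions:", preds.ravel())
```

The `out * (1 - out)` factor is the sigmoid gradient from the paragraph above; when the output saturates near 0 or 1 that factor shrinks toward zero, which is exactly the vanishing-gradient behavior the demo visualizes.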