No single line separates XOR. A second layer fixes it instantly.
From Chapter 9: Neural Networks
Glossary: XOR problem, perceptron
Transcript
The XOR function. Inputs zero zero, output zero. One one, output zero. Zero one and one zero, output one.
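A quick way to see that table, as a minimal Python sketch using the built-in bitwise XOR operator:

    # XOR truth table: output is 1 exactly when the two inputs differ.
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "->", x ^ y)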
Plot the four points on a 2D grid. One diagonal pair is class one; the other diagonal pair is class zero.
Try to separate them with a single line. Impossible. Any line you draw leaves points from both classes on the same side. XOR is not linearly separable.
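To see why, call the line's weights w_1, w_2 and its bias b (notation mine, not from the chapter). A separating line would have to satisfy

\[
\begin{aligned}
w_2 + b &> 0 && \text{point } (0,1) \text{ is class one}\\
w_1 + b &> 0 && \text{point } (1,0) \text{ is class one}\\
b &\le 0 && \text{point } (0,0) \text{ is class zero}\\
w_1 + w_2 + b &\le 0 && \text{point } (1,1) \text{ is class zero}
\end{aligned}
\]

Adding the first two inequalities gives \(w_1 + w_2 + 2b > 0\), and since \(b \le 0\) this forces \(w_1 + w_2 + b > 0\), contradicting the last line. No choice of weights works.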
In 1969, Minsky and Papert proved this in their book Perceptrons, which stalled neural network research for over a decade. The single-layer perceptron, popular at the time, simply could not represent XOR. The first AI winter began.
Add one hidden layer with two units. One hidden unit fires when at least one input is on, an OR. The other fires only when both inputs are on, an AND. The output unit fires when the OR unit is on and the AND unit is off, which is exactly "one or the other, but not both".
Two layers, three units, and XOR is solved.
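A minimal sketch of that construction in plain Python. The particular weights and thresholds are my own illustrative choices, not values from the chapter; anything with the same comparisons works.

    # A two-layer network that computes XOR, with hand-set weights.
    def step(z):
        return 1 if z > 0 else 0

    def xor_net(x, y):
        h_or = step(x + y - 0.5)         # on when at least one input is 1
        h_and = step(x + y - 1.5)        # on only when both inputs are 1
        return step(h_or - h_and - 0.5)  # on when OR fires and AND does not

    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "->", xor_net(x, y))  # prints 0, 1, 1, 0 in order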
Geometrically, the hidden layer transforms the input space. The four points get rearranged so that a single line in the new space separates the classes.
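Concretely, with the OR and AND hidden units above, the four inputs map to hidden coordinates (h_or, h_and):

    (0,0) -> (0,0)    class zero
    (0,1) -> (1,0)    class one
    (1,0) -> (1,0)    class one
    (1,1) -> (1,1)    class zero

Both class-one points land on the same spot, and a single line such as h_or - h_and = 0.5 now separates them from the two class-zero points.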
This is the core power of deep learning: compose layers, learn representations that make the next layer's job linear.
Backpropagation, rediscovered in the 1980s, gave a way to train these multi-layer networks. The XOR crisis ended.
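For completeness, a minimal training sketch: a 2-2-1 sigmoid network fit to XOR with backpropagation and plain gradient descent. The learning rate, epoch count, and random seed here are illustrative assumptions, not values from the chapter, and a hidden layer this small can occasionally get stuck, so another seed may be needed.

    # Train a 2-2-1 sigmoid network on XOR with backpropagation.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    W1 = rng.normal(size=(2, 2))   # input -> hidden weights
    b1 = np.zeros((1, 2))
    W2 = rng.normal(size=(2, 1))   # hidden -> output weights
    b2 = np.zeros((1, 1))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0                       # illustrative learning rate
    for _ in range(5000):
        # forward pass
        h = sigmoid(X @ W1 + b1)    # hidden activations, shape (4, 2)
        out = sigmoid(h @ W2 + b2)  # outputs, shape (4, 1)

        # backward pass for a mean squared error loss
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)

        # gradient descent updates
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(2))            # should end up close to [[0], [1], [1], [0]]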