Glossary

Perceptron

The Perceptron, introduced by Frank Rosenblatt in 1958, is the simplest possible neural network: a single computational unit that takes an input vector $\mathbf{x}$, computes a weighted sum $z = \mathbf{w}^T \mathbf{x} + b$, and applies a step function to produce a binary output. Geometrically, the perceptron implements a hyperplane decision boundary in feature space, classifying points on one side as positive and those on the other as negative; it is a linear classifier.
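As a sketch, the forward computation above fits in a few lines of Python. The weights and bias below are illustrative values, not learned parameters:

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Binary prediction: 1 if w.x + b > 0, else 0 (step activation)."""
    z = np.dot(w, x) + b          # weighted sum z = w^T x + b
    return 1 if z > 0 else 0      # step function

# Illustrative weights defining the decision boundary x1 + x2 = 1.5 in 2-D:
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron_predict(w, b, np.array([1.0, 1.0])))  # point above the line -> 1
print(perceptron_predict(w, b, np.array([0.0, 0.0])))  # point below the line -> 0
```

Every point with $\mathbf{w}^T \mathbf{x} + b > 0$ lands on the positive side of the hyperplane, so the classifier is entirely determined by that linear boundary.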

The perceptron learning rule updates the weights iteratively: for each misclassified example $(\mathbf{x}_i, y_i)$, it applies $\mathbf{w} \leftarrow \mathbf{w} + \eta(y_i - \hat{y}_i)\mathbf{x}_i$, where $\eta$ is the learning rate and $\hat{y}_i$ is the current prediction. The perceptron convergence theorem guarantees that if the training data is linearly separable, the algorithm finds a separating hyperplane after finitely many updates. The bound on the number of updates depends on the margin, the distance from the closest training point to the boundary, foreshadowing the margin-based reasoning of support vector machines.
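A minimal sketch of this learning rule, assuming 0/1 labels and a step activation; the AND dataset, learning rate, and epoch cap are illustrative choices, not from the source:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Perceptron learning rule: w += eta * (y - y_hat) * x on each mistake."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            y_hat = 1 if np.dot(w, x_i) + b > 0 else 0
            if y_hat != y_i:
                w += eta * (y_i - y_hat) * x_i   # update rule from the text
                b += eta * (y_i - y_hat)         # bias as weight on a constant input 1
                mistakes += 1
        if mistakes == 0:   # converged: every point correctly classified
            break
    return w, b

# Linearly separable data: logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if np.dot(w, x) + b > 0 else 0 for x in X]
print(preds)  # [0, 0, 0, 1]
```

Because AND is linearly separable, the convergence theorem applies and the loop terminates with a separating hyperplane well before the epoch cap.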

The perceptron generated enormous excitement, but Minsky and Papert's 1969 book Perceptrons showed rigorously that a single-layer perceptron cannot compute the XOR function or any non-linearly-separable function. This result cast a long shadow, contributing to the first AI winter, even though multilayer networks could compute XOR—the problem was that no one yet knew how to train them efficiently. That problem was eventually solved by backpropagation, and the perceptron's conceptual descendants—modern neural networks with nonlinear activations and many layers—dominate contemporary AI.
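To make the multilayer point concrete, XOR can be computed by stacking two perceptron-style units and feeding them into a third. The weights below are hand-picked for illustration (an OR unit and a NAND unit feeding an AND unit), not learned; finding such weights automatically is exactly what backpropagation later enabled:

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_two_layer(x1, x2):
    """XOR via two hidden units (OR, NAND) combined by an AND output unit."""
    h1 = step(x1 + x2 - 0.5)      # OR of the inputs
    h2 = step(-x1 - x2 + 1.5)     # NAND of the inputs
    return step(h1 + h2 - 1.5)    # AND of the two hidden outputs

print([xor_two_layer(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

No single unit can do this, since no hyperplane separates $\{(0,1),(1,0)\}$ from $\{(0,0),(1,1)\}$; the hidden layer remaps the inputs into a space where the classes become linearly separable.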

Related terms: Neural Network, Activation Function

Also defined in: Textbook of AI