Glossary

Multilayer Perceptron

Also known as: MLP, feedforward network

A Multilayer Perceptron (MLP) is a feedforward neural network consisting of an input layer, one or more fully connected hidden layers, and an output layer. Each layer computes $\mathbf{h}_l = \sigma(W_l \mathbf{h}_{l-1} + \mathbf{b}_l)$, where $W_l$ is a weight matrix, $\mathbf{b}_l$ is a bias vector, and $\sigma$ is a nonlinear activation function such as ReLU. The output of each layer serves as the input to the next, producing a deeply nested composition of functions.
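The layer computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the layer sizes and random weights are arbitrary, ReLU is used for the hidden layers, and the output layer is left linear (identity activation), as it would be for regression.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, layers):
    """Compute h_l = sigma(W_l h_{l-1} + b_l) for each hidden layer,
    leaving the final (output) layer linear."""
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = layers[-1]
    return W_out @ h + b_out

rng = np.random.default_rng(0)
# Illustrative 4 -> 8 -> 3 network: (weight matrix, bias vector) per layer
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8)),
    (rng.normal(size=(3, 8)), np.zeros(3)),
]
y = mlp_forward(rng.normal(size=4), layers)
print(y.shape)  # (3,)
```

Each `(W, b)` pair plays the role of $(W_l, \mathbf{b}_l)$ in the equation, and the loop is the nested function composition described above.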

Despite the name, MLPs do not use the hard step function of the original perceptron; they use smooth activations (sigmoid, tanh, ReLU, GELU) that allow gradient-based training via backpropagation. The final layer's activation is chosen to match the task: sigmoid for binary classification, softmax for multi-class classification, or identity for regression. MLPs are universal function approximators in principle, though in practice depth, architecture, and optimisation all matter enormously.
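The task-dependent output activations mentioned above are simple to write down; as a sketch, sigmoid squashes a single logit into a probability for binary classification, while softmax (with the standard max-subtraction trick for numerical stability) turns a logit vector into a distribution over classes.

```python
import numpy as np

def sigmoid(z):
    """Map a logit to a probability in (0, 1) for binary classification."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Map a logit vector to a probability distribution over classes."""
    e = np.exp(z - z.max())  # subtracting the max avoids overflow
    return e / e.sum()

print(sigmoid(0.0))                          # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])).sum())  # 1.0
```

For regression, the identity output in the equation's final layer is used as-is, so no extra function is needed.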

MLPs were the dominant neural network architecture in the 1980s and 1990s, and remain building blocks of virtually every modern deep learning system. A transformer, for instance, alternates self-attention sub-layers with MLPs (called position-wise feedforward networks). For purely tabular data without spatial, temporal, or relational structure, an MLP is often the natural choice. For images, sequences, or graphs, architectures with domain-specific inductive biases—CNNs, RNNs, transformers, graph networks—typically outperform generic MLPs.

Related terms: Neural Network, Perceptron, Backpropagation

Also defined in: Textbook of AI