A Neural Network is a computational model composed of interconnected layers of simple processing units ("neurons") that collectively learn to transform inputs into outputs. A single neuron computes $y = f(\mathbf{w}^T \mathbf{x} + b)$, where $\mathbf{w}$ is a weight vector, $b$ is a bias, and $f$ is a nonlinear activation function. Stacking these units in layers and composing them produces networks capable of approximating a remarkably broad class of functions.
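The single-neuron computation above can be sketched directly in NumPy; here the sigmoid is used as an illustrative choice for $f$, and the input, weight, and bias values are arbitrary:

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, passed through sigmoid."""
    z = np.dot(w, x) + b             # pre-activation: w^T x + b
    return 1.0 / (1.0 + np.exp(-z))  # activation f (sigmoid here)

x = np.array([0.5, -1.0, 2.0])   # input vector (illustrative values)
w = np.array([0.1, 0.4, -0.2])   # weight vector
b = 0.05                         # bias
y = neuron(x, w, b)              # a scalar in (0, 1)
```

Swapping the sigmoid for another activation (e.g. ReLU or tanh) changes only the final line of `neuron`; the weighted-sum structure is the same for all of them.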
A multilayer perceptron (MLP) consists of an input layer, one or more hidden layers, and an output layer, with every unit in one layer connected to every unit in the next (a fully connected or dense architecture). The hidden layers are the source of the network's power: each progressively transforms the input into a more abstract representation, building complex features from simpler ones. This hierarchical feature learning is the defining advantage of deep networks over shallow models.
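A forward pass through a fully connected network is just this neuron computation applied layer by layer, vectorised so each layer is one matrix multiply. The sketch below assumes ReLU hidden activations and an identity output layer; the layer sizes and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    """Forward pass through a dense (fully connected) network.

    layers: list of (W, b) pairs, one per layer; W has shape
    (units_out, units_in), so each layer maps h -> f(W h + b).
    """
    h = x
    for i, (W, b) in enumerate(layers):
        z = W @ h + b
        # nonlinearity on hidden layers; identity on the output layer
        h = relu(z) if i < len(layers) - 1 else z
    return h

# an illustrative 3 -> 4 -> 2 architecture (input, one hidden, output)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),
    (rng.normal(size=(2, 4)), np.zeros(2)),
]
y = mlp_forward(np.array([1.0, -0.5, 0.3]), layers)  # shape (2,)
```

Each `(W, b)` pair is a full weight matrix connecting every unit in one layer to every unit in the next, which is exactly what "fully connected" means; the successive `relu(W h + b)` transforms are the progressively more abstract representations described above.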
Training a neural network amounts to adjusting weights and biases to minimise a loss function, typically via gradient descent coupled with backpropagation to compute gradients. Modern architectures include convolutional networks (for images), recurrent networks (for sequences), and transformers (for sequences and many other modalities). The universal approximation theorem provides theoretical justification for why neural networks can in principle model any continuous function, though practical success depends on architecture, data, and optimisation.
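Gradient descent with backpropagation can be made concrete on a toy problem. The sketch below trains a one-hidden-layer network on XOR with a mean-squared-error loss, computing gradients by hand via the chain rule; the hidden width, learning rate, seed, and epoch count are illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR dataset: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs (4, 2)
T = np.array([[0], [1], [1], [0]], dtype=float)              # targets (4, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of 8 tanh units, sigmoid output
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for epoch in range(5000):
    # forward pass
    H = np.tanh(X @ W1 + b1)        # hidden activations
    Y = sigmoid(H @ W2 + b2)        # network output
    losses.append(np.mean((Y - T) ** 2))

    # backward pass: chain rule applied layer by layer (backpropagation)
    dY = 2.0 * (Y - T) / len(X)     # dLoss/dY for MSE
    dZ2 = dY * Y * (1.0 - Y)        # through sigmoid: f'(z) = f(z)(1 - f(z))
    dW2 = H.T @ dZ2; db2 = dZ2.sum(axis=0)
    dH = dZ2 @ W2.T                 # gradient flowing into the hidden layer
    dZ1 = dH * (1.0 - H ** 2)       # through tanh: f'(z) = 1 - f(z)^2
    dW1 = X.T @ dZ1; db1 = dZ1.sum(axis=0)

    # gradient descent update on all weights and biases
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Frameworks such as PyTorch or JAX automate the backward pass via automatic differentiation, but the computation they perform is the same chain-rule sweep written out explicitly here.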
Related terms: Perceptron, Deep Learning, Backpropagation, Activation Function, Multilayer Perceptron
Discussed in:
- Chapter 9: Neural Networks — 9.5 Network Architectures
Also defined in: Textbook of AI, Textbook of Medical AI