A Neural Network is a computational model composed of interconnected layers of simple processing units ("neurons") that collectively learn to transform inputs into outputs. A single neuron computes $y = f(\mathbf{w}^T \mathbf{x} + b)$, where $\mathbf{w}$ is a weight vector, $b$ is a bias, and $f$ is a nonlinear activation function. Stacking these units in layers and composing them produces networks capable of approximating a remarkably broad class of functions.
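The single-neuron computation above can be sketched directly in NumPy; here the sigmoid is used as an illustrative choice for $f$, and the input, weight, and bias values are arbitrary:

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, passed through sigmoid."""
    z = np.dot(w, x) + b             # pre-activation: w^T x + b
    return 1.0 / (1.0 + np.exp(-z))  # activation f (sigmoid here)

x = np.array([0.5, -1.0, 2.0])   # input vector (illustrative values)
w = np.array([0.1, 0.4, -0.2])   # weight vector
b = 0.05                         # bias
y = neuron(x, w, b)              # a scalar in (0, 1)
```

Swapping the sigmoid for another activation (e.g. ReLU or tanh) changes only the final line of `neuron`; the weighted-sum structure is the same for all of them.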
A multilayer perceptron (MLP) consists of an input layer, one or more hidden layers, and an output layer, with every unit in one layer connected to every unit in the next (a fully connected or dense architecture). The hidden layers are the source of the network's power: each progressively transforms the input into a more abstract representation, building complex features from simpler ones. This hierarchical feature learning is the defining advantage of deep networks over shallow models.
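A forward pass through a fully connected network is just this neuron computation applied layer by layer, vectorised so each layer is one matrix multiply. The sketch below assumes ReLU hidden activations and an identity output layer; the layer sizes and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    """Forward pass through a dense (fully connected) network.

    layers: list of (W, b) pairs, one per layer; W has shape
    (units_out, units_in), so each layer maps h -> f(W h + b).
    """
    h = x
    for i, (W, b) in enumerate(layers):
        z = W @ h + b
        # nonlinearity on hidden layers; identity on the output layer
        h = relu(z) if i < len(layers) - 1 else z
    return h

# an illustrative 3 -> 4 -> 2 architecture (input, one hidden, output)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),
    (rng.normal(size=(2, 4)), np.zeros(2)),
]
y = mlp_forward(np.array([1.0, -0.5, 0.3]), layers)  # shape (2,)
```

Each `(W, b)` pair is a full weight matrix connecting every unit in one layer to every unit in the next, which is exactly what "fully connected" means; the successive `relu(W h + b)` transforms are the progressively more abstract representations described above.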
Training a neural network amounts to adjusting weights and biases to minimise a loss function, typically via gradient descent coupled with backpropagation to compute gradients. Modern architectures include convolutional networks (for images), recurrent networks (for sequences), and transformers (for sequences and many other modalities). The universal approximation theorem provides theoretical justification for why neural networks can in principle model any continuous function, though practical success depends on architecture, data, and optimisation.
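Gradient descent with backpropagation can be made concrete on a toy problem. The sketch below trains a one-hidden-layer network on XOR with a mean-squared-error loss, computing gradients by hand via the chain rule; the hidden width, learning rate, seed, and epoch count are illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR dataset: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs (4, 2)
T = np.array([[0], [1], [1], [0]], dtype=float)              # targets (4, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of 8 tanh units, sigmoid output
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for epoch in range(5000):
    # forward pass
    H = np.tanh(X @ W1 + b1)        # hidden activations
    Y = sigmoid(H @ W2 + b2)        # network output
    losses.append(np.mean((Y - T) ** 2))

    # backward pass: chain rule applied layer by layer (backpropagation)
    dY = 2.0 * (Y - T) / len(X)     # dLoss/dY for MSE
    dZ2 = dY * Y * (1.0 - Y)        # through sigmoid: f'(z) = f(z)(1 - f(z))
    dW2 = H.T @ dZ2; db2 = dZ2.sum(axis=0)
    dH = dZ2 @ W2.T                 # gradient flowing into the hidden layer
    dZ1 = dH * (1.0 - H ** 2)       # through tanh: f'(z) = 1 - f(z)^2
    dW1 = X.T @ dZ1; db1 = dZ1.sum(axis=0)

    # gradient descent update on all weights and biases
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Frameworks such as PyTorch or JAX automate the backward pass via automatic differentiation, but the computation they perform is the same chain-rule sweep written out explicitly here.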
Related terms: Perceptron, Deep Learning, Backpropagation, Activation Function, Multilayer Perceptron
Discussed in:
- Chapter 9: Neural Networks — 9.5 Network Architectures
Also defined in: Textbook of AI, Textbook of Medical AI