Tensor, Glossary, Textbook of AI

A tensor, in the loose sense used in deep learning, is a multi-dimensional array of numbers. The number of axes is called the rank or order of the tensor:

A scalar is a rank-0 tensor: a single number.
A vector is a rank-1 tensor: an ordered list of numbers, $\mathbf{v} \in \mathbb{R}^n$.
A matrix is a rank-2 tensor: a rectangular grid, $\mathbf{M} \in \mathbb{R}^{m \times n}$.
A rank-3 tensor might represent, for example, an RGB image: $\mathbf{T} \in \mathbb{R}^{H \times W \times 3}$.
A batch of colour images is a rank-4 tensor: $\mathbf{T} \in \mathbb{R}^{B \times C \times H \times W}$ (PyTorch convention) or $\mathbb{R}^{B \times H \times W \times C}$ (TensorFlow's "channels-last").
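A batch of video clips is rank-5: $\mathbf{T} \in \mathbb{R}^{B \times T \times C \times H \times W}$ (batch, time, channels, height, width), where the italic $T$ is the number of frames, not the tensor $\mathbf{T}$ itself.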
The activations inside a Transformer are typically rank-3: $\mathbf{T} \in \mathbb{R}^{B \times L \times D}$ (batch, sequence length, hidden dimension).
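The ranks and shapes listed above can be checked directly in code. The following is a minimal sketch using NumPy; the sizes (8 images of 32 by 32 pixels, and so on) and all variable names are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
B, C, H, W = 8, 3, 32, 32   # batch, channels, height, width
L, D = 16, 64               # sequence length, hidden dimension

scalar = np.array(3.14)                  # rank 0: a single number
vector = np.zeros(5)                     # rank 1
matrix = np.zeros((4, 5))                # rank 2
image = np.zeros((H, W, 3))              # rank 3: one RGB image
batch_nchw = np.zeros((B, C, H, W))      # rank 4: channels-first layout
# Move the channel axis to the end to obtain the channels-last layout.
batch_nhwc = np.transpose(batch_nchw, (0, 2, 3, 1))
activations = np.zeros((B, L, D))        # rank 3: Transformer-style activations

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("image", image), ("batch_nchw", batch_nchw),
                ("batch_nhwc", batch_nhwc), ("activations", activations)]:
    # .ndim is the rank (number of axes); .shape lists the size of each axis.
    print(f"{name}: rank {t.ndim}, shape {t.shape}")
```

Note that the rank is simply the length of the shape tuple, so a rank-0 tensor has the empty shape `()`.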

Operations

Tensor algebra generalises matrix algebra. The core operations are:

Element-wise arithmetic: addition, multiplication, etc., applied component-by-component to tensors of the same shape.
Broadcasting: extending element-wise operations to tensors of compatible but unequal shapes, enabling concise code such as adding a bias vector $\mathbf{b} \in \mathbb{R}^D$ to every row of a matrix $\mathbf{X} \in \mathbb{R}^{B \times D}$.
Reduction: collapsing one or more axes via sum, mean, max, etc., e.g. computing per-example loss from a batch of per-token losses.
Reshape, permute, transpose: rearranging the layout of an array without changing its contents.
Contraction: generalised matrix multiplication, expressed concisely with Einstein summation notation: einsum("bij,bjk->bik", A, B) performs a batched matrix product.
Slicing and indexing: extracting sub-tensors.
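Each of the operations listed above has a direct counterpart in array libraries. The sketch below uses NumPy as one possible illustration; the shapes, the random data and every variable name are assumptions made for the example, not part of any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 3                       # hypothetical sizes

X = rng.normal(size=(batch, dim))       # matrix X of shape (B, D)
b = np.array([1.0, 2.0, 3.0])           # bias vector b of shape (D,)

# Element-wise arithmetic: both operands have exactly the same shape.
doubled = X + X

# Broadcasting: b is (conceptually) repeated along the batch axis
# so that it is added to every row of X.
shifted = X + b

# Reduction: average per-token losses (B, L) over the token axis
# to obtain one loss per example, shape (B,).
token_losses = rng.uniform(size=(batch, 5))
example_losses = token_losses.mean(axis=1)

# Reshape and transpose: same numbers, different layout.
flat = X.reshape(batch * dim)           # shape (12,)
X_t = X.T                               # shape (D, B)

# Contraction: batched matrix product via Einstein summation.
lhs = rng.normal(size=(batch, 2, 3))
rhs = rng.normal(size=(batch, 3, 5))
product = np.einsum("bij,bjk->bik", lhs, rhs)   # shape (B, 2, 5)

# Slicing and indexing: the first two rows of X.
first_two = X[:2, :]
```

The einsum call here computes the same result as `lhs @ rhs`; the index string simply makes explicit that `j` is summed over while `b`, `i` and `k` are kept.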

Frameworks and hardware

Modern deep-learning frameworks (PyTorch, TensorFlow, JAX, MLX) are built around the efficient manipulation of tensors. Each provides a tensor object that lives in CPU or accelerator (GPU/TPU) memory, supports automatic differentiation, and dispatches operations to hand-tuned kernel libraries such as cuBLAS, cuDNN and Metal Performance Shaders (MPS). This shared abstraction lets largely the same Python code run on a laptop, a multi-GPU server, or a TPU pod, with the framework handling much of the device placement and parallelism.
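As a rough illustration of device placement and automatic differentiation, the sketch below uses PyTorch, one of the frameworks named above. The function being differentiated and the numbers in it are assumptions invented for the example; the code falls back to the CPU when no CUDA device is visible.

```python
import torch

# Use a CUDA GPU if one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# requires_grad=True asks the framework to track operations on x.
x = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], device=device)

# y = sum_i w_i * x_i^2, so analytically dy/dx_i = 2 * w_i * x_i.
y = (w * x ** 2).sum()
y.backward()                 # automatic differentiation fills x.grad

grad = x.grad.cpu()          # copy the gradient back to CPU memory
print(grad)
```

The same script runs unchanged with or without a GPU; only the value of `device` differs.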

The primacy of the tensor in deep-learning hardware is no accident. Modern accelerators (NVIDIA Tensor Cores, Google TPUs, the Apple Neural Engine) are essentially specialised matrix-multiply units, and the entire deep-learning stack is co-designed around the assumption that the dominant computation is dense, structured tensor contraction.

Mathematical pedantry

In strict mathematical usage, a tensor is a more sophisticated object: a multilinear map between vector spaces, with well-defined transformation rules under changes of coordinates, central to physics (general relativity, continuum mechanics) and differential geometry. Deep learning's use of the term is informal and refers only to the data structure, not the geometric object. The distinction rarely causes practical confusion, but it is worth knowing that mathematicians and physicists use the word more precisely than deep-learning practitioners do. The framework name "TensorFlow" gives an impression of mathematical rigour that, in this technical sense, it does not strictly possess.
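To make the phrase "transformation rules under changes of coordinates" concrete, the sketch below treats a matrix $G$ as the components of a bilinear form $g(v, w) = v^\top G w$. Under a change of basis $P$, vector components transform as $v' = P^{-1} v$ and the form's components as $G' = P^\top G P$, so the array of numbers changes while the value $g(v, w)$ does not. The particular matrices and vectors are assumptions chosen only for illustration.

```python
import numpy as np

# Components of a bilinear form g(v, w) = v^T G w in the original basis.
G = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 0.0],
              [3.0, 0.0, 1.0]])
v = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, -1.0, 2.0])

# Change-of-basis matrix: columns are the new basis vectors written
# in the old coordinates. Its determinant is 3, so it is invertible.
P = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# Vector components transform with the inverse of P ...
v_new = np.linalg.solve(P, v)
w_new = np.linalg.solve(P, w)
# ... while the components of the bilinear form transform with P itself.
G_new = P.T @ G @ P

value_old = v @ G @ w
value_new = v_new @ G_new @ w_new

print(np.allclose(G, G_new))             # the arrays of components differ
print(np.isclose(value_old, value_new))  # the geometric quantity agrees
```

A plain deep-learning "tensor" carries no such rule: it is just the array, with no statement about how its entries should change when coordinates do.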

Related terms: Vector, Matrix, Chain Rule, GPU Acceleration

Discussed in:

Chapter 4: Probability, Tensors and the machinery of deep learning
