Vector, Glossary, Textbook of AI

A vector is, at its most elementary, an ordered list of numbers. We write a vector $\mathbf{v}$ in $n$-dimensional space as

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n,$$

where each $v_i$ is called a component or coordinate. In two dimensions a vector can be drawn as an arrow from the origin to the point $(v_1, v_2)$; in three dimensions, as an arrow from the origin to the point $(v_1, v_2, v_3)$ in physical space. Beyond three dimensions geometric intuition fails, but the algebra generalises without difficulty, and modern AI systems routinely operate in spaces with thousands, millions, or even billions of dimensions: a single token's embedding in a large language model can live in $\mathbb{R}^{4096}$ or higher, and the parameter vector of a frontier model lives in $\mathbb{R}^{10^{11}}$ or beyond.

Operations

Vectors support two fundamental operations:

Addition, performed component-wise: $(\mathbf{u} + \mathbf{v})_i = u_i + v_i$.
Scalar multiplication: $(c\,\mathbf{v})_i = c\,v_i$.
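
The two operations above can be sketched in a few lines of plain Python. This is a minimal illustration that represents a vector as a list of floats; the helper names `add` and `scale` are chosen here for the example and are not taken from any library.

```python
def add(u, v):
    """Component-wise sum of two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [ui + vi for ui, vi in zip(u, v)]


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * vi for vi in v]


u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

print(add(u, v))      # [5.0, 7.0, 9.0]
print(scale(2.0, v))  # [8.0, 10.0, 12.0]
```

In practice an array library would normally handle this, but the list version makes the component-wise definitions explicit.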

These operations, subject to a handful of axioms (associativity, commutativity, distributivity, the existence of a zero vector and additive inverses), define a vector space. On $\mathbb{R}^n$, further operations of central importance to machine learning include:

The dot product (or inner product): $\mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$. It measures how closely the two vectors align.
The norm, measuring length. The most common is the Euclidean ($L_2$) norm: $\|\mathbf{v}\|_2 = \sqrt{v_1^2 + \cdots + v_n^2}$. Other norms include the $L_1$ (sum of absolute values, used in lasso regression) and $L_\infty$ (maximum absolute component).
Normalisation: dividing a non-zero vector by its norm to produce a unit vector $\hat{\mathbf{v}} = \mathbf{v}/\|\mathbf{v}\|$.
Cosine similarity: $\cos\theta = (\mathbf{u} \cdot \mathbf{v})/(\|\mathbf{u}\| \|\mathbf{v}\|)$, the angle-based similarity that underlies most embedding-based retrieval.
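
As a rough sketch, the four operations listed above translate directly into Python using only the standard library. The function names are illustrative choices for this example, and the vectors $(3, 4)$ and $(4, 3)$ are picked so that the arithmetic is easy to check by hand.

```python
import math


def dot(u, v):
    """Dot product: sum of products of matching components."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm_l2(v):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(v, v))


def norm_l1(v):
    """L1 norm: sum of absolute values."""
    return sum(abs(vi) for vi in v)


def norm_linf(v):
    """L-infinity norm: largest absolute component."""
    return max(abs(vi) for vi in v)


def normalise(v):
    """Return the unit vector pointing in the same direction as v."""
    length = norm_l2(v)
    if length == 0.0:
        raise ValueError("the zero vector cannot be normalised")
    return [vi / length for vi in v]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return dot(u, v) / (norm_l2(u) * norm_l2(v))


u = [3.0, 4.0]
v = [4.0, 3.0]

print(dot(u, v))                # 24.0
print(norm_l2(u))               # 5.0
print(norm_l1(u))               # 7.0
print(norm_linf(u))             # 4.0
print(normalise(u))             # [0.6, 0.8]
print(cosine_similarity(u, v))  # 0.96
```

Note that cosine similarity is just the dot product of the two normalised vectors, which is why retrieval systems often store unit-length embeddings and compare them with a plain dot product.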

Vectors in AI

In AI, a vector typically represents a data point or a collection of features. Concrete examples:

An image flattened into a vector of pixel values, $\mathbf{x} \in \mathbb{R}^{H W C}$.
A document encoded as a bag-of-words vector, with each component the frequency of one word.
A user encoded as a vector of interaction features for a recommender system.
A word, sentence, or image encoded as a learned embedding, in which semantic similarity corresponds to small Euclidean or cosine distance.
The gradient of a loss with respect to parameters, the vector that drives learning.
The policy gradient of a reinforcement-learning agent, pointing in the direction of increasing expected reward.
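
Two of the examples above, the bag-of-words document and the flattened image, are simple enough to sketch directly. The snippet below is a minimal illustration under stated assumptions: the four-word vocabulary, the whitespace tokenisation, and the tiny $2 \times 2$ RGB image are all made up for the example.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    Tokenisation here is a naive lowercase whitespace split (an assumption
    for illustration; real pipelines use more careful tokenisers).
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def flatten_image(image):
    """Flatten an H x W x C nested list into a vector of length H*W*C."""
    return [value for row in image for pixel in row for value in pixel]


vocabulary = ["cat", "dog", "sat", "the"]  # hypothetical vocabulary
print(bag_of_words("The cat sat on the mat", vocabulary))  # [1, 0, 1, 2]

# A hypothetical image with H=2, W=2, C=3 (RGB values).
image = [
    [[0, 0, 0], [255, 255, 255]],
    [[255, 0, 0], [0, 0, 255]],
]
x = flatten_image(image)
print(len(x))  # 12
```

Words outside the vocabulary ("on", "mat") are simply dropped, which is one reason bag-of-words vectors lose information that learned embeddings can retain.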

The power of vector representations is that they import the full machinery of linear algebra (distances, angles, projections, transformations, decompositions) to data that is, on its surface, not numerical at all. The Word2Vec discovery that semantic relationships could be captured as vector arithmetic ("king" $-$ "man" $+$ "woman" $\approx$ "queen") drew wide attention partly because it made tangible just how much structure can hide in a learned vector representation.
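
The mechanics of the analogy can be illustrated with a toy example. The vectors below are hand-made two-dimensional stand-ins, not learned Word2Vec embeddings: one axis loosely encodes "royalty" and the other "gender", purely as an assumption for this sketch. The `analogy` helper is likewise hypothetical; it computes $\mathbf{a} - \mathbf{b} + \mathbf{c}$ and returns the remaining word with the highest cosine similarity.

```python
import math

# Hand-made toy vectors, NOT learned embeddings (assumption for illustration).
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
toy_embeddings = {
    "king":  [0.9,  0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1,  0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest (by cosine) to a - b + c.

    The three query words are excluded from the candidates.
    """
    target = [
        ai - bi + ci
        for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(analogy("king", "man", "woman", toy_embeddings))  # queen
```

With real learned embeddings the result is only approximately "queen", and the nearest-neighbour search runs over a vocabulary of many thousands of words rather than five.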

Learning to see data (text, images, behaviour, molecules) as vectors in some high-dimensional space is the first step toward understanding how modern AI systems work.


Related terms: Tensor, Matrix, Embedding, Dot Product

Discussed in:

Chapter 4: Probability, Vectors, matrices and tensors
