An embedding is a mapping from discrete entities—words, users, products, molecules—into a continuous, dense, low-dimensional vector space. Raw discrete data cannot be manipulated directly by numerical algorithms: a word is an arbitrary string, a user is an opaque ID. Embeddings solve this by assigning each entity a vector (typically of 32 to 1024 dimensions) such that semantically similar entities have nearby vectors.
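A minimal sketch of the idea, using hand-picked toy vectors (the values and the 3-dimensional size are illustrative assumptions, not learned embeddings): similar entities get high cosine similarity, dissimilar ones low.

```python
import numpy as np

# Hypothetical toy embedding table. Real embeddings are learned and use
# 32-1024 dimensions; 3 dimensions keeps the example readable.
emb = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: the usual notion of 'nearby' in embedding space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["cat"], emb["dog"]))  # high: cat and dog are semantically close
print(cosine(emb["cat"], emb["car"]))  # low: cat and car are not
```

In a trained model this table would be a parameter matrix, and lookup by entity ID replaces the dictionary access.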
Embeddings were popularised by Word2Vec (Mikolov et al., 2013), which trained a shallow neural network to predict words from their contexts. The learned vectors exhibited remarkable algebraic structure: $\text{king} - \text{man} + \text{woman} \approx \text{queen}$, showing that semantic relationships become geometric directions in the embedding space. GloVe achieved similar results by factorising a global word co-occurrence matrix. Contextual embeddings from models like BERT and GPT assign different vectors to the same word depending on its context, resolving ambiguities that static embeddings cannot.
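The analogy arithmetic can be sketched with hypothetical vectors whose axes loosely encode (male, female, royal); real Word2Vec vectors are learned, but the nearest-neighbour procedure is the same one Mikolov et al. used:

```python
import numpy as np

# Illustrative hand-set vectors, axes roughly (male, female, royal).
vecs = {
    "king":   np.array([1.0, 0.0, 1.0]),
    "queen":  np.array([0.0, 1.0, 1.0]),
    "man":    np.array([1.0, 0.0, 0.0]),
    "woman":  np.array([0.0, 1.0, 0.0]),
    "prince": np.array([1.0, 0.0, 0.8]),
    "child":  np.array([0.5, 0.5, 0.0]),
}

def analogy(a, b, c):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = vecs[a] - vecs[b] + vecs[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

Excluding the query words matters in practice: the unnormalised target vector is often closest to one of its own inputs.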
Embeddings now pervade AI: users and items are co-embedded in recommendation systems, images are embedded for visual search, proteins are embedded for drug discovery, and CLIP embeds images and text into a shared space, enabling zero-shot cross-modal retrieval. The geometry of the embedding space determines which distinctions downstream tasks can draw, so embeddings are typically learned as part of a larger model via contrastive or supervised objectives. Because embeddings inherit biases from their training data, responsible use requires auditing and debiasing techniques.
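The contrastive objective behind CLIP-style co-embedding can be sketched as a symmetric InfoNCE loss over a batch of paired embeddings. This is a forward-pass sketch under assumed names (`img_emb`, `txt_emb`, `temperature`); a real model would backpropagate through it to shape the shared space:

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: the i-th image and i-th text
    are a positive pair; all other pairings in the batch are negatives."""
    # L2-normalise so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # matching pairs sit on the diagonal

    def xent(l):
        # Cross-entropy of each row against its diagonal entry.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Minimising this loss pulls matched image/text pairs together and pushes mismatched pairs apart, which is what makes zero-shot retrieval by nearest neighbour possible afterwards.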
Discussed in:
- Chapter 2: Linear Algebra — Embeddings
- Chapter 12: Sequence Models — Word Embeddings
Also defined in: Textbook of AI