An embedding is a mapping from discrete entities—words, users, products, molecules—into a continuous, dense, low-dimensional vector space. Raw discrete data cannot be manipulated directly by numerical algorithms: a word is an arbitrary string, a user is an opaque ID. Embeddings solve this by assigning each entity a vector (typically of 32 to 1024 dimensions) such that semantically similar entities have nearby vectors.
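A minimal sketch of the idea, using hand-picked toy vectors (the values and the 3-dimensional size are illustrative assumptions, not learned embeddings): similar entities get high cosine similarity, dissimilar ones low.

```python
import numpy as np

# Hypothetical toy embedding table. Real embeddings are learned and use
# 32-1024 dimensions; 3 dimensions keeps the example readable.
emb = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: the usual notion of 'nearby' in embedding space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["cat"], emb["dog"]))  # high: cat and dog are semantically close
print(cosine(emb["cat"], emb["car"]))  # low: cat and car are not
```

In a trained model this table would be a parameter matrix, and lookup by entity ID replaces the dictionary access.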
Embeddings were popularised by Word2Vec (Mikolov et al., 2013), which trained a shallow neural network to predict words from their contexts. The learned vectors exhibited remarkable algebraic structure: $\text{king} - \text{man} + \text{woman} \approx \text{queen}$, showing that semantic relationships become geometric directions in the embedding space. GloVe achieved similar results by factorising a global word co-occurrence matrix. Contextual embeddings from models like BERT and GPT assign different vectors to the same word depending on its context, resolving ambiguities that static embeddings cannot.
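The analogy arithmetic can be sketched with hypothetical vectors whose axes loosely encode (male, female, royal); real Word2Vec vectors are learned, but the nearest-neighbour procedure is the same one Mikolov et al. used:

```python
import numpy as np

# Illustrative hand-set vectors, axes roughly (male, female, royal).
vecs = {
    "king":   np.array([1.0, 0.0, 1.0]),
    "queen":  np.array([0.0, 1.0, 1.0]),
    "man":    np.array([1.0, 0.0, 0.0]),
    "woman":  np.array([0.0, 1.0, 0.0]),
    "prince": np.array([1.0, 0.0, 0.8]),
    "child":  np.array([0.5, 0.5, 0.0]),
}

def analogy(a, b, c):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = vecs[a] - vecs[b] + vecs[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

Excluding the query words matters in practice: the unnormalised target vector is often closest to one of its own inputs.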
Embeddings now pervade AI: users and items are co-embedded in recommendation systems, images are embedded for visual search, proteins are embedded for drug discovery, and CLIP embeds images and text into a shared space, enabling zero-shot cross-modal retrieval. The geometry of the embedding space determines which distinctions downstream tasks can draw, so embeddings are typically learned as part of a larger model via contrastive or supervised objectives. Because embeddings inherit biases from their training data, responsible use requires auditing and debiasing techniques.
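The contrastive objective behind CLIP-style co-embedding can be sketched as a symmetric InfoNCE loss over a batch of paired embeddings. This is a forward-pass sketch under assumed names (`img_emb`, `txt_emb`, `temperature`); a real model would backpropagate through it to shape the shared space:

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: the i-th image and i-th text
    are a positive pair; all other pairings in the batch are negatives."""
    # L2-normalise so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # matching pairs sit on the diagonal

    def xent(l):
        # Cross-entropy of each row against its diagonal entry.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Minimising this loss pulls matched image/text pairs together and pushes mismatched pairs apart, which is what makes zero-shot retrieval by nearest neighbour possible afterwards.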
Discussed in:
- Chapter 2: Linear Algebra — Embeddings
- Chapter 12: Sequence Models — Word Embeddings
Also defined in: Textbook of AI