Embeddings APIs are hosted endpoints that take a string and return a vector $\mathbf{v} \in \mathbb{R}^d$ such that semantically similar inputs have small cosine distance. They are the front door to modern retrieval and form the substrate of RAG, agent memory, recommendation, deduplication, classification, and anomaly detection.
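As a minimal sketch of the "small cosine distance" criterion, the snippet below compares two returned vectors with NumPy; the `cosine_similarity` helper is ours, not part of any provider SDK.

```python
import numpy as np

def cosine_similarity(u, v) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    u, v = np.asarray(u), np.asarray(v)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Vectors for semantically similar strings should score close to 1.0;
# unrelated strings score noticeably lower.
```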
Production landscape (2025)
| Provider | Model | Dim | Notes |
|---|---|---|---|
| OpenAI | `text-embedding-3-small` | 1536 (Matryoshka, truncatable) | Cheap, strong default |
| OpenAI | `text-embedding-3-large` | 3072 | Best closed model on MTEB until 2024 |
| Cohere | `embed-v3` | 1024 | Multilingual, strong rerank pairing |
| Voyage AI | `voyage-3` / `voyage-code-3` | 1024–2048 | Domain-specialised, top of MTEB |
| Google | `text-embedding-004` | 768 | Strong on Gemini stack |
| Anthropic | (uses Voyage) | n/a | No native embeddings; recommends Voyage |
| BGE (BAAI) | `bge-m3`, `bge-large-en-v1.5` | 1024 | Strongest open model, multilingual |
| E5 (Microsoft) | `e5-mistral-7b-instruct` | 4096 | LLM-based open model |
| Jina | `jina-embeddings-v3` | 1024 | 8k context, multilingual |
| Nomic | `nomic-embed-text-v1.5` | 768 | Fully open, audited |
Matryoshka embeddings
Modern models (OpenAI 3.x, Nomic, Voyage) train with Matryoshka representation learning: the first $k$ dimensions of a $d$-dim vector are themselves a valid embedding of dimension $k$. This lets developers truncate at index time:
```python
from openai import OpenAI

client = OpenAI()
full = client.embeddings.create(model="text-embedding-3-large", input=text).data[0].embedding
fast = full[:512]  # the 512-dim prefix is still a semantically meaningful embedding
```
Truncating to 512 dimensions buys roughly 80% storage savings at the cost of about 2% recall.
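One caveat: OpenAI's embeddings come back unit-normalised, and truncation breaks that property, so if your index scores by dot product (equivalent to cosine only for unit-norm vectors) you should re-normalise the truncated vector. A minimal sketch in NumPy; the helper name is ours:

```python
import numpy as np

def truncate_embedding(vec, dim: int = 512) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalise to unit length."""
    v = np.asarray(vec[:dim], dtype=np.float32)
    return v / np.linalg.norm(v)
```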
Training objective
Embeddings models are typically bi-encoders trained with contrastive loss:
$$\mathcal{L} = -\log \frac{\exp(\text{sim}(q, d^+)/\tau)}{\exp(\text{sim}(q, d^+)/\tau) + \sum_{d^-} \exp(\text{sim}(q, d^-)/\tau)}$$
where $(q, d^+)$ is a positive query/document pair and $d^-$ are negatives. The temperature $\tau$ controls sharpness.
Hard-negative mining and large in-batch negatives are the dominant tricks for SOTA performance.
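A toy NumPy sketch of this in-batch contrastive objective (array shapes and the function name are illustrative, not taken from any particular training codebase):

```python
import numpy as np

def info_nce_loss(q: np.ndarray, d: np.ndarray, tau: float = 0.05) -> float:
    """In-batch InfoNCE: q[i] and d[i] form a positive pair; every d[j], j != i,
    acts as an in-batch negative for query i."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = (q @ d.T) / tau                     # (batch, batch) cosine similarities / tau
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())     # positive pairs sit on the diagonal
```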
MTEB
The Massive Text Embedding Benchmark (Muennighoff et al., 2022) is the canonical leaderboard: 56+ tasks covering retrieval, classification, clustering, STS, summarisation, and re-ranking, with the public leaderboard continuously updated as new models are submitted.
Choice guidance (2025)
- Default closed: OpenAI `text-embedding-3-small` for cost, `text-embedding-3-large` for quality.
- Open-source winner: `bge-m3` for multilingual + multifunctional; `e5-mistral-7b-instruct` for max quality.
- Code: `voyage-code-3` or `jina-code-v2`.
- Multilingual: `bge-m3` or Cohere `embed-multilingual-v3`.
Production gotchas
- Distribution shift: model upgrades change the vector space; you must re-index everything embedded with the old model.
- Domain mismatch: generic embeddings underperform on legal, biomedical, or code text; consider domain-specialised models.
- Asymmetric retrieval: some models need different prefixes for queries vs documents (`query:` vs `passage:` for E5).
- Quantisation: int8 or binary quantisation cuts storage 4–32× with small recall loss; supported by all major vector databases (see the sketch after this list).
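A minimal sketch of binary quantisation in NumPy (the function name is ours; real vector databases implement this internally):

```python
import numpy as np

def binary_quantise(vectors: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dimensions per byte:
    a 32x reduction versus float32 storage."""
    bits = (vectors > 0).astype(np.uint8)       # (n, d) floats -> {0, 1}
    return np.packbits(bits, axis=1)            # (n, d) -> (n, ceil(d / 8)) uint8
```

Search over binary codes typically uses Hamming distance as a cheap first pass, followed by exact re-scoring of the top candidates with the original float vectors.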
Related terms: Vector Database, Retrieval-Augmented Generation, Re-Ranking, Memory and Context Management
Discussed in:
- Chapter 15: Modern AI