Pairwise similarities in many dimensions become 2D positions that preserve neighbourhoods.
From Chapter 8: Unsupervised Learning
Glossary: t-SNE, dimensionality reduction, UMAP
Transcript
Hundreds of points in fifty dimensions. We cannot see them, but they have structure: clusters, a hierarchy, maybe a manifold.
t-SNE compresses them to two dimensions while trying to preserve who is close to whom.
Step one: in the original space, compute pairwise similarities. For each point, ask which other points are its near neighbours. Use a Gaussian-like kernel.
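Step one can be sketched in NumPy. This is a simplification: it uses one fixed bandwidth `sigma` for every point, whereas real t-SNE tunes a per-point bandwidth via a perplexity parameter.

```python
import numpy as np

def gaussian_affinities(X, sigma=1.0):
    """Pairwise similarities in the original space:
    p_ij proportional to exp(-||x_i - x_j||^2 / (2 sigma^2)),
    normalised so the whole matrix sums to 1."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbour
    return P / P.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))          # 20 points in 50 dimensions
P = gaussian_affinities(X, sigma=10.0)  # fixed sigma; t-SNE proper picks one per point
```

Nearby points get large entries in `P`; distant points get entries near zero.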
Step two: place the points randomly in the plane. Compute pairwise similarities there too, with a heavy-tailed Student-t kernel.
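Step two, again as a NumPy sketch. The Student-t kernel with one degree of freedom is just `1 / (1 + squared distance)`; its heavy tails let moderately distant points sit far apart in the plane without a large penalty.

```python
import numpy as np

def student_t_affinities(Y):
    """Pairwise similarities in the plane:
    q_ij proportional to (1 + ||y_i - y_j||^2)^-1,
    the heavy-tailed Student-t (Cauchy) kernel."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

rng = np.random.default_rng(0)
Y = rng.normal(scale=1e-2, size=(20, 2))  # random initial 2D layout
Q = student_t_affinities(Y)
```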
Step three: minimise the difference between the two similarity distributions. Move each 2D point so its neighbours in the plane match its neighbours in the original space.
Run gradient descent. Watch the random scatter unfold. Clusters that were tight in 50D pull together in 2D. Distant clusters drift apart.
The result is a map. Local distances are meaningful. Global distances are not.
Unlike PCA, t-SNE is non-linear and stochastic. Different random seeds give different layouts. The sizes and shapes of clusters, and the distances between them, can be misleading. But for revealing structure that linear methods miss, it is the standard tool.
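In practice you would not write the optimisation yourself. A sketch using scikit-learn's `TSNE`, assuming scikit-learn is installed; the two-cluster data here is made up for illustration:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic clusters of 30 points each in 50 dimensions.
X = np.vstack([rng.normal(loc=c, size=(30, 50)) for c in (0.0, 5.0)])

# Random initialisation, two different seeds: the layouts will differ,
# though each should keep the two clusters separated.
Y0 = TSNE(n_components=2, perplexity=10, init="random",
          random_state=0).fit_transform(X)
Y1 = TSNE(n_components=2, perplexity=10, init="random",
          random_state=1).fit_transform(X)
```

Note that `perplexity` must be smaller than the number of points; it roughly sets how many neighbours each point attends to.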
UMAP is a cousin that runs faster and tries to preserve more global structure.