Drop a perpendicular from one vector onto another to find its shadow: the dot product divided by the second vector's length.
From Chapter 2: Linear Algebra
Glossary: dot product, vector space
Transcript
Two vectors share an origin. Call them a and b.
To project a onto b, we ask: how much of a points in the direction of b?
Drop a perpendicular from the tip of a down to the line through b. The foot of that perpendicular marks the projection.
The length of that shadow is a dot b divided by the length of b. The shadow itself is that scalar times the unit vector along b.
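Here is a minimal numpy sketch of both formulas; the vectors are made up for illustration, and the final check confirms that the leftover piece really is perpendicular to b.

```python
import numpy as np

# Illustrative vectors, not from the transcript.
a = np.array([3.0, 4.0])
b = np.array([5.0, 0.0])

# Scalar projection: the signed length of the shadow, a.b / |b|.
shadow_length = a @ b / np.linalg.norm(b)

# Vector projection: that scalar times the unit vector along b.
b_hat = b / np.linalg.norm(b)
shadow = shadow_length * b_hat

print(shadow_length)  # 3.0
print(shadow)         # [3. 0.]

# The leftover piece a - shadow is the dropped perpendicular:
# it is orthogonal to b, so its dot product with b is zero.
print((a - shadow) @ b)  # 0.0
```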
When a and b point the same way, the shadow equals a's full length. When they are perpendicular, the shadow has zero length and the dot product is zero.
When a points opposite to b, the scalar projection is negative: the shadow points away from b.
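A quick numpy check of all three cases, again with illustrative vectors:

```python
import numpy as np

b = np.array([1.0, 0.0])

same_way      = np.array([2.0, 0.0])   # parallel to b
perpendicular = np.array([0.0, 2.0])   # at 90 degrees to b
opposite      = np.array([-2.0, 0.0])  # anti-parallel to b

for a in (same_way, perpendicular, opposite):
    scalar_proj = a @ b / np.linalg.norm(b)
    print(scalar_proj)
#  2.0 -> the full length of a
#  0.0 -> zero shadow, dot product is zero
# -2.0 -> negative: the shadow points away from b
```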
This single picture sits behind enormous amounts of machine learning. Gradient steps project onto search directions. Attention scores are dot products that measure alignment between queries and keys. Cosine similarity is the projection normalised to unit length.
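As a sketch of that connection, here are cosine similarity and attention-style scores computed as plain dot products in numpy. The query, keys, and softmax step are illustrative, not a full attention layer; real attention also scales the scores and uses learned projections.

```python
import numpy as np

# Cosine similarity: the dot product normalised by both lengths.
q = np.array([1.0, 2.0, 3.0])
k = np.array([2.0, 0.0, 1.0])
cosine = (q @ k) / (np.linalg.norm(q) * np.linalg.norm(k))

# Attention-style scores: each row of K is a key, and the scores
# are just the dot products of the query with every key.
K = np.array([[ 2.0, 0.0, 1.0],
              [ 0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
scores = K @ q                                   # one dot product per key
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the scores
print(cosine, scores, weights)
```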
Even the least squares solution can be read as: project the target vector onto the subspace your model can express. The piece that lands in the subspace is the prediction; the piece that doesn't is the residual.
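A small numpy sketch of that reading, with a made-up design matrix X and target y; np.linalg.lstsq performs the projection, and the final check confirms the residual is perpendicular to the model's subspace.

```python
import numpy as np

# Illustrative design matrix and target.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])

# Solve min ||X w - y||: this projects y onto the column space of X.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

prediction = X @ w          # the piece of y the model can express
residual = y - prediction   # the piece it cannot

# The residual is perpendicular to every column of X.
print(X.T @ residual)       # ~ [0, 0]
```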