Matrix Multiplication combines two matrices to produce a third. If $A$ is $m \times n$ and $B$ is $n \times p$, their product $AB$ is an $m \times p$ matrix whose $(i,j)$ entry is the dot product of the $i$-th row of $A$ with the $j$-th column of $B$. The inner dimensions must match; the result inherits the outer dimensions. Matrix multiplication is associative, $(AB)C = A(BC)$, but not commutative: in general $AB \neq BA$.
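The dimension rule and the entry-wise definition can be checked directly; a minimal NumPy sketch (shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # m x n = 2 x 3
B = rng.standard_normal((3, 4))   # n x p = 3 x 4

AB = A @ B                        # inner dimensions (3) match; result is 2 x 4
assert AB.shape == (2, 4)

# The (i, j) entry is the dot product of row i of A with column j of B.
i, j = 1, 2
assert np.isclose(AB[i, j], A[i, :] @ B[:, j])

# Associativity holds up to floating-point rounding: (AB)C == A(BC).
C = rng.standard_normal((4, 5))
assert np.allclose((A @ B) @ C, A @ (B @ C))
```

Note that swapping the operands, `B @ A`, would fail here outright: a $3 \times 4$ times a $2 \times 3$ has mismatched inner dimensions, which is the shape-level symptom of non-commutativity.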
The non-commutativity is not a technicality but a deep fact: matrix multiplication corresponds to the composition of linear transformations, and the order in which transformations are applied matters. Multiplying by $B$ then by $A$ is different from multiplying by $A$ then by $B$, just as rotating a book then opening it is different from opening it then rotating it.
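Even when both orders are defined, the products differ. A small sketch with two concrete $2 \times 2$ transformations, a rotation and a reflection (the matrices are standard, the test vector is arbitrary):

```python
import numpy as np

# 90-degree counterclockwise rotation, and reflection across the x-axis.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
F = np.array([[1.0,  0.0],
              [0.0, -1.0]])

v = np.array([1.0, 0.0])

# (RF)v means "reflect, then rotate"; (FR)v means "rotate, then reflect".
reflect_then_rotate = R @ F @ v
rotate_then_reflect = F @ R @ v

# The two orders send v to different points, so RF != FR.
assert not np.allclose(reflect_then_rotate, rotate_then_reflect)
assert not np.allclose(R @ F, F @ R)
```

The product $RF$ acts right-to-left on a vector, mirroring function composition $(R \circ F)(\mathbf{v}) = R(F(\mathbf{v}))$: the rightmost matrix is applied first.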
In neural networks, a single forward pass through a layer computes $\mathbf{y} = f(W\mathbf{x} + \mathbf{b})$, and the matrix multiplication $W\mathbf{x}$ is the computational bottleneck. Modern GPUs and TPUs are designed specifically to perform enormous matrix multiplications efficiently through massive parallelism. The entire deep learning revolution can be seen as the consequence of making matrix multiplication fast enough to train models with billions of parameters. Innovations such as Flash Attention further optimise how attention's many matrix multiplications interact with GPU memory hierarchies.
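The layer equation above can be sketched in a few lines of NumPy; the function name `layer_forward`, the ReLU choice for $f$, and the layer sizes are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def relu(z):
    """Elementwise ReLU nonlinearity, a common choice for f."""
    return np.maximum(z, 0.0)

def layer_forward(W, x, b, f=relu):
    """One dense layer: y = f(Wx + b).

    W has shape (out_features, in_features), x has shape (in_features,),
    b has shape (out_features,). The W @ x product dominates the cost.
    """
    return f(W @ x + b)

rng = np.random.default_rng(42)
W = rng.standard_normal((4, 3))   # 4 output units, 3 input features
b = np.zeros(4)
x = rng.standard_normal(3)

y = layer_forward(W, x, b)
assert y.shape == (4,)            # outer dimension of W sets the output size
assert np.all(y >= 0.0)           # ReLU output is non-negative
```

In practice, frameworks batch many inputs into a matrix $X$ and compute $f(WX + \mathbf{b})$ in one call, which is exactly the large matrix multiplication that GPU hardware accelerates.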
Related terms: Matrix, Dot Product, Neural Network
Discussed in:
- Chapter 2: Linear Algebra — Matrix Operations
Also defined in: Textbook of AI