Visualisation

The chain rule on a computation graph

Last reviewed 4 May 2026

Gradients multiply backward along the edges of a small graph from output to input.

From Chapter 3: Calculus

Glossary: chain rule, computation graph, backpropagation

Transcript

The chain rule turns the derivative of a chain of functions into a product of local derivatives.
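
In symbols, for a three-link chain (the function names f, g, h and the intermediate labels b, c are introduced here for reference; the narration keeps everything in words):

```latex
b = f(x), \quad c = g(b), \quad L = h(c)
\quad\Longrightarrow\quad
\frac{dL}{dx} = \frac{dL}{dc}\cdot\frac{dc}{db}\cdot\frac{db}{dx}
= h'(c)\,g'(b)\,f'(x)
```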

Here is a small computation graph. The input x flows through three operations: first a square, then a sine, then an exponential. The output is L.
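
Written out, with node labels a and b added here for reference (the transcript leaves the intermediates unnamed):

```latex
a = x^2, \qquad b = \sin(a), \qquad L = e^{b},
\qquad\text{so}\qquad L = e^{\sin(x^2)}
```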

To find dL by dx, the chain rule says: multiply the local derivatives along the path from x to L.

Each edge on the graph carries one local derivative. Square contributes two x. Sine contributes cosine of its input. Exponential contributes itself.
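
Multiplying those three edge derivatives, using the node labels above, gives the closed form:

```latex
\frac{dL}{dx}
= \underbrace{e^{b}}_{\text{exp}}
\cdot \underbrace{\cos(a)}_{\text{sine}}
\cdot \underbrace{2x}_{\text{square}}
= 2x\,\cos(x^2)\,e^{\sin(x^2)}
```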

Pick a value for x, say one. The forward pass fills in each node with its value. The output L lights up.
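
A minimal forward pass in Python, following the graph above (the variable names match the labels introduced earlier):

```python
import math

x = 1.0          # chosen input value
a = x ** 2       # square node: a = x^2
b = math.sin(a)  # sine node:   b = sin(a)
L = math.exp(b)  # exp node:    L = e^b

print(a, b, L)   # 1.0, then b ≈ 0.8415, then L ≈ 2.3198
```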

Now run the backward pass. Start with dL by dL equal to one at the output. Multiply by each local derivative as you walk backward along the edges.

Watch the gradient propagate. By the time it reaches x, you have the full derivative.
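
Here is a sketch of both passes in Python under the same labels, checked against a finite-difference estimate (the forward helper is mine, not part of the narration):

```python
import math

def forward(x):
    a = x ** 2
    b = math.sin(a)
    L = math.exp(b)
    return a, b, L

x = 1.0
a, b, L = forward(x)

# Backward pass: start with dL/dL = 1 and multiply local
# derivatives while walking from L back toward x.
grad = 1.0
grad *= math.exp(b)   # exp edge:    dL/db = e^b
grad *= math.cos(a)   # sine edge:   db/da = cos(a)
grad *= 2 * x         # square edge: da/dx = 2x

# Sanity check with a central finite difference.
eps = 1e-6
numeric = (forward(x + eps)[2] - forward(x - eps)[2]) / (2 * eps)

print(grad, numeric)  # both ≈ 2.5068
```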

This is the entire idea behind backpropagation. A neural network is just a much larger computation graph, and gradients flow back through it the same way.
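
An autodiff library runs this same recipe. A minimal sketch assuming PyTorch is available (the library is not mentioned in the transcript): its autograd records the graph during the forward pass and multiplies local derivatives backward, just as above.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
L = torch.exp(torch.sin(x ** 2))  # same graph: square -> sine -> exp
L.backward()                      # backward pass from L to x

print(x.grad)  # tensor(2.5068), matching the hand computation
```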
