Underfitting is high bias, overfitting is high variance. The best model balances the two.
From Chapter 6: ML Fundamentals
Glossary: bias-variance tradeoff, overfitting, underfitting
Transcript
We have a training set drawn from some unknown function. Plus a little noise.
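A minimal sketch of that setup; the sine as the unknown function, the noise level of 0.2, and the thirty points are all hypothetical choices, since the transcript names none of them:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Stand-in for the unknown function; the sine is a hypothetical choice.
    return np.sin(2 * np.pi * x)

n = 30                                    # thirty points, as used later in the transcript
x = np.sort(rng.uniform(0.0, 1.0, n))
y = true_fn(x) + rng.normal(0.0, 0.2, n)  # signal plus a little noise
```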
Fit a constant to it: a degree-zero polynomial, just the sample mean. The model all but ignores the data. Bias is huge. Variance is nearly zero: almost the same flat line every time.
Fit a degree-one polynomial. A straight line. Lower bias, a little more variance.
Fit a degree-three polynomial. A gentle curve. We are getting close.
Now fit a degree-twenty polynomial to thirty points. The curve wiggles wildly, chasing every training point, and training error collapses toward zero.
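The whole progression in one loop, on the same hypothetical data; a sketch using NumPy's least-squares polyfit. At degree twenty, polyfit may warn that the fit is poorly conditioned, which is rather the point:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 30)  # same hypothetical data as above

for degree in (0, 1, 3, 20):
    coeffs = np.polyfit(x, y, degree)                  # least-squares polynomial fit
    train_mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
    print(f"degree {degree:2d}: training MSE = {train_mse:.4f}")
```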
But shake the dataset, redraw the noise, and the wiggling polynomial snaps into a completely different shape. High variance: tiny changes in the data produce wildly different fits.
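One way to see that instability, again a sketch: keep the inputs fixed, redraw only the noise, refit, and measure how far the prediction at a single query point moves between fits. The query point 0.5 is arbitrary, and the degree-twenty refits may emit the same conditioning warnings as before:

```python
import numpy as np

x = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 30))

def refit_prediction(degree, seed, x_query=0.5):
    # Same inputs, a fresh noise draw: only the dataset's noise changes.
    noise = np.random.default_rng(seed).normal(0.0, 0.2, x.size)
    coeffs = np.polyfit(x, np.sin(2 * np.pi * x) + noise, degree)
    return np.polyval(coeffs, x_query)

for degree in (0, 3, 20):
    preds = [refit_prediction(degree, seed) for seed in range(200)]
    # The spread across refits is an empirical variance estimate at x = 0.5.
    print(f"degree {degree:2d}: std of prediction at 0.5 = {np.std(preds):.3f}")
```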
Plot expected test error against model complexity. On the left, error is high because the model is too simple, all bias. On the right, error is high because the model is too flexible, all variance.
The sweet spot is in the middle, where bias and variance roughly balance.
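The curve itself can be traced numerically, continuing the same hypothetical setup: fit each degree on the thirty training points, score on a large fresh test draw, and watch the error fall and then climb again:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t):
    # Hypothetical true function, as in the earlier sketches.
    return np.sin(2 * np.pi * t)

x_tr = np.sort(rng.uniform(0.0, 1.0, 30))
y_tr = f(x_tr) + rng.normal(0.0, 0.2, 30)
x_te = np.linspace(0.0, 1.0, 1000)          # large fresh test set
y_te = f(x_te) + rng.normal(0.0, 0.2, 1000)

for degree in range(16):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    test_mse = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(f"degree {degree:2d}: test MSE = {test_mse:.4f}")
# The printed column falls, bottoms out at a moderate degree, then climbs: the U.
```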
This U-shaped curve is one of the most durable pictures in classical supervised learning. Cross-validation, regularisation, and early stopping all exist to find that valley.
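As one concrete instance, here is how k-fold cross-validation might locate that valley for the polynomial degree; a sketch, with the fold count and degree range chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 30)  # same hypothetical data

def cv_mse(degree, k=5):
    # k-fold CV: hold out each fold in turn, fit on the rest, score the held-out fold.
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(x.size), fold)
        coeffs = np.polyfit(x[train], y[train], degree)
        errors.append(np.mean((y[fold] - np.polyval(coeffs, x[fold])) ** 2))
    return np.mean(errors)

scores = {d: cv_mse(d) for d in range(10)}
best = min(scores, key=scores.get)
print(f"cross-validation picks degree {best}")
```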