Plot training and validation error against dataset size. The shapes reveal what's wrong.
From Chapter 6: ML Fundamentals
Glossary: learning curve, overfitting, underfitting
Transcript
A learning curve plots model error on the y-axis, dataset size on the x-axis. Two curves: training error and validation error.
Healthy fit. With small data, validation error is high and training error is low. As data grows, training error rises slightly and validation error falls. The two curves converge to a small gap: the model has reached roughly its best achievable error for its capacity.
Underfit. Both curves plateau at a high error, close together, and stay there. More data does not help. The model is too simple to capture the pattern. The fix is a more flexible model.
Overfit. Training error stays low, often near zero. Validation error stays much higher, with a wide gap that never closes. The model memorises rather than generalises. The fix is more data, a simpler model, or stronger regularisation.
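These shapes can be reproduced on synthetic data. A minimal sketch, assuming NumPy is available: a degree-9 polynomial fit to a noisy sine task memorises small training sets (the overfit pattern), and the train/validation gap narrows as the dataset grows. The task, sizes, and degree here are illustrative choices, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Fixed validation set drawn from the same noisy sine task.
x_val = rng.uniform(-3, 3, 500)
y_val = np.sin(x_val) + rng.normal(0, 0.1, 500)

results = []  # (n, training error, validation error)
for n in (15, 60, 240, 960):
    x_tr = rng.uniform(-3, 3, n)
    y_tr = np.sin(x_tr) + rng.normal(0, 0.1, n)
    coeffs = np.polyfit(x_tr, y_tr, deg=9)  # flexible model: 10 parameters
    results.append((n, mse(coeffs, x_tr, y_tr), mse(coeffs, x_val, y_val)))

for n, tr, va in results:
    print(f"n={n:4d}  train_mse={tr:.4f}  val_mse={va:.4f}  gap={va - tr:.4f}")
```

At the smallest size the training error sits well below the validation error; as n grows the gap shrinks toward the noise floor, tracing the healthy-fit shape described above.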
The shape of the curves tells you which problem you have. Both curves plateauing high and close together means underfit, a persistent wide gap means overfit, and both curves gradually closing means you are in the right zone.
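That reading of the shape can be written as a decision rule. A hypothetical helper, not from the chapter: it classifies the final (largest-data) point of a learning curve, and the thresholds `high` and `gap` are illustrative placeholders, since in practice they depend on the task's irreducible error.

```python
def diagnose(train_err, val_err, high=0.2, gap=0.05):
    """Classify a learning curve from its final (largest-data) point.

    train_err, val_err: errors at the largest dataset size.
    high, gap: illustrative thresholds for this sketch; real values
    depend on the task's irreducible error.
    """
    if train_err > high and val_err > high:
        return "underfit"   # both curves plateau high, close together
    if val_err - train_err > gap:
        return "overfit"    # persistent train/validation gap
    return "healthy"        # curves converged with a small gap

print(diagnose(0.30, 0.32))  # → underfit
print(diagnose(0.02, 0.18))  # → overfit
print(diagnose(0.08, 0.10))  # → healthy
```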
Practitioners use learning curves before scaling up training. If a subset of the data already shows the overfit shape, the budget should go to more data or stronger regularisation; if it shows the underfit shape, it should go to a more flexible model. Simply training longer on the same setup fixes neither. The shape decides the strategy.