A separating line learns its place by minimising cross-entropy on labelled points.
From the chapter: Supervised Learning (Chapter 7)
Glossary: logistic regression, cross-entropy, decision boundary, sigmoid
Transcript
Logistic regression is one of the simplest classifiers that learns a smooth probability rather than a hard label.
Here are two clouds of labelled points, red and blue, scattered in the plane.
The model proposes a straight line. Points on one side are predicted blue; points on the other are predicted red. A sigmoid function maps the signed distance from the line to a probability, so points near the line get a probability close to one half.
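For concreteness, here is a minimal numpy sketch of that mapping; the names X, w, and b are illustrative, not fixed by the narration: w and b define the line w.x + b = 0, and each row of X is one point.

    import numpy as np

    def sigmoid(z):
        # Map any real number to (0, 1); z = 0 gives exactly 0.5.
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(X, w, b):
        # z is proportional to the signed distance of each point from the
        # line w.x + b = 0, so points on the line come out at probability 0.5.
        return sigmoid(X @ w + b)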
The cost is cross-entropy: it punishes confident wrong predictions much more than uncertain ones.
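In symbols, the per-point cost is -[y log p + (1 - y) log(1 - p)] for a label y in {0, 1} and predicted probability p. A minimal numpy sketch (the clipping constant eps is an implementation guard, not part of the definition):

    import numpy as np

    def cross_entropy(y, p, eps=1e-12):
        # y: true labels in {0, 1}; p: predicted probabilities for class 1.
        p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

For a single point with y = 1, a confident wrong prediction p = 0.01 costs -ln(0.01), about 4.6, while an uncertain p = 0.4 costs only about 0.9.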
Watch the line move under gradient descent. Each step nudges it to reduce the total cross-entropy across all points.
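A sketch of one such step, reusing the illustrative names from the earlier snippet; the gradient of mean cross-entropy through a sigmoid reduces to averages of the residual p - y:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_step(X, y, w, b, lr=0.1):
        # lr is an illustrative learning rate, not a value from the narration.
        p = sigmoid(X @ w + b)            # current predicted probabilities
        residual = p - y                  # zero where the model is exactly right
        grad_w = X.T @ residual / len(y)  # gradient of mean cross-entropy in w
        grad_b = residual.mean()          # gradient in the bias b
        return w - lr * grad_w, b - lr * grad_b

Calling gradient_step repeatedly is the motion shown on screen: each call nudges (w, b), and hence the line, downhill on the total cross-entropy.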
After a few iterations the line settles. The training set is mostly correctly classified, and the smooth probability surface tells you not just which class a point belongs to, but how confident the model is.
Logistic regression is a one-layer neural network. Stacking many of these with non-linearities is how deep classifiers are built.
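To make that remark concrete, a two-layer sketch under the same illustrative naming; W1, b1, w2, and b2 are hypothetical parameters, and ReLU is one common choice of non-linearity:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def two_layer_classifier(X, W1, b1, w2, b2):
        # Layer 1: an affine map followed by a non-linearity (ReLU here).
        h = np.maximum(0.0, X @ W1 + b1)
        # Layer 2: ordinary logistic regression on the learned features h.
        return sigmoid(h @ w2 + b2)

Drop the hidden layer and this collapses back to the earlier predict_proba: logistic regression is exactly the one-layer special case.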