Visualisation

The logistic curve maps any number to a probability

Last reviewed 5 May 2026

A linear score, squashed by a sigmoid, becomes a probability between zero and one.

From Chapter 7: Supervised Learning

Glossary: logistic regression, sigmoid

Transcript

Linear regression outputs any real number. For classification, we need probabilities between zero and one.

Logistic regression solves this. Compute a linear score: weights times features, plus a bias. Then squash through the sigmoid.

The sigmoid, also called the logistic function: one over one plus the exponential of minus z.
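
In code, a minimal sketch (NumPy assumed; the weights, features, and bias below are illustrative, not from the transcript):

    import numpy as np

    def sigmoid(z):
        # The logistic function: 1 / (1 + exp(-z)).
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative weights, features, and bias.
    w = np.array([1.5, -0.8])
    x = np.array([2.0, 1.0])
    b = -0.5

    z = np.dot(w, x) + b   # linear score: weights times features, plus a bias
    p = sigmoid(z)         # squashed to a probability in (0, 1)
    print(z, p)            # 1.7 0.8455...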

For very negative scores, the sigmoid is near zero. The model is confident: class zero.

For very positive scores, the sigmoid is near one. Confident: class one.

Around zero, the curve is nearly linear, transitioning smoothly. Score zero, probability fifty-fifty. The decision boundary.
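
A quick numeric check of those three regimes, reusing the same sigmoid (outputs are approximate):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for z in (-10.0, 0.0, 10.0):
        print(z, sigmoid(z))
    # -10.0 -> ~4.5e-05  (confident: class zero)
    #   0.0 -> 0.5       (fifty-fifty: the decision boundary)
    #  10.0 -> ~0.99995  (confident: class one)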

Plot the data along one feature. Two clouds of red and blue. Fit logistic regression. The fitted sigmoid rises smoothly from zero on the red side to one on the blue.
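
One way to reproduce that picture, assuming scikit-learn and matplotlib are available; the two synthetic clouds are illustrative stand-ins for the data described:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Two one-dimensional clouds: "red" around -2, "blue" around +2 (illustrative).
    x_red = rng.normal(-2.0, 1.0, 50)
    x_blue = rng.normal(2.0, 1.0, 50)
    X = np.concatenate([x_red, x_blue]).reshape(-1, 1)
    y = np.concatenate([np.zeros(50), np.ones(50)])

    model = LogisticRegression().fit(X, y)

    # The fitted sigmoid rises from ~0 on the red side to ~1 on the blue side.
    grid = np.linspace(-6.0, 6.0, 200).reshape(-1, 1)
    plt.scatter(x_red, np.zeros(50), c="red")
    plt.scatter(x_blue, np.ones(50), c="blue")
    plt.plot(grid, model.predict_proba(grid)[:, 1], c="black")
    plt.xlabel("feature")
    plt.ylabel("P(class one)")
    plt.show()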

To choose a class label, threshold the probability at 0.5. The threshold corresponds to the linear score crossing zero.
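
The equivalence is easy to verify: sigmoid(z) is at least 0.5 exactly when z is at least zero (the scores below are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    scores = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])  # illustrative linear scores
    from_probability = (sigmoid(scores) >= 0.5).astype(int)
    from_score = (scores >= 0).astype(int)
    print(from_probability)  # [0 0 1 1 1]
    print(from_score)        # [0 0 1 1 1], identical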

Train by maximising the log-likelihood, equivalently minimising the cross-entropy loss. The optimisation is convex. A unique global minimum. No local optima, no fiddling with initialisations.
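
A bare-bones sketch of that training loop with plain gradient descent (the dataset, learning rate, and iteration count are arbitrary choices for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cross_entropy(y, p):
        # Negative mean log-likelihood of the labels under predicted probabilities.
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Tiny illustrative dataset: one feature, binary labels.
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])

    w, b = np.zeros(1), 0.0
    lr = 0.5  # arbitrary learning rate
    for _ in range(1000):
        p = sigmoid(X @ w + b)
        # Gradient of the cross-entropy; the loss is convex,
        # so there are no local optima to get stuck in.
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

    print(w, b, cross_entropy(y, sigmoid(X @ w + b)))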

Logistic regression is the workhorse of binary classification, the building block of softmax classifiers, and the output layer of countless neural networks.
