Glossary

Linear Regression

Linear Regression is a cornerstone of predictive modelling and one of the most widely used methods in statistics. It models a continuous target $y$ as a linear function of features: $y = \mathbf{w}^T \mathbf{x} + b$, where $\mathbf{w}$ is a weight vector and $b$ is a bias (intercept). The model assumes that the expected value of $y$ given $\mathbf{x}$ is a linear combination of the features—an assumption that is both its greatest strength (yielding interpretable, analytically tractable solutions) and its principal limitation.
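As a minimal sketch of the model equation $y = \mathbf{w}^T \mathbf{x} + b$, with toy numbers chosen purely for illustration (none come from the text):

```python
import numpy as np

# Illustrative, assumed values for the weights, bias, and one input.
w = np.array([0.5, -1.2])   # weight vector w
b = 2.0                     # bias (intercept) b
x = np.array([4.0, 1.0])    # one feature vector x

y_hat = w @ x + b           # prediction: w^T x + b
print(y_hat)                # 0.5*4.0 + (-1.2)*1.0 + 2.0 = 2.8
```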

Fitting is typically performed by Ordinary Least Squares (OLS), which minimises the sum of squared residuals: $L(\mathbf{w}) = \sum_i (y_i - \mathbf{w}^T \mathbf{x}_i - b)^2$. Because the loss surface is a convex quadratic, the minimum has a closed form: $\mathbf{w}^* = (X^T X)^{-1} X^T \mathbf{y}$—the normal equations (here the bias $b$ is absorbed into $\mathbf{w}$ by appending a constant-one column to $X$). For large datasets, iterative methods such as gradient descent avoid forming the Gram matrix.
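The closed-form solution above can be sketched in a few lines of NumPy on assumed toy data (an exact linear relationship $y = 2x + 1$, chosen so the recovered coefficients are easy to check); the intercept is handled by appending a ones column:

```python
import numpy as np

# Toy data (assumed for illustration): y = 2*x + 1 exactly.
X_raw = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Absorb the bias b into w by appending a constant-one column to X.
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# Normal equations: w* = (X^T X)^{-1} X^T y.
# np.linalg.solve is used instead of an explicit inverse for numerical stability.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

print(w_star)  # [2. 1.] — slope 2, intercept 1
```

In practice one would reach for `np.linalg.lstsq` or a library such as scikit-learn, which handle rank-deficient $X$ more robustly than solving the normal equations directly.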

Under the Gauss–Markov assumptions (linearity, zero-mean errors, homoscedasticity, uncorrelated errors), OLS is the Best Linear Unbiased Estimator (BLUE). When errors are additionally Gaussian, maximum likelihood estimation yields the same solution as OLS, and exact inference via $t$ and $F$ distributions follows. Ridge regression adds L2 regularisation; Lasso adds L1, which also performs feature selection by driving coefficients to exactly zero; Elastic Net combines both. Polynomial features, basis expansions, and interaction terms extend linear regression to capture nonlinear relationships while retaining the computational advantages of linear algebra.
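Ridge regression keeps a closed form: adding the L2 penalty $\lambda \lVert \mathbf{w} \rVert^2$ changes the solution to $\mathbf{w}^* = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$. A minimal sketch on synthetic data (the data, seed, and $\lambda$ values are assumptions for illustration) shows the shrinkage effect of increasing $\lambda$:

```python
import numpy as np

# Synthetic data (assumed): 50 samples, 3 features, known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, lam=1.0)    # mild regularisation
w_large = ridge(X, y, lam=100.0)  # strong regularisation

# Larger lambda shrinks the coefficient vector toward zero.
assert np.linalg.norm(w_large) < np.linalg.norm(w_small)
```

Note that, unlike ridge, the Lasso's L1 penalty has no closed form and is typically fit by coordinate descent.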

Related terms: Logistic Regression, Maximum Likelihood Estimation, Loss Function

Also defined in: Textbook of AI, Textbook of Medical AI, Textbook of Medical Statistics