Glossary

Convex Optimisation

A convex optimisation problem has the form

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \leq 0, \, h_j(x) = 0$$

where $f$ and the $g_i$ are convex functions and the $h_j$ are affine. The feasible set $\{x : g_i(x) \leq 0, h_j(x) = 0\}$ is convex, and the objective is convex on it.

Why this matters: every local minimum is a global minimum. Convex problems can be solved efficiently and reliably, often in polynomial time, with strong theoretical guarantees.

Standard problem classes, in increasing generality:

Linear programming (LP): linear objective, linear constraints. Solved by simplex (worst-case exponential, fast in practice) or interior-point (provably polynomial).

Quadratic programming (QP): quadratic objective with positive semi-definite Hessian, linear constraints. SVMs, MPC, portfolio optimisation.

Second-order cone programming (SOCP): constraints of the form $\|Ax + b\| \leq c^\top x + d$. Robust optimisation, antenna design.

Semi-definite programming (SDP): matrix variable, linear objective, matrix-positive-semi-definite constraint. Relaxations of NP-hard combinatorial problems, control, quantum information.

General convex programming: arbitrary convex objective and constraints. Modern interior-point methods solve in $O(\sqrt{n} \log(1/\epsilon))$ Newton steps.
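As a concrete instance of the simplest class above, here is a minimal LP sketch using SciPy's `linprog` (the problem data are illustrative, not from any particular application):

```python
from scipy.optimize import linprog

# Illustrative LP: minimise -x0 - 2*x1
# subject to  x0 + x1 <= 4,  x0 <= 3,  x0 >= 0,  x1 >= 0.
res = linprog(c=[-1, -2],
              A_ub=[[1, 1], [1, 0]],
              b_ub=[4, 3],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, res.fun)  # optimum x = (0, 4), objective -8
```

The `"highs"` method is SciPy's modern LP backend; it selects between simplex and interior-point variants internally.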

Lagrangian: $\mathcal{L}(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$, with $\lambda_i \geq 0$ (multipliers for the inequalities) and unrestricted $\nu_j$ (for the equalities).
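For an equality-constrained QP, setting $\nabla_x \mathcal{L} = 0$ alongside the constraint $Ax = b$ gives a linear system in $(x, \nu)$ that solves the problem exactly; a minimal NumPy sketch with illustrative data:

```python
import numpy as np

# Illustrative QP: minimise 0.5*x'Px + q'x  subject to  Ax = b,
# with P positive definite. Stationarity (Px + q + A'nu = 0) plus
# the constraint form one linear KKT system.
P = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])   # constraint: x0 + x1 = 1
b = np.array([1.0])

K = np.block([[P, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-q, b])
x, nu = np.split(np.linalg.solve(K, rhs), [2])
print(x, nu)  # x = (0, 1), nu = (2,)
```

This is the core computation inside equality-constrained Newton steps in interior-point methods.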

Dual function: $g(\lambda, \nu) = \inf_x \mathcal{L}(x, \lambda, \nu)$. Always concave in $(\lambda, \nu)$, regardless of whether $f$ and the $g_i$ are convex, because it is a pointwise infimum of functions affine in $(\lambda, \nu)$.

Weak duality: $g(\lambda, \nu) \leq f(x)$ for every primal-feasible $x$ and every dual-feasible $(\lambda, \nu)$; the dual therefore provides a lower bound on the primal optimum.

Strong duality: for convex problems satisfying a constraint qualification (e.g. Slater's condition: a strictly feasible point exists), the duality gap is zero and the primal and dual optima coincide.
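A one-dimensional sketch of both properties, assuming nothing beyond NumPy: minimise $x^2$ subject to $x \geq 1$, whose primal optimum is $p^* = 1$ at $x^* = 1$.

```python
import numpy as np

# Toy problem: minimise x^2 subject to g(x) = 1 - x <= 0; p* = 1 at x* = 1.
# The Lagrangian is x^2 + lam*(1 - x); minimising over x (at x = lam/2)
# gives the dual function  dual(lam) = lam - lam**2 / 4.
def dual(lam):
    return lam - lam**2 / 4.0

lams = np.linspace(0.0, 5.0, 501)
vals = dual(lams)
print(vals.max())  # weak duality: every value <= 1; Slater holds (x = 2 is
                   # strictly feasible), so the maximum attains p* = 1 at lam = 2
```

Every dual value lower-bounds $p^*$ (weak duality), and because Slater's condition holds the best dual value equals $p^*$ (strong duality).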

The KKT conditions characterise the optima of convex problems (under a constraint qualification): at an optimal $(x^*, \lambda^*, \nu^*)$,

  1. Primal feasibility: $g_i(x^*) \leq 0$, $h_j(x^*) = 0$.
  2. Dual feasibility: $\lambda_i^* \geq 0$.
  3. Complementary slackness: $\lambda_i^* g_i(x^*) = 0$.
  4. Stationarity: $\nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0$.
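A minimal numerical check of all four conditions on a toy problem (minimise $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$, whose optimum is $x^* = 1$ with multiplier $\lambda^* = 2$):

```python
# Toy problem: f(x) = x^2, one inequality g(x) = 1 - x <= 0.
# Analytically, x* = 1 and lam* = 2; verify the four KKT conditions.
x_star, lam_star = 1.0, 2.0

g_val = 1.0 - x_star
assert g_val <= 0.0                      # 1. primal feasibility
assert lam_star >= 0.0                   # 2. dual feasibility
assert abs(lam_star * g_val) < 1e-12     # 3. complementary slackness
grad_L = 2.0 * x_star + lam_star * (-1.0)
assert abs(grad_L) < 1e-12               # 4. stationarity: f'(x*) + lam* * g'(x*)
print("all four KKT conditions hold")
```

Note the constraint is active ($g(x^*) = 0$), which is why a strictly positive multiplier is consistent with complementary slackness.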

These conditions are the foundation of constrained-optimisation theory and underpin the solution methods of SVMs, constrained quadratic programs, and many other models.

Practical solvers: CVX/CVXPY (modelling languages that recognise convexity), MOSEK, Gurobi, ECOS, SCS, and many others. Modern ML practitioners typically write convex problems in CVXPY and let the solver handle the rest.

Disciplined Convex Programming (DCP): a methodology from Stephen Boyd's group at Stanford that restricts how expressions may be composed so that a problem's convexity can be verified directly from its syntax; it is the basis of CVX/CVXPY.

Related terms: Convex Function, Lagrangian and KKT, SVM (mathematical detail)

This site is currently in Beta. Contact: Chris Paton
