For the constrained optimisation problem
$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \leq 0, \, h_j(x) = 0$$
the Lagrangian combines the objective and constraints:
$$\mathcal{L}(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$$
with Lagrange multipliers $\lambda_i \geq 0$ (inequality constraints) and $\nu_j$ unrestricted (equality constraints).
The Lagrangian "absorbs" the constraints into the objective; the multipliers $\lambda_i, \nu_j$ act as the "shadow prices" of the constraints. Their value at the optimum measures how much the optimal objective changes per unit relaxation of the corresponding constraint.
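The shadow-price interpretation can be checked numerically. A minimal sketch, assuming scipy is available, on the toy problem $\min_x x^2$ s.t. $x \geq 1$, whose optimum is $x^* = 1$ with multiplier $\lambda^* = 2$ (from stationarity $2x^* - \lambda^* = 0$): relaxing the constraint to $x \geq 1 - \varepsilon$ should lower the optimal value by roughly $\lambda^* \varepsilon$.

```python
from scipy.optimize import minimize

# Toy problem: min f(x) = x^2  s.t.  x >= 1.
# Analytic optimum: x* = 1, f* = 1, multiplier lambda* = 2.
def optimal_value(eps):
    # Relaxed constraint x >= 1 - eps; SLSQP's "ineq" convention is fun(x) >= 0.
    cons = [{"type": "ineq", "fun": lambda x: x[0] - (1.0 - eps)}]
    return minimize(lambda x: x[0] ** 2, x0=[2.0],
                    method="SLSQP", constraints=cons).fun

eps = 1e-4
f0, f_relaxed = optimal_value(0.0), optimal_value(eps)
shadow_price = (f0 - f_relaxed) / eps  # finite-difference estimate of lambda* = 2
```

The finite-difference ratio recovers the multiplier, illustrating "objective change per unit relaxation".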
The Karush–Kuhn–Tucker (KKT) conditions are necessary at a local optimum $x^*$ when a constraint qualification holds, and sufficient for global optimality in convex problems:
Primal feasibility: $g_i(x^*) \leq 0$, $h_j(x^*) = 0$.
Dual feasibility: $\lambda_i^* \geq 0$.
Complementary slackness: $\lambda_i^* g_i(x^*) = 0$. At the optimum, either the multiplier is zero or the constraint is active (binding) with $g_i(x^*) = 0$.
Stationarity: $\nabla_x \mathcal{L}(x^*, \lambda^*, \nu^*) = 0$, i.e.
$$\nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0$$
The KKT conditions reduce constrained optimisation to a system of equations and inequalities. For convex problems they characterise global optima; for non-convex problems they are only necessary, so a KKT point may be a local optimum or even a saddle point.
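The four conditions can be verified directly on a small quadratic program. A sketch (my own example, not from the text): $\min (x-1)^2 + (y-2)^2$ s.t. $x + y \leq 2$, whose KKT point is $(x^*, y^*) = (0.5, 1.5)$ with $\lambda^* = 1$.

```python
import numpy as np

# QP: min (x-1)^2 + (y-2)^2  s.t.  g(x, y) = x + y - 2 <= 0.
# Analytic KKT point: (x*, y*) = (0.5, 1.5), lambda* = 1.
x = np.array([0.5, 1.5])
lam = 1.0

grad_f = 2 * (x - np.array([1.0, 2.0]))  # gradient of the objective
grad_g = np.array([1.0, 1.0])            # gradient of the constraint
g = x.sum() - 2.0

primal_feasible = g <= 1e-12
dual_feasible = lam >= 0
comp_slack = abs(lam * g) < 1e-12
stationary = np.allclose(grad_f + lam * grad_g, 0.0)
```

All four booleans come out true; perturbing $x$ or $\lambda$ breaks stationarity or complementary slackness, which is a useful sanity check when deriving KKT points by hand.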
Worked example: SVM.
Hard-margin SVM minimises $\frac{1}{2} \|w\|^2$ subject to $y_n(w^\top x_n + b) \geq 1$. Lagrangian:
$$\mathcal{L}(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_n \alpha_n [y_n(w^\top x_n + b) - 1]$$
Stationarity in $w$: $\nabla_w \mathcal{L} = w - \sum_n \alpha_n y_n x_n = 0 \Rightarrow w = \sum_n \alpha_n y_n x_n$.
Stationarity in $b$: $\sum_n \alpha_n y_n = 0$.
Substituting back gives the dual problem in terms of $\alpha$ alone. Complementary slackness $\alpha_n [y_n(w^\top x_n + b) - 1] = 0$ tells us exactly which points are support vectors (those with $\alpha_n > 0$, where the constraint is binding).
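The substitution yields the standard dual $\max_\alpha \sum_n \alpha_n - \frac{1}{2} \sum_{n,m} \alpha_n \alpha_m y_n y_m x_n^\top x_m$ subject to $\alpha_n \geq 0$ and $\sum_n \alpha_n y_n = 0$. A numeric sketch, assuming scipy, on a tiny 1-D dataset where the third point sits far from the boundary and should get $\alpha_3 = 0$:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 1-D dataset; x = 3 is far from the margin.
X = np.array([[-1.0], [1.0], [3.0]])
y = np.array([-1.0, 1.0, 1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)  # Q_nm = y_n y_m x_n . x_m

# Dual: max sum(a) - 0.5 a^T Q a  s.t.  a >= 0, sum_n a_n y_n = 0
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), x0=np.ones(3),
               method="SLSQP", bounds=[(0.0, None)] * 3,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

w = (alpha * y) @ X          # recover w = sum_n alpha_n y_n x_n
b = y[0] - w @ X[0]          # from any support vector's active constraint
support = alpha > 1e-4       # support vectors: alpha_n > 0 (binding constraints)
```

The solver returns $\alpha \approx (0.5, 0.5, 0)$: only the two margin points are support vectors, and $w = 1$, $b = 0$ recovers the expected separating hyperplane.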
Constraint qualifications ensure the KKT conditions are necessary at the optimum:
- Slater's condition (for convex problems): there exists a strictly feasible point $x$ with $g_i(x) < 0$ for all $i$.
- Linear independence constraint qualification (LICQ): gradients of active constraints are linearly independent.
Without constraint qualifications, KKT may fail at boundary points where the constraint geometry is degenerate.
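A standard counterexample where both qualifications fail: minimise $f(x) = x$ subject to $g(x) = x^2 \leq 0$. The only feasible point is $x^* = 0$, hence optimal, but stationarity would require
$$\nabla f(0) + \lambda \nabla g(0) = 1 + \lambda \cdot 0 = 0,$$
which no $\lambda$ satisfies. Here $\nabla g(0) = 0$, so LICQ fails, and no $x$ achieves $x^2 < 0$, so Slater fails too; the KKT conditions miss the optimum.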
Penalty and barrier methods convert constrained problems to unconstrained:
- Quadratic penalty: $\min_x f(x) + \frac{\rho}{2} \sum_i \max(0, g_i(x))^2$, $\rho \to \infty$.
- Logarithmic barrier: $\min_x f(x) - \frac{1}{t} \sum_i \log(-g_i(x))$, $t \to \infty$. Foundation of interior-point methods.
- Augmented Lagrangian: combines penalty and Lagrangian for better numerical conditioning.
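The quadratic penalty can be sketched on the same toy problem as before (assuming scipy): $\min x^2$ s.t. $x \geq 1$. Each penalised subproblem is unconstrained, and its minimiser $x_\rho = \rho/(2+\rho)$ approaches $x^* = 1$ from the infeasible side as $\rho \to \infty$.

```python
from scipy.optimize import minimize_scalar

# Quadratic penalty for min x^2  s.t.  g(x) = 1 - x <= 0.
# Unconstrained subproblem: min x^2 + (rho/2) * max(0, 1 - x)^2.
def penalty_min(rho):
    obj = lambda x: x ** 2 + 0.5 * rho * max(0.0, 1.0 - x) ** 2
    return minimize_scalar(obj).x

# Increasing rho drives the minimiser toward the constrained optimum x* = 1.
xs = [penalty_min(rho) for rho in (1.0, 10.0, 100.0, 1000.0)]
```

The iterates increase monotonically toward 1 but never reach it for finite $\rho$, which is why penalty methods need $\rho \to \infty$ (and why the augmented Lagrangian, which adds the multiplier term, converges for finite $\rho$).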
Related terms: Convex Optimisation, SVM (mathematical detail)
Discussed in:
- Chapter 3: Calculus