Glossary

Hinge Loss

The hinge loss for binary classification with $y \in \{-1, +1\}$ and prediction $\hat y \in \mathbb{R}$ is

$$L_\mathrm{hinge}(y, \hat y) = \max(0, 1 - y \hat y)$$

The loss is zero when $y \hat y \geq 1$ (correct classification with a margin of at least 1); otherwise it grows linearly with the margin violation $1 - y \hat y$.
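The definition above can be sketched directly in numpy (a minimal illustration; the example values are not from the source):

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Hinge loss for labels y in {-1, +1} and real-valued scores y_hat."""
    return np.maximum(0.0, 1.0 - y * y_hat)

# Correct with margin >= 1 gives zero loss; violations grow linearly.
losses = hinge_loss(np.array([1, 1, -1]), np.array([2.0, 0.5, 0.5]))
# losses == [0.0, 0.5, 1.5]
```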

Hinge loss is the classification loss for the support vector machine. The SVM objective combines hinge loss on training examples with an L2 regulariser:

$$\min_w \frac{1}{N} \sum_n \max(0, 1 - y_n (w^\top x_n + b)) + \frac{\lambda}{2} \|w\|^2$$

This is a convex (but non-smooth) optimisation problem: the soft-margin SVM primal. The familiar dual formulation is obtained from it via the Lagrangian.

Properties:

  • Convex, so global optimisation is tractable.
  • Non-differentiable at the kink $y \hat y = 1$; sub-gradient methods (or smoothed approximations) handle this.
  • Zero gradient when an example is correctly classified with margin, so only margin-violating examples contribute. This sparsity is the source of "support vectors".
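These properties can be seen in a plain subgradient-descent sketch of the primal objective. The two-blob dataset, step size, and regularisation strength below are illustrative assumptions, not anything prescribed by the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-blob dataset (illustrative assumption).
N = 200
X = np.vstack([rng.normal(-2.0, 1.0, (N // 2, 2)),
               rng.normal(2.0, 1.0, (N // 2, 2))])
y = np.concatenate([-np.ones(N // 2), np.ones(N // 2)])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.1  # regulariser weight and step size (assumed values)

for _ in range(200):
    margins = y * (X @ w + b)
    viol = margins < 1  # only margin violators contribute to the subgradient
    # Subgradient of (1/N) * sum hinge + (lam/2) * ||w||^2
    grad_w = -(y[viol][:, None] * X[viol]).sum(axis=0) / N + lam * w
    grad_b = -y[viol].sum() / N
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean(np.sign(X @ w + b) == y)
```

Note how the boolean mask `viol` implements the "sparse gradient" property: examples satisfying the margin drop out of the update entirely.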

Multiclass hinge loss: for true class $y$ among $K$ classes with scores $\hat y_1, \ldots, \hat y_K$, the Crammer-Singer (2001) formulation penalises only the worst-offending class,

$$L = \max(0, 1 - \hat y_y + \max_{k \neq y} \hat y_k)$$

The Weston-Watkins variant instead sums over all violating classes, $L = \sum_{k \neq y} \max(0, 1 - (\hat y_y - \hat y_k))$. The two are not equivalent in general.

Used in some structured prediction tasks and as an alternative to softmax cross-entropy.
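A minimal sketch of the max-based Crammer-Singer form (function name and example scores are illustrative):

```python
import numpy as np

def crammer_singer_loss(scores, y):
    """Crammer-Singer multiclass hinge: max(0, 1 - s_y + max_{k != y} s_k)."""
    s_y = scores[y]
    others = np.delete(scores, y)  # scores of all classes except the true one
    return max(0.0, 1.0 - s_y + others.max())

# True class leads by margin >= 1: zero loss.
# crammer_singer_loss(np.array([3.0, 1.0, 0.5]), 0) -> 0.0
# Tie with a competing class: loss equals the margin, 1.0.
# crammer_singer_loss(np.array([1.0, 1.0, 0.5]), 0) -> 1.0
```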

Squared hinge loss $\max(0, 1 - y \hat y)^2$ is differentiable everywhere and gives a smoother optimisation landscape, sometimes preferred over standard hinge.

Comparison to logistic loss:

  • Logistic is smoother, gives calibrated probabilities, but never reaches zero gradient.
  • Hinge has zero gradient on confident correct predictions, leading to sparse support-vector solutions.
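The contrast is easy to check numerically: at a confidently correct margin the hinge loss is exactly zero, while the logistic loss is small but strictly positive (the margin value 5.0 is just an illustrative choice):

```python
import numpy as np

def hinge(margin):      # margin = y * y_hat
    return np.maximum(0.0, 1.0 - margin)

def logistic(margin):   # log(1 + exp(-margin))
    return np.log1p(np.exp(-margin))

m = np.array([5.0])     # confidently correct prediction
# hinge(m) is exactly 0.0; logistic(m) is small but positive (~0.0067)
```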

Hinge loss survives in modern AI in:

  • SVMs for tabular and text classification (still widely used).
  • Triplet loss (a generalisation): $\max(0, m + d(a, p) - d(a, n))$ for anchor-positive-negative triplets in metric learning.
  • Margin-based knowledge graph embeddings (TransE, RotatE).

Related terms: Support Vector Machine, SVM (mathematical detail), Cross-Entropy Loss, Triplet Loss
