7.11 Multi-class and multi-label classification
Many algorithms (multinomial logistic regression, naive Bayes, decision trees, random forests, neural networks with a softmax output) are intrinsically multi-class. Inherently binary learners such as SVMs and standard boosting need a wrapper that reduces the problem to binary subproblems.
One-vs-rest (OvR): train $K$ binary classifiers, the $k$-th distinguishing class $k$ from the union of all others. Predict the class with the highest decision value. Simple and interpretable; $O(K)$ classifiers; can produce ambiguous regions where no classifier, or several, claim a point.
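A minimal OvR sketch using scikit-learn's `OneVsRestClassifier`; the iris data and `LinearSVC` base estimator are illustrative choices, not prescribed by the text above.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LinearSVC())  # trains K binary classifiers
ovr.fit(X, y)

print(len(ovr.estimators_))           # 3: one classifier per class
print(ovr.decision_function(X[:2]))   # per-class decision values; argmax wins
```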
One-vs-one (OvO): train $K(K-1)/2$ binary classifiers, one per pair of classes. At prediction time each classifier votes, and the class with the most votes wins. More classifiers, but each trains on only the two classes' data; competitive for small $K$.
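The same setup with scikit-learn's `OneVsOneClassifier`, again with iris and `LinearSVC` as illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovo = OneVsOneClassifier(LinearSVC())
ovo.fit(X, y)

print(len(ovo.estimators_))  # K(K-1)/2 = 3 pairwise classifiers for K = 3
print(ovo.predict(X[:2]))    # class chosen by pairwise voting
```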
Error-correcting output codes (ECOC): assign each class a binary codeword; train one classifier per bit; predict the class whose codeword is nearest to the vector of bit predictions. Robust to errors in individual classifiers when the codewords are well separated in Hamming distance.
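A sketch with scikit-learn's `OutputCodeClassifier`, which draws a random code book; `code_size=2.0` (codewords of length $2K$) is an illustrative setting, and the library decodes by nearest codeword rather than a strict Hamming decode.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Longer codewords add error-correcting redundancy at the cost of
# training more bit classifiers.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=2.0, random_state=0)
ecoc.fit(X, y)

print(ecoc.predict(X[:2]))  # decoded to the class with the nearest codeword
```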
Multi-label classification. Each example carries a set of labels rather than exactly one; a news article might be tagged both politics and economics. Common reductions (a code sketch follows the list):
- Binary relevance: train one classifier per label. Ignores label correlations.
- Classifier chains: chain the binary classifiers, each conditioned on previous predictions. Captures correlations; sensitive to ordering.
- Label powerset: treat each unique label combination as a single class. Captures correlations exactly, but the number of classes can grow as $2^L$ for $L$ labels, and combinations unseen in training can never be predicted.
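A minimal sketch contrasting the first two strategies on a synthetic task; `make_multilabel_classification` and the `LogisticRegression` base estimator are illustrative choices, and label powerset is omitted because it is not part of scikit-learn's core API.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=200, n_classes=5,
                                      n_labels=3, random_state=0)

# Binary relevance: one independent classifier per label.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Classifier chain: each label's classifier also sees the predictions
# for the labels earlier in the (here randomised) chain order.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order='random', random_state=0).fit(X, Y)

print(br.predict(X[:2]))     # each row is a 0/1 label-indicator vector
print(chain.predict(X[:2]))
```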
Metrics generalise: micro-F1 pools all (example, label) decisions before computing F1; macro-F1 averages the per-label F1 scores; subset accuracy demands every label of an example be correct; Hamming loss averages the per-entry 0/1 error over all examples and labels.
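These metrics are all available in scikit-learn; the `Y_true` and `Y_pred` indicator matrices below (rows = examples, columns = labels) are made-up values for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print(f1_score(Y_true, Y_pred, average='micro'))  # pools all (example, label) pairs: 0.75
print(f1_score(Y_true, Y_pred, average='macro'))  # mean of the per-label F1 scores
print(accuracy_score(Y_true, Y_pred))             # subset accuracy: whole row must match: 1/3
print(hamming_loss(Y_true, Y_pred))               # fraction of wrong label entries: 2/9
```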