7.14 Comparative experiment
A clean comparison on the breast-cancer dataset, with a stratified train/test split and a fixed seed, illustrates the relative strengths of these methods.
import numpy as np, pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, test_size=0.25, random_state=0)
models = {
    "LogReg": Pipeline([("s", StandardScaler()), ("m", LogisticRegression(max_iter=5000))]),
    "KNN": Pipeline([("s", StandardScaler()), ("m", KNeighborsClassifier(n_neighbors=7))]),
    "Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "RF": RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0),
    "GBM": GradientBoostingClassifier(n_estimators=300, max_depth=3, random_state=0),
    "SVM-RBF": Pipeline([("s", StandardScaler()), ("m", SVC(probability=True, random_state=0))]),
    "GaussNB": GaussianNB(),
}
rows = []
for name, m in models.items():
    m.fit(Xtr, ytr)
    pred = m.predict(Xte)
    proba = (m.predict_proba(Xte)[:, 1] if hasattr(m, "predict_proba")
             else m.decision_function(Xte))
    rows.append({"model": name,
                 "accuracy": accuracy_score(yte, pred),
                 "f1": f1_score(yte, pred),
                 "auc": roc_auc_score(yte, proba)})
print(pd.DataFrame(rows).round(3).to_string(index=False))
Typical numbers (results will shift by a few hundredths under a different seed): LogReg, RF, GBM, and SVM-RBF all land in the 0.97–0.99 AUC range; the single tree is a step down at $\sim 0.94$; KNN and Gaussian NB fall in between. The lesson is that on a small, well-scaled tabular dataset, every reasonable method gets close to ceiling, so the choice of algorithm matters less than data quality, careful preprocessing, and sound evaluation.
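A single split can make one model look better than another by chance, so before reading much into small gaps it is worth checking the fold-to-fold spread. A minimal sketch using stratified 5-fold cross-validation on two of the models above (the fold count and seed are illustrative choices, not from the experiment):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
# Stratified folds keep the class balance stable across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for name, model in {
    "LogReg": Pipeline([("s", StandardScaler()),
                        ("m", LogisticRegression(max_iter=5000))]),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
}.items():
    # cross_val_score refits the model on each training fold, so the
    # scaler inside the pipeline never sees the held-out fold.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")

If the per-fold standard deviation is on the order of the gap between two models, the single-split ranking should not be trusted; that is exactly the situation near the 0.97–0.99 ceiling here.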