7.14 Comparative experiment
A clean comparison on the breast-cancer dataset, with a stratified train/test split and a fixed seed, illustrates the relative strengths of these methods.
import numpy as np, pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, test_size=0.25, random_state=0)
models = {
    "LogReg": Pipeline([("s", StandardScaler()), ("m", LogisticRegression(max_iter=5000))]),
    "KNN": Pipeline([("s", StandardScaler()), ("m", KNeighborsClassifier(n_neighbors=7))]),
    "Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "RF": RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0),
    "GBM": GradientBoostingClassifier(n_estimators=300, max_depth=3, random_state=0),
    "SVM-RBF": Pipeline([("s", StandardScaler()), ("m", SVC(probability=True, random_state=0))]),
    "GaussNB": GaussianNB(),
}
rows = []
for name, m in models.items():
    m.fit(Xtr, ytr)
    pred = m.predict(Xte)
    proba = (m.predict_proba(Xte)[:, 1] if hasattr(m, "predict_proba")
             else m.decision_function(Xte))
    rows.append({"model": name,
                 "accuracy": accuracy_score(yte, pred),
                 "f1": f1_score(yte, pred),
                 "auc": roc_auc_score(yte, proba)})
print(pd.DataFrame(rows).round(3).to_string(index=False))
Typical numbers (results will shift by a few hundredths under a different seed): LogReg, RF, GBM, and SVM-RBF all land in the 0.97–0.99 AUC range; the single tree is a step down at $\sim 0.94$; KNN and Gaussian NB fall in between. The lesson is that on a small, well-scaled tabular dataset, every reasonable method gets close to ceiling, so the choice of algorithm matters less than data quality, careful preprocessing, and sound evaluation.
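A single split can make one model look better than another by chance, so before reading much into small gaps it is worth checking the fold-to-fold spread. A minimal sketch using stratified 5-fold cross-validation on two of the models above (the fold count and seed are illustrative choices, not from the experiment):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
# Stratified folds keep the class balance stable across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for name, model in {
    "LogReg": Pipeline([("s", StandardScaler()),
                        ("m", LogisticRegression(max_iter=5000))]),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
}.items():
    # cross_val_score refits the model on each training fold, so the
    # scaler inside the pipeline never sees the held-out fold.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")

If the per-fold standard deviation is on the order of the gap between two models, the single-split ranking should not be trusted; that is exactly the situation near the 0.97–0.99 ceiling here.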