Glossary

Bias (Fairness)

Bias in machine learning refers to systematic errors that disadvantage particular groups of people, typically defined along axes such as race, gender, age, disability, or socioeconomic status. These biases are rarely introduced by malicious intent; rather, they arise from the data on which models are trained, the features selected for prediction, the choice of objective function, and the social context of deployment. A hiring model trained on historical promotion decisions will faithfully learn any gender discrimination embedded in those records; a recidivism tool trained on arrest data will reflect patterns of policing rather than underlying offending.

Formalising fairness turns out to be surprisingly difficult because multiple intuitive notions are provably incompatible in most non-trivial settings. Three widely studied criteria are demographic parity (equal positive prediction rates across groups), equalised odds (equal true-positive and false-positive rates across groups), and calibration within groups (among individuals assigned a given predicted probability, the same fraction are actually positive in each group). The impossibility theorem of Chouldechova (2017) shows that calibration and equalised odds cannot be simultaneously satisfied when base rates differ between groups, except in degenerate cases such as a perfect predictor.
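The three criteria reduce to comparing simple per-group rates. The following is a minimal sketch, assuming binary labels, binary predictions, and a binary group attribute (the function name and return structure are illustrative, not from any particular library):

```python
import numpy as np

def fairness_metrics(y, y_hat, g):
    """Per-group rates underlying the three fairness criteria.

    y     : true binary labels (0/1)
    y_hat : binary predictions (0/1)
    g     : binary group membership (0/1)

    Returns {group: {"pos_rate", "tpr", "fpr"}}.
    Demographic parity compares pos_rate across groups;
    equalised odds compares tpr and fpr across groups.
    """
    out = {}
    for group in (0, 1):
        m = (g == group)
        out[group] = {
            "pos_rate": y_hat[m].mean(),            # P(Yhat=1 | G=group)
            "tpr": y_hat[m & (y == 1)].mean(),      # P(Yhat=1 | Y=1, G=group)
            "fpr": y_hat[m & (y == 0)].mean(),      # P(Yhat=1 | Y=0, G=group)
        }
    return out
```

Checking calibration requires real-valued scores rather than hard predictions: one bins individuals by predicted probability and compares the observed positive rate per bin across groups.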

Technical interventions operate at three stages. Pre-processing methods reweight or rebalance the training data or learn fair representations. In-processing methods modify the learning algorithm with fairness constraints or regularisation terms. Post-processing approaches adjust model outputs, for example by applying group-specific thresholds. No single approach is universally superior, and fairness is ultimately not solely a technical problem: the choice of which attributes to protect, which metric to optimise, and what disparity is acceptable are normative questions requiring input from affected communities, domain experts, and policymakers.
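Of the three stages, post-processing is the simplest to illustrate. This sketch applies group-specific decision thresholds to model scores; the function name and the particular cutoff values are assumptions for illustration, not a recommendation of specific thresholds:

```python
import numpy as np

def apply_group_thresholds(scores, g, thresholds):
    """Post-processing: group-specific decision thresholds.

    scores     : model scores in [0, 1]
    g          : group label per individual (keys of `thresholds`)
    thresholds : dict mapping group label -> cutoff

    Returns binary decisions; choosing the cutoffs to equalise,
    e.g., true-positive rates across groups is a separate
    optimisation step done on held-out data.
    """
    cutoffs = np.array([thresholds[gi] for gi in g])
    return (scores >= cutoffs).astype(int)
```

Note that using group membership at decision time, as this approach does, may itself be legally restricted in some jurisdictions, which is one reason the choice among the three stages is not purely technical.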

Related terms: Explainable AI

Also defined in: Textbook of AI