Abstract. Introduces Simple Preference Optimisation (SimPO), a DPO-style objective that needs no reference policy. SimPO replaces DPO's implicit reward $\beta \left( \log \pi_\theta(y \mid x) - \log \pi_\text{ref}(y \mid x) \right)$ with the length-normalised average log-likelihood under the target policy alone, $\frac{\beta}{|y|} \log \pi_\theta(y \mid x)$, removing the second forward pass and freeing the GPU memory occupied by the frozen reference model. Also adds a target reward margin $\gamma$ to the Bradley-Terry loss, so the chosen response must outscore the rejected one by at least $\gamma$. SimPO matches or exceeds DPO on AlpacaEval 2, Arena-Hard and MT-Bench at substantially lower training cost, and is part of the post-DPO simplification trend in preference learning.
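A minimal sketch of the resulting loss, $\mathcal{L}_\text{SimPO} = -\mathbb{E} \left[ \log \sigma \left( \tfrac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x) - \tfrac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x) - \gamma \right) \right]$, assuming the summed token log-probabilities and response lengths for each chosen/rejected pair have already been computed; the function name, tensor layout, and default hyperparameters below are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed token log-probs of chosen responses, shape (B,)
    rejected_logps: torch.Tensor,    # summed token log-probs of rejected responses, shape (B,)
    chosen_lengths: torch.Tensor,    # token counts |y_w|, shape (B,)
    rejected_lengths: torch.Tensor,  # token counts |y_l|, shape (B,)
    beta: float = 2.0,               # reward scale (illustrative default)
    gamma: float = 1.0,              # target reward margin (illustrative default)
) -> torch.Tensor:
    # Length-normalised, reference-free rewards: (beta / |y|) * log pi_theta(y | x).
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Bradley-Terry loss with margin gamma; note no reference-model term appears,
    # so only one forward pass per response is needed.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```

Because the reward is an average per-token log-likelihood, it matches the length-normalised likelihood the model is effectively ranked by at generation time, which is the paper's stated motivation for the normalisation.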