Chapter Fifteen

Modern AI

Learning Objectives
  1. Derive the Kaplan and Chinchilla scaling laws and use them to choose compute-optimal model size and training data
  2. Critically appraise claims of emergent abilities, distinguishing genuine phase transitions from metric artefacts
  3. Describe the modern pre-training recipe (data curation, tokenisation, training stack, curriculum) for a frontier language model
  4. Derive the RLHF objective from the Bradley–Terry preference model and explain the role of the KL penalty in PPO fine-tuning
  5. Derive Direct Preference Optimisation from the closed-form RLHF optimum and compare it with IPO, KTO, ORPO and SimPO
  6. Explain GRPO and how reasoning models such as DeepSeek-R1 are trained on verifiable rewards
  7. Use test-time compute (best-of-$N$, self-consistency, tree search, thinking tokens) to trade inference cost for accuracy
  8. Distinguish process reward models from outcome reward models and explain the result of Lightman et al. (2023)
  9. Implement retrieval-augmented generation, tool use and agentic loops, and reason about their failure modes
  10. Describe the multimodal frontier (vision–language, audio, video, embodied) and the state of evaluation in 2026

The decade from 2015 to 2025 was, in retrospect, the decade in which artificial intelligence stopped being a discipline mostly concerned with research benchmarks and became a piece of infrastructure that ran the world's writing, coding, customer support and an increasing share of its scientific reasoning. The Transformer arrived in 2017, GPT-3 in 2020, ChatGPT in 2022, GPT-4 in 2023, the first usable reasoning models in 2024, and by 2026 the frontier looked very different from anything that had preceded it. This chapter is a snapshot of how far that journey had come by April 2026, written from the vantage of someone who needs both to use these systems and to understand how they work.

The earlier chapters of this book covered the scaffolding: linear algebra, probability, optimisation, classical machine learning, neural networks, the Transformer. This chapter is concerned with what happens when you apply that scaffolding at the largest scale that humanity has ever pointed at a single model class. We start with the empirical scaling laws that governed the era. We move through the pre-training recipe, the alignment recipe, and the reasoning recipe. We discuss test-time compute (the suddenly-central idea that you can spend money at inference rather than training time). We cover tools, agents and retrieval. We end with a survey of the frontier as of early 2026 and an end-to-end recipe that any reader can run on a single GPU.

A note on style. Modern AI is a fast-moving field, and any chapter written about it is partly a hostage to fortune. We have tried to focus on the equations, the qualitative findings and the design principles: the things we expect to outlive the specific model names. Where we name a system, we name it because the design choice it embodies is instructive, not because that particular system is the latest.

