1.8 What to take forward
Chapter 1 has been a vocabulary-building chapter. The rest of the textbook places technical machinery (linear algebra, probability, optimisation, neural networks, transformers, reinforcement learning, alignment) onto the conceptual scaffolding built here. Forward references in the takeaways below indicate where each idea is developed in detail.
The five concepts to carry forward
Five ideas from this chapter pay off most often when you meet new material and need to place it.
1. The rational-agent framework. An agent is anything that perceives its environment and chooses actions in order to achieve a goal. The standard formulation in modern AI, popularised by Russell and Norvig and adopted across the field, is that the agent chooses the action it expects will maximise its utility: the action that, averaged over its uncertainty about the world, will best advance the goal it has been given. This sounds abstract but it is concrete enough to program: the rational agent has a percept (what it senses now), a memory of past percepts, a model of how the world responds to actions, and a way of scoring outcomes. We treat this formally throughout the book and especially in Chapter 15, where reinforcement learning gives the rational-agent picture mathematical teeth. Why it matters: it gives you a single mathematical scaffold for thinking about goal-directed AI without ever needing to argue about whether the agent is conscious, or whether it really understands. Those are interesting philosophical questions, but for engineering purposes the rational-agent picture lets you skip past them.
2. Symbolic and sub-symbolic representations. Symbolic AI represents knowledge as discrete structures: rules, logical formulae, propositions, ontologies, parse trees. Sub-symbolic AI represents knowledge as continuous-valued patterns spread across many simple units, which today usually means the weights of a neural network. Both approaches have a long history; both work; both fail in characteristic ways. In this textbook the symbolic tradition appears chiefly in Chapter 1's history; the sub-symbolic tradition dominates Chapters 9 to 15, where neural networks, transformers, scaling, and modern foundation models are developed in detail. Why it matters: knowing which class to reach for on a given problem is half the battle. A small, well-defined body of knowledge with crisp rules (tax law, chemical safety constraints, chess endgames) is well served by symbolic methods. A problem with abundant data and patterns that are hard to formalise (vision, speech, natural language, protein folding) calls for sub-symbolic methods. Many real problems benefit from a hybrid in which the two approaches collaborate, and a great deal of practical AI engineering is about deciding where to draw that line.
3. The capability-evaluation co-evolution. Every time the field invents a benchmark to measure progress, the benchmark eventually gets saturated: a system reaches human-level or superhuman performance and the benchmark stops being informative. The Turing test itself was the first instance of this pattern, and the pattern has played out repeatedly with chess, Go, ImageNet, GLUE, SuperGLUE, MMLU, and many others. The implication is that capability and evaluation move together: as systems get better, new evaluations are continuously needed to tell us how much better. Chapter 16 returns to this point in the context of alignment evaluations and robustness testing. Why it matters: it equips you to read leaderboard claims with appropriate scepticism. A press release announcing a record-breaking score on benchmark X tells you something only if you know how saturated X is, whether the test set was in the training data, what the variance is across runs, and whether the benchmark was even measuring what its name suggested.
4. The compute-data-algorithms triangle. Capability advances in modern AI trace to a combination of three resources: more compute, more data, and better algorithms. In recent years compute and data have dominated, with algorithmic improvements playing a smaller role than the headline narratives often suggest. Chapter 15 makes this quantitative through the scaling laws, which describe how model performance improves as a function of parameters, data, and compute. Why it matters: it explains why the frontier of AI capability sits inside a small number of well-capitalised industrial laboratories rather than in academic departments, and why open-weight models tend to lag the closed frontier in raw capability but close the gap over time as compute and data become cheaper. It also explains why predictions of future capability are largely predictions about future compute and data, a less glamorous framing than predictions about future algorithmic insight.
5. The jaggedness of capability. Modern AI systems are simultaneously superhuman on some tasks and subhuman on others, and the boundary between the two does not match human intuitions. A frontier language model in early 2026 might solve graduate-level mathematics problems while failing to count the number of letter-r's in "strawberry"; it might write a competent legal brief and then make an arithmetic error a primary school child would not. We treat this throughout the book, especially in Chapters 13 and 15, where the underlying reasons (what the model has been trained to do well, what it has not been trained on, where its representations break down) become clear. Why it matters: it prevents the cognitive error of treating "AGI" as a single binary threshold that systems either have or have not crossed. The jagged frontier means a system can be on the right side of the line for some tasks and on the wrong side for others at the same time, and engineering with such systems means knowing where each particular task sits.
Three habits to develop
Three habits will repay the time you spend on them.
1. Run code as you read. Every chapter from Chapter 6 onwards has worked code. Open a Python notebook, paste the code in, run it, and then modify it. Change a hyperparameter and see what happens. Break it deliberately and watch the failure mode. Replace a function with a different version and compare. The mathematics in the book becomes intuitive in roughly the time it takes you to make ten small modifications work; without the code, the equations remain symbols on a page. The goal is not to memorise the code but to develop the muscle memory of taking a printed algorithm and turning it into a thing that runs on your laptop, because that is how AI engineering is actually done.
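The habit in miniature: here is the kind of ten-second experiment the paragraph has in mind, a toy gradient descent on f(x) = x² run at two learning rates. Everything in it is invented for this example; the point is the workflow of changing one number and watching the behaviour flip.

```python
# Gradient descent on f(x) = x**2, whose gradient is 2x.  The update
# x <- x - lr * 2x multiplies x by (1 - 2*lr) each step, so any
# learning rate between 0 and 1 converges and anything above 1 diverges.
def descend(lr, steps=25, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x  # one gradient step
    return x

print(descend(lr=0.1))  # shrinks towards 0
print(descend(lr=1.1))  # magnitude grows without bound: the classic divergence failure mode
```

Changing lr=0.1 to lr=1.1 is exactly the "break it deliberately and watch the failure mode" exercise; seeing the iterates explode once teaches you more about learning-rate tuning than any number of warnings in the text.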
2. Read the original papers. The textbook gives you the canonical results, organised pedagogically. The original papers tell you why each design choice was made, what the authors tried first that did not work, what assumptions they made, what ablations they ran. Citation lists in the chapters point to the must-reads. A good rule of thumb for a study session: read the chapter to its end, then for one of its results, find the original paper and read its abstract, introduction, and conclusion. Skip the experimental details on a first pass; come back to them if the result matters to your work. Over the course of the textbook this habit will leave you having read perhaps fifty foundational papers, which is enough to argue with conviction about most things in the field.
3. Track the field. AI moves fast, and a textbook printed in 2026 will be partly out of date by 2027. The defence is a sustained habit of tracking the field through living sources rather than rereading the textbook hoping it will update itself. Useful sources include the arXiv preprint server (filter by cs.LG for machine learning, cs.AI for artificial intelligence broadly, and cs.CL for computational linguistics); blog posts from frontier laboratories (OpenAI, Anthropic, Google DeepMind, Meta AI, DeepSeek); curated newsletters such as The Algorithmic Bridge, Last Week in AI, and Import AI; and conference proceedings from NeurIPS, ICML, ICLR, ACL, and EMNLP. The textbook is current as of early 2026; expect parts of Chapters 15 to 17 (scaling, modern systems, alignment) to age fastest and the mathematical foundations chapters (2 to 5) to age slowest.
A short FAQ
Q: Is this textbook a substitute for Russell and Norvig? No. Russell and Norvig's Artificial Intelligence: A Modern Approach is the canonical reference for the field and remains particularly strong on classical AI (search, logic, planning, knowledge representation), which this textbook covers only in historical context. Treat the present textbook as a complement to Russell and Norvig, not a replacement. If you intend to do research in AI you should own both and treat them as covering different sub-disciplines.
Q: Is this textbook a substitute for Goodfellow, Bengio, and Courville's Deep Learning? No, but the two overlap substantially. Goodfellow et al. (2016) goes deeper on the mathematical foundations of deep learning as it stood in the mid-2010s. The present textbook is broader, more current, and covers reinforcement learning, transformers, generative models beyond GANs, and the modern foundation-model paradigm in ways Goodfellow predates. If your interest is the mathematics of feedforward and convolutional networks at depth, read Goodfellow. If your interest is contemporary AI engineering, read this.
Q: Should I take a course or read this textbook first? Either works. Many graduate students read this kind of textbook before starting research; many courses use it as the assigned reading. The textbook does not require an instructor: every chapter is self-contained, with worked examples and exercises with solution sketches. Reading alongside a course gives you another voice on the material and someone to ask when you are stuck, which is helpful but not essential.
Q: Will this textbook be updated? Yes. The web edition is updated as the field evolves; the print edition is by definition a snapshot of the moment it went to press. If you are reading the web edition you can expect Chapters 15 to 17 to receive periodic refreshes, and the references and glossary to grow over time.
Q: Why "British English"? The author is based in New Zealand and uses British English by default. American English readers should expect "behaviour", "recognise", "generalise", "modelled", "centring", and "labelled" rather than the American spellings. The technical content is identical; the spelling differs. Code identifiers and library names remain in their canonical American form (color, optimizer, normalize) since that is how they appear in the underlying software.
Looking ahead: the four-chapter mathematical foundations
Chapter 2 (linear algebra), Chapter 3 (calculus and optimisation), Chapter 4 (probability and information theory), and Chapter 5 (statistics) are the language in which the rest of the book is written. They treat vectors and matrices, gradients and the chain rule, distributions and expectations, and estimators and uncertainty: everything you need to read the equations in Chapters 6 to 17 without stopping to look up notation.
Treat these four chapters as a refresher rather than a first introduction. If you have not seen the material before, supplement with a dedicated textbook: Strang's Introduction to Linear Algebra for vectors, matrices, and decompositions; Spivak's Calculus for a rigorous treatment of differentiation and optimisation; and Bishop's Pattern Recognition and Machine Learning for probability and information theory in a machine-learning idiom. The chapters in this book are written compactly. A reader who is comfortable with university-level mathematics can read them as a "names and notation" reminder; a reader who needs a fuller treatment should plan to spend roughly twice as long on Chapters 2 to 5 as on the later chapters. The pace becomes more comfortable from Chapter 6 onwards, where each new idea is unpacked at greater length and the mathematics is in service of a concrete algorithm rather than a stand-alone topic.
What you should take away
- Intelligence is not one thing. The field's recurring disagreements (symbolic versus connectionist, logic versus statistics, narrow versus general) persist because the underlying phenomenon admits more than one approach, and the right tool depends on the task at hand.
- Modern AI is empirical. Progress has come less from theoretical breakthroughs than from the systematic application of a small set of techniques at increasing scale, which means an AI engineer needs the experimental sensibility of a scientist as much as the analytical sensibility of a mathematician.
- The rational-agent framework, the symbolic / sub-symbolic distinction, the capability-evaluation co-evolution, the compute-data-algorithms triangle, and the jaggedness of capability are the five conceptual handles that will let you place new material as you encounter it.
- Run code, read papers, and track the field. The textbook gives you the framework; these three habits give you the currency.
- The mathematical foundations chapters (2 to 5) are written compactly and should be read as a refresher; the technical chapters from 6 onwards are paced more gently, and Chapter 15 (modern AI, including reinforcement learning) and Chapters 9 to 14 (sub-symbolic methods, scaling, generative models) are where the conceptual scaffolding of Chapter 1 acquires its full mathematical weight.