1.7 Why now, why this book

Why is now such a moment for AI? The answer has three parts. The first is capability: current systems do things no system in 2010 could approach, and the list is long enough that the cumulative effect is qualitative rather than quantitative. The second is deployment: the friction of putting a working AI system into a working product has fallen by something like three orders of magnitude in five years, so capabilities that once lived only in research labs now live behind login screens used by hundreds of millions of people. The third is trajectory: capabilities are still growing, and on most public benchmarks the slope has yet to flatten. Take any one of the three away and the moment becomes ordinary; it is the simultaneity that makes 2026 feel different from 1986 or 2006.

The capability story

Consider what a personal computer in 2010 could do for an ordinary user, and what it could not. It could spell-check English text, recognise a typewritten document on a flat-bed scanner, and translate single words from a dictionary. It could play chess at superhuman level if you launched the right program, and it could recognise the spoken phrase "call mom" if you trained it for half a minute on your own voice. What it could not do, in any reliable form, would fill a long list. Translate a paragraph of Korean into English at the quality of a working translator. Write a fluent five-paragraph essay on a topic of your choice. Identify the species of a bird from a phone photograph. Recognise a caller's voice in a noisy café. Predict the three-dimensional shape of a protein from its amino-acid sequence. Beat a human professional at Go. Pass a university examination in introductory chemistry. Generate a thirty-second photorealistic video from a sentence of description. Solve a real GitHub bug report by editing a real codebase.

By 2026 every item on that second list is routine, and most are ordinary consumer features. Translation between any pair of major languages is now at near-professional quality on free websites. Generation of fluent, context-sensitive prose, working code, and high-quality images is available in the chat box of every major productivity application. Speech recognition in noisy real-world audio is the default for dictation and for the captions that scroll across a video call. AlphaFold 2, and then AlphaFold 3, predict protein structure to within a few Ångströms of crystallography for a substantial fraction of all proteins for which structures matter; Demis Hassabis and John Jumper shared half of the 2024 Nobel Prize in Chemistry for protein-structure prediction. AlphaGo and its successors mastered Go in 2016; mastery of the harder games of imperfect information followed, StarCraft in 2019 and Diplomacy in 2022. Frontier language models now solve a substantial fraction of grade-school to graduate-school examination questions in mathematics, physics, biology and law. Sora and Veo generate photorealistic video in real time from a text prompt. The strongest coding agents close more than eighty-five per cent of the issues in the SWE-Bench benchmark drawn from real Python repositories (Claude Opus 4.7 at 87.6 per cent, GPT-5.5 at roughly 83 per cent), a benchmark on which an early-2024 system scored under five per cent.

None of these were on the realistic horizon in 2010. The polite consensus among AI researchers at the time was that machine translation would remain post-edited rather than fluent for the foreseeable future, that protein folding was a thirty-year problem, and that Go was at least a decade away. All three estimates turned out to be far too pessimistic. The cumulative effect of the surprises is what matters most: not any single capability, but their number and their breadth, the fact that they reach across language, vision, audio, code, mathematics, science and games, is what makes contemporary AI a different kind of object from contemporary chess engines or contemporary search engines.

The economists' name for an object of this kind is a general-purpose technology (David 1990; Bresnahan and Trajtenberg 1995). General-purpose technologies are not just useful; they reshape the economy and the structure of work over decades, by inducing complementary innovations across industries that would otherwise have nothing to do with each other. The canonical examples are the steam engine, electricity, the internal combustion engine, and the integrated circuit. Each of those technologies took thirty to fifty years to express its full economic effect, because the productivity gains depended on rearrangements of work, capital and skill that could not happen overnight. The point of calling AI a general-purpose technology is not to predict the size of those gains; it is to set the right time-scale for thinking about the change. Months are too short. Decades are about right.

The deployment story

A capability that lives only in a research paper, or only in the demos given at a major conference, is a research curiosity. The transition from curiosity to ordinary feature requires that the capability be deployable, that an engineer who is not the inventor can put it to work in a product on a budget. This is a separate problem from the capability itself, and the unlock that occurred between roughly 2018 and 2024 is, in some ways, more consequential than the capability story.

Four things changed at once.

  • Foundation models can be adapted to a new task with prompting alone, with a handful of examples (in-context learning), or with fine-tuning on a few thousand examples. This replaces the task-specific dataset, task-specific model, and task-specific training pipeline that dominated machine learning before 2018. A working foundation model is the substrate on which a thousand smaller systems are built.
  • Standardised APIs (the OpenAI Chat Completions API, the Anthropic Messages API, and the Google and DeepSeek equivalents) mean that any developer can integrate a frontier model into an application with a few lines of code over HTTP; a minimal sketch follows this list. The integration looks the same whether the application is a customer-support chat, a medical-records summariser, or a children's homework tutor.
  • Cloud GPUs mean that a team that wants to train, fine-tune, or serve a model does not need to buy hardware. AWS, Azure, Google Cloud, and the specialised neoclouds (CoreWeave, Lambda, Crusoe) rent the same H100s, B200s, and TPUs that the frontier labs use. The capital expenditure that would have been a deal-breaker for a small team becomes an operating expenditure they can scale up and down by the hour.
  • Open-weight models (Llama, Qwen, DeepSeek, Mistral, Gemma) mean that on-premises deployment is feasible for organisations that cannot send data outside their own walls. Hospitals, banks, and defence ministries that would never use a hosted API can run a fourteen- or seventy-billion-parameter model on their own GPUs in a back room.
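
As a concrete illustration of the first two points, the sketch below adapts a hosted frontier model to a small classification task with nothing but a few-shot prompt and an API call. It uses the OpenAI Python client with a placeholder model name; the same pattern carries over to the Anthropic Messages API or any other hosted endpoint with only cosmetic changes, and none of the specifics should be read as a recommendation.

    # Minimal sketch: few-shot classification through a hosted chat API.
    # Assumes the OPENAI_API_KEY environment variable is set; the model
    # name is a placeholder, not an endorsement of a particular vendor.
    from openai import OpenAI

    client = OpenAI()

    few_shot_prompt = (
        "Classify the sentiment of each review as positive or negative.\n"
        "Review: 'Arrived quickly and works perfectly.' -> positive\n"
        "Review: 'Stopped charging after two days.' -> negative\n"
        "Review: 'The battery life exceeded my expectations.' ->"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model name
        messages=[{"role": "user", "content": few_shot_prompt}],
        max_tokens=5,
        temperature=0.0,
    )

    print(response.choices[0].message.content.strip())   # expected: positive

Before 2018 the same task would have required a labelled dataset and a task-specific trained classifier; here the task specification travels entirely in the prompt, and swapping the task means swapping the examples.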

Put the four together and the marginal cost of building an AI feature into a new product has fallen by roughly three orders of magnitude in five years. In 2018 a serious deployment of natural-language processing required a research team, a labelled dataset, six months of engineering, and a six- or seven-figure budget. In 2026 the same product can often be prototyped over a weekend by a single developer calling a hosted API, and a production version can be shipped in a fortnight. The economic implications are simple: AI features now appear in almost every consumer software product, not because every product team has built a model, but because they can call one. The same logic explains why so many "AI-first" startups are functionally thin wrappers around frontier APIs. The wrappers are valuable (choosing the right prompt, the right interface, and the right user-data integration is real work), but the underlying capability is shared, and the economic value splits between the capability layer and the product layer. The textbook returns to this division of labour in Chapter 17.

The trajectory story

The third part of the answer is the most speculative, and the most consequential for anyone planning a career or a research programme around AI. If the capability levels of late 2025 were a stable plateau, and the next five years brought only refinements of existing models, the field would still be in the middle of a major economic transition, but the transition would be largely a matter of engineering. What makes the present moment unusual is that capabilities are continuing to grow, on most public benchmarks, at a rate that does not yet show clear signs of saturating.

The case for continued growth has four legs.

  • Continued compute scaling. Each new generation of GPU brings a two- to three-fold improvement in throughput, and the leading laboratories are still increasing the size of their training runs by roughly an order of magnitude every eighteen months. The empirical scaling laws of Kaplan et al. (2020) and the Chinchilla paper of Hoffmann et al. (2022) continue to predict, within tolerable error, the loss reductions that come from these increases; a worked sketch of the Chinchilla form follows this list.
  • Post-training methods. Reinforcement learning from human feedback (RLHF), reasoning-RL on verifiable rewards, and direct preference optimisation are still relatively young, and each has produced large capability jumps that were not anticipated when pre-training scaling was the only known knob.
  • Synthetic data and self-improvement. A frontier model can generate the training data for the next iteration of itself. The full implications of this loop, sometimes called Ouroboros in the alignment literature, are not yet known. Early evidence suggests the gains are real but bounded; the bound itself is the open research question.
  • Agents with tools and the physical environment. Once a model can browse the web, execute code, edit files, and manipulate a desktop, the upper bound on what it can do is no longer set by what it knows but by what it can interact with. The integration of language models with tools is in its early years; the most informative results are still ahead.
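
To make the first of these legs concrete, the sketch below evaluates the parametric loss surface fitted in the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β, using constants close to the published fit of Hoffmann et al. (2022); treat the exact numbers as illustrative rather than authoritative.

    # Worked sketch of the Chinchilla parametric loss L(N, D) = E + A/N^a + B/D^b,
    # with constants approximating the fit reported by Hoffmann et al. (2022).
    E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28

    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted pre-training loss for n_params parameters and n_tokens tokens."""
        return E + A / n_params**a + B / n_tokens**b

    # Scale parameters and data together by 10x per step (roughly the
    # compute-optimal recipe, since compute ~ 6 * N * D grows 100x per step).
    n, d = 1e9, 20e9        # 1 billion parameters, 20 billion tokens
    prev = chinchilla_loss(n, d)
    for _ in range(3):
        n, d = n * 10, d * 10
        cur = chinchilla_loss(n, d)
        print(f"N = {n:.0e}, D = {d:.0e}: loss {cur:.3f}  (gain {prev - cur:.3f})")
        prev = cur

Run it and the predicted loss falls at every step, which is the case for continued growth; but each hundred-fold increase in compute buys a smaller absolute gain than the one before, which is exactly the power-law point made under diminishing returns in the next list.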

The case against has four legs of its own.

  • Data exhaustion. The high-quality pre-training corpora (books, encyclopaedias, scientific literature, indexed web text) are nearly exhausted. The frontier labs are running into the limits of the open internet, and the productive corpora that remain are increasingly proprietary.
  • Diminishing returns to compute. Scaling laws are predictable, but they promise steadily shrinking absolute gains: the test-set loss, even in the optimistic case, falls only as a power of the compute budget, not exponentially in it.
  • Sticky capabilities. Long-horizon coherence, robust out-of-distribution generalisation, and reliable grounding in the physical world have proven harder than the headline benchmarks suggest. Some of these may yield to scale; some may require ideas that the field does not yet have.
  • Social and regulatory constraints. Even where capability is available, deployment may slow because of liability concerns, data-protection regimes, antitrust scrutiny, and the political economy of labour displacement. These are not failures of AI; they are the ordinary forces that govern any general-purpose technology.

We do not know which of the two cases is closer to right. Either the next decade will look more like 2014–2024, with continued exponential improvement across most benchmarks, or it will look more like the deceleration of jet-engine performance after the 1970s, when a regime of steady gains gave way to one of incremental refinement. That is the question. The textbook does not pretend to answer it. What it does is teach the conceptual framework that lets you read a research paper from 2030 and decide, on the merits, which side it strengthens.

What this textbook will and will not do

The book is organised into seventeen chapters, ordered by the conceptual dependencies of the field. Later chapters depend on earlier ones, but each is intended to be readable on its own once the prerequisites are met.

  • Chapters 2 to 5 build the mathematical foundations: linear algebra (Ch. 2), calculus and optimisation (Ch. 3), probability and information theory (Ch. 4), and statistics (Ch. 5). The treatment is condensed and assumes prior exposure at the level of a first-year mathematics or engineering course. Readers from a humanities or social-science background should plan to spend additional time here.
  • Chapters 6 to 8 introduce machine-learning fundamentals: classical supervised learning (Ch. 6); the workhorse algorithms of linear regression, decision trees, random forests, gradient boosting, and support vector machines (Ch. 7); and unsupervised learning, including clustering, dimensionality reduction and the EM algorithm (Ch. 8). Most production AI systems on tabular data still rely on one of these methods, and they remain essential.
  • Chapters 9 to 11 cover deep learning: neural networks at the level of multilayer perceptrons (Ch. 9); the practicalities of training a deep network without divergence, including SGD, momentum, Adam, batch normalisation, dropout, and residual connections (Ch. 10); and convolutional networks for computer vision (Ch. 11).
  • Chapters 12 and 13 cover sequence and attention: recurrent networks, LSTMs and the sequence-to-sequence framework (Ch. 12), and the Transformer in full detail, including engineering at scale, the major variants, and the foundation-model recipe (Ch. 13). Chapter 13 is the longest single chapter in the book.
  • Chapter 14 covers generative models: variational autoencoders, generative adversarial networks, normalising flows, diffusion models, and the modern image-, audio-, and video-generation systems built on top of them.
  • Chapter 15 covers modern AI systems: frontier-scale large language models, pre-training, supervised fine-tuning, RLHF and DPO, Constitutional AI, tool use, agentic frameworks, retrieval-augmented generation, and multimodal extensions.
  • Chapter 16 is on ethics and safety: bias and fairness, privacy, interpretability, alignment, evaluation, and the social-science questions about deployment.
  • Chapter 17 covers applications: medicine, science, law, education, creative work, robotics, autonomous vehicles. The final chapter is a survey rather than a toolkit.

What the textbook does not cover in detail is, by design, also a long list. Symbolic AI techniques (search, logic, expert systems) appear briefly in the historical context of Chapter 1 and not elsewhere; the canonical reference remains Russell and Norvig's Artificial Intelligence: A Modern Approach (4th edition). Classical control theory and robotics are touched on in Chapter 17 but treated only as context for the modern learned approaches. Computer-vision systems engineering (image pipelines, calibration, sensors, ISPs) appears briefly in Chapter 11; the canonical reference is Szeliski's Computer Vision: Algorithms and Applications. Natural language processing as a discipline, meaning the pre-Transformer era of feature engineering, parsing, and pipeline design, is largely subsumed by Chapters 12 and 13; readers who want the older treatment should consult Jurafsky and Martin. AI policy, regulation, and law are treated briefly in Chapter 16 and otherwise left to specialised volumes that update faster than a textbook can. Specific commercial products, meaning the current features of ChatGPT, Claude, Gemini and their successors, are not documented here because they change month by month; the documentation of any one of them is published by its vendor.

Two further points deserve mention. The book is light on specific framework APIs. PyTorch and JAX both appear in code examples, but the focus is on the underlying mathematics and algorithms. The frameworks change every few years; the underlying ideas are more durable. The book is also light on industrial productivity tooling (specific cloud platforms, MLOps orchestrators, observability stacks), because those are best learnt in the context of the system you are actually building. What the book is heavy on is the conceptual framework that lets you read a research paper from 2025 or 2030 and understand both what it claims and how to evaluate the claim.

Who this book is for

The book assumes three things of its reader.

  • One year of university mathematics, at the level of a first course in calculus and a first course in linear algebra. The mathematical chapters (Ch. 2–5) cover the necessary material in condensed form, but they are revision rather than first exposure.
  • One year of programming. Any language is acceptable in principle, but Python is the working language of the field, the language of the code examples in the book, and the language in which the modern frameworks (PyTorch, JAX, Hugging Face Transformers) are written. A reader with another language should expect to pick up enough Python to follow examples within a few weeks.
  • No prior AI exposure. The book is self-contained from Chapter 1 forward, and a motivated reader who has never opened a machine-learning textbook before should be able to work through it from end to end.

Given those prerequisites, the book is suitable for several distinct audiences.

  • Senior undergraduates in computer science, mathematics, statistics, engineering, or one of the quantitative natural sciences. The book is built for this reader.
  • Graduate students entering AI research from another field (physics, biology, economics, neuroscience, medicine). The mathematical chapters are intended to be a sufficient bridge.
  • Self-learners with the prerequisite mathematics and programming. The web edition is open access; the iOS edition adds animated lessons and adaptive quizzes for the same content.
  • Software engineers who want to understand AI as an engineering discipline rather than only call APIs. The book is not a tutorial on building AI products, but it is the conceptual layer that makes such tutorials make sense.

Readers from medicine, biology, social sciences, or the humanities have read earlier drafts of this material and made it through. The path is harder but not impossible, especially with a willingness to spend extra time on the mathematical chapters. The book aims to repay the effort with a working understanding of one of the major intellectual and technological developments of our era.

What you should take away

  1. AI in 2026 is qualitatively different from AI in earlier decades because three independent factors (capability, deployment, and trajectory) happen to be aligned at once; the alignment is what makes the moment unusual.
  2. Capability has crossed the threshold of a general-purpose technology in the sense of David (1990) and Bresnahan and Trajtenberg (1995); the right time-scale for thinking about its economic and social effects is decades, not months.
  3. Deployment friction has fallen by roughly three orders of magnitude in five years, which is why AI features now appear in almost every consumer software product, and why the economic value splits between the capability layer and the product layer.
  4. The future trajectory is unknown: there is a reasonable case for continued exponential improvement and a reasonable case for saturation, and the textbook teaches the conceptual framework needed to evaluate future evidence on its merits.
  5. The textbook's seventeen chapters cover mathematical foundations, machine-learning fundamentals, deep learning, sequence and attention, generative models, modern AI systems, ethics and safety, and applications; the book leaves symbolic AI, classical control, NLP-as-a-discipline, AI policy, specific frameworks, and specific commercial products to other resources.
