1.1 What is artificial intelligence?
You almost certainly used artificial intelligence today, probably without thinking about it. Your phone unlocked when it recognised your face. A navigation app rerouted you around a traffic jam that hadn't existed when you set off. A search engine guessed what you were looking for from the first three letters you typed. A chatbot somewhere drafted a wedding speech, summarised a legal document, or helped a medical student revise for an exam. Each of these is, in some loose everyday sense, doing something we used to think only humans could do. But what makes any of them "intelligent"? The word intelligence has no settled definition, and the field of artificial intelligence has spent seventy years arguing about which definition to adopt. This section asks you to take that argument seriously, because the answer shapes everything that follows in this book.
This section is conceptual. By the end you should be comfortable with the four classical framings of AI, with the formal language of agents and policies, and with the reason this textbook settles on one of those framings rather than another.
The trouble with "intelligence"
Defining artificial intelligence is awkward because the noun on which it depends, intelligence, has resisted consensus definition for more than a century of psychometric research. Psychology has been trying to pin the concept down since at least the late nineteenth century, and the textbooks of the field still record the unfinished argument. There is no agreed checklist or test, and not even agreement that the underlying capacity is a single thing rather than a loose family of capacities that happen to correlate.
The most-cited compact statement comes from a 1994 letter published in The Wall Street Journal, signed by fifty-two psychologists in response to the public controversy over The Bell Curve. They described intelligence as "a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience." The definition lists activities and asserts that a single underlying capacity sits behind them: the person who is good at planning will tend to be good at reasoning, at abstraction, at picking things up quickly. The claim has empirical support (performance on cognitive tests does tend to correlate across very different tasks), but it is not the only reading of the data, and the dispute over its interpretation is more than a century old.
The dispute begins in 1904 with Charles Spearman, a British army officer turned psychologist, who noticed that pupils' marks across unrelated school subjects (Latin, mathematics, music, classical history) were positively correlated. The pupil who was good at one tended to be good at the others. To explain the correlations, Spearman proposed a single common factor underlying all cognitive performance, which he called $g$, the general intelligence factor. Almost everything you have ever read about IQ scores rests on this idea: that a single number, $g$, captures something stable about a person's cognitive capacity. Other psychologists have argued for many factors rather than one: Howard Gardner's theory of multiple intelligences, Robert Sternberg's triarchic theory, and the long literature on emotional intelligence all push in that direction, and the empirical situation remains contested. The point for our purposes is not which side is right; the point is that the field of AI is downstream of an unfinished argument. We are trying to build artificial intelligence without an agreed definition of natural intelligence.
This may sound like a fatal problem. It is not, because we can sidestep it. The strategy AI has adopted, almost from its inception, is to refuse to wait for the philosophy and to pick working definitions instead. We choose a definition narrow enough to be operationalised, written down precisely enough that you can build something against it, and broad enough to be interesting. Different research traditions choose different working definitions. The four-quadrant scheme below is a tidy way of seeing those choices.
The Dartmouth pragmatism (1956)
The founders of the field made the pragmatic choice explicit at the very beginning. In the summer of 1956, four researchers (John McCarthy at Dartmouth College, Marvin Minsky at Harvard, Nathaniel Rochester at IBM, and Claude Shannon at Bell Laboratories) convened a two-month workshop at Dartmouth in New Hampshire to launch the field. Their funding proposal to the Rockefeller Foundation contained the sentence that, more than any other, set the field's working assumption: "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
This is not a philosophical claim about consciousness, qualia, or what it would mean for a machine to really think. It is an engineering bet: whatever intelligence turns out to be, the slices of it we care about can be made precise enough to encode as a process, and once made precise can be simulated on a machine. The proposal does not require the simulation to be conscious or to feel anything; it requires only that it reproduce the input-output behaviour we associate with intelligence. McCarthy and his colleagues were not waiting for philosophers to settle the question of consciousness. They were proposing that intelligent behaviour, however we define it, can be decomposed into computational processes.
Most working AI researchers have operated inside this framing ever since. When an engineer builds a system that recognises faces, plans a delivery route, or translates Mandarin into English, they are not claiming that the system thinks the way a person thinks. They are claiming that the system does something that, when a person does it, we are willing to call intelligent. The Dartmouth pragmatism does not solve the philosophical problem; it parks it. Parking it has been enough to build chess engines and language models. Whether a complete theory of intelligence will eventually require us to revisit the philosophical questions Dartmouth declined to answer remains an open question. For a working researcher today, the Dartmouth bet still pays.
Russell and Norvig's four quadrants
The most influential modern textbook of AI, Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (now in its fourth edition and in use in over fourteen hundred universities), organises the historical definitions of AI along two axes. The first axis distinguishes definitions concerned with thought from definitions concerned with behaviour: is AI about reproducing the internal mental processes of intelligence, or about reproducing the external observable performance? The second axis distinguishes definitions that take the human as the gold standard from definitions that take an idealised standard of rationality: should an AI think and act as a human would, or as it ideally ought to, given its objectives?
Crossing the two axes gives four quadrants, each with its own intellectual lineage, characteristic methods, and characteristic failure modes. Few working researchers self-identify with a single quadrant; modern systems borrow from all of them. The scheme is useful as conceptual hygiene because it exposes a structural fact: the target of AI research has never been singular.
Thinking humanly is the cognitive-science programme. The aim here is not merely to produce intelligent behaviour but to produce it through processes that match those of the human mind. A system that gets the right answer by a route a human would never take is, on this view, not really doing AI; it is doing something else. The paradigm example is Allen Newell and Herbert Simon's General Problem Solver, developed at Carnegie Mellon from 1957 onward. It was explicitly intended as a model of human problem solving, and it was validated by comparing the program's intermediate steps to think-aloud protocols, recordings of human subjects saying out loud what they were thinking as they tried to solve the same problems. ACT-R, the cognitive architecture developed by John Anderson and his collaborators from the 1970s, sits in the same tradition and has been refined for nearly half a century. Modern computational neuroscience extends the idea further by trying to align machine models with neural data: work by James DiCarlo at MIT and Daniel Yamins at Stanford has shown that convolutional networks trained on object recognition develop internal representations that successfully predict the firing rates of neurons in macaque inferotemporal cortex, a quantitative match between machine and brain.
Thinking rationally is the logicist programme. Its intellectual lineage runs from Aristotle's syllogisms in the fourth century BC, through George Boole's Laws of Thought (1854), Gottlob Frege's Begriffsschrift (1879), and the Principia Mathematica of Whitehead and Russell (1910–1913), to John McCarthy's 1959 proposal that an "advice taker" could reason from first principles using formal logic. The dream is to encode knowledge as logical formulae and let a deductive engine derive consequences automatically. If you can write down everything you know in a precise enough language, the reasoning takes care of itself. The programme failed for two reasons that are still relevant. First, expressing real-world knowledge in formal logic is far harder than it looks; almost everything we know is hedged, defeasible, context-dependent, or simply tacit. Second, even when knowledge can be expressed, the search through the space of possible deductions explodes combinatorially: a logically valid proof exists, but finding it within a human lifetime is a different problem.
Acting humanly is the Turing test tradition. Alan Turing, in his 1950 paper "Computing Machinery and Intelligence" published in the philosophy journal Mind, proposed an operational substitute for the question "can machines think?", a question he found "too meaningless to deserve discussion" in its original form. In his proposal, a human interrogator converses by typed messages with two hidden interlocutors, one human and one machine, and tries to tell which is which. If the interrogator cannot reliably do better than chance, we should, Turing argued, drop our prejudice against attributing thought to the machine. The test is behavioural and cleanly defined; the internal machinery does not matter. Whether the test still does the work Turing intended, in an age of large language models that emit fluent prose by design, is the subject of §1.2.
Acting rationally is the dominant modern paradigm and the one this book will use. The starting point is a simple definition. An agent is anything that perceives its environment through sensors and acts on it through actuators. A thermostat is an agent (sensor: thermometer; actuator: relay to the heating system). A self-driving car is an agent (sensors: cameras, lidar, GPS; actuators: steering, throttle, brakes). A human being is an agent. A rational agent is one that, given its perceptions and the knowledge it has been given, chooses the action that maximises the expected value of its performance measure. This definition was made precise in the framework of decision theory by the American statistician Leonard "Jimmie" Savage in his 1954 book The Foundations of Statistics, and was thoroughly absorbed into mainstream AI by the late 1980s. The framework has the merit of mathematical clarity: rationality becomes optimisation under uncertainty, expressed using probability theory and utility theory. It also dodges the demand that machines mimic the quirks of human cognition. A self-driving car does not need to feel the anxiety a human driver feels at a roundabout; it needs only to choose the action that minimises expected cost over the relevant time horizon.
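To make the agent vocabulary concrete, here is a minimal sketch in Python of the thermostat example; the class, threshold values, and action names are invented for illustration, not drawn from any real control system. The percept is a temperature reading, the action is a command to the heating relay, and the mapping from percept to action is the agent's entire behaviour.

```python
from dataclasses import dataclass

@dataclass
class ThermostatAgent:
    """A minimal agent: perceives a temperature, acts on a heating relay."""
    target: float = 20.0      # desired room temperature in degrees Celsius
    hysteresis: float = 0.5   # dead band to avoid rapid on/off switching

    def act(self, temperature: float) -> str:
        """Map the latest percept (a temperature reading) to an action."""
        if temperature < self.target - self.hysteresis:
            return "heating_on"
        if temperature > self.target + self.hysteresis:
            return "heating_off"
        return "no_change"

agent = ThermostatAgent()
print(agent.act(18.2))  # -> "heating_on"
print(agent.act(21.0))  # -> "heating_off"
```

Nothing about this agent is learned or sophisticated; the point is only that "agent" is a very broad term, and the interesting questions are about how good the percept-to-action mapping is with respect to some performance measure.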
The rational agent: the modern definition
We can now make the rational-agent idea precise. An agent's policy, written $\pi$, is a function that maps percept histories to actions. Whatever the agent has seen so far, the policy says what to do next. The environment, in turn, may respond to the agent's actions stochastically: the same action in the same state may lead to different outcomes with different probabilities. A performance measure $U$ assigns a real number to each possible history of outcomes, telling us how good or bad that history is. Higher is better.
The rational policy is the one that maximises the expected value of the performance measure. Formally:
$$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{e \sim P(\cdot \mid \pi)} [U(e)] .$$
On the left, $\pi^*$ is the optimal policy. On the right, $\arg\max_{\pi}$ means the policy $\pi$ that maximises the following quantity. The quantity is an expectation: $\mathbb{E}_{e \sim P(\cdot \mid \pi)} [U(e)]$ is the average value of $U(e)$ when $e$ is sampled from the distribution $P(\cdot \mid \pi)$, the probability distribution over environment histories induced by following the policy. Different policies make different futures more or less likely, and the expectation averages over those possibilities. The optimal policy is whichever policy, if we followed it, would give the highest average score.
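A toy numerical illustration may help. The sketch below uses entirely hypothetical numbers (a commuter choosing between a highway and back roads under uncertain weather); it enumerates every deterministic policy, estimates $\mathbb{E}[U(e)]$ for each by Monte Carlo sampling, and keeps the policy with the highest average score, which is exactly the $\arg\max$ in the formula.

```python
import itertools
import random

random.seed(0)

PERCEPTS = ["rain", "clear"]
ACTIONS = ["highway", "back_roads"]

def simulate(policy):
    """Sample one environment history under a policy and return its utility U(e).
    Utility here is minus the travel time in minutes, so higher is better."""
    weather = "rain" if random.random() < 0.4 else "clear"
    action = policy[weather]
    if action == "back_roads":
        travel_time = 30
    else:  # highway: fast unless there is a jam, which is likelier in the rain
        jam_prob = 0.6 if weather == "rain" else 0.2
        travel_time = 50 if random.random() < jam_prob else 20
    return -travel_time

def expected_utility(policy, n_samples=20_000):
    """Monte Carlo estimate of E[U(e)] under the distribution induced by the policy."""
    return sum(simulate(policy) for _ in range(n_samples)) / n_samples

# Enumerate every deterministic policy: a mapping from percept to action.
policies = [dict(zip(PERCEPTS, choice))
            for choice in itertools.product(ACTIONS, repeat=len(PERCEPTS))]

best = max(policies, key=expected_utility)
print(best)  # {'rain': 'back_roads', 'clear': 'highway'}
```

Brute-force enumeration works here only because there are four possible policies. Much of the rest of this book is about what to do when the policy space is far too large to enumerate and $P$ and $U$ are not handed to us in closed form.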
A great deal of this textbook can be read as instances of this template. The whole of decision theory works inside it. Reinforcement learning, which we will meet in Chapter 15, is the problem of finding $\pi^*$ when $P$ and $U$ are not fully known in advance and have to be learned by trial and error. Supervised learning, which we meet in Chapter 7, is also a special case: there the agent is a classifier, the action is its predicted label, and $U$ is the negative of a loss function such as cross-entropy. Even systems built on next-token prediction, including modern large language models, can be cast in this framework, with the per-token log-likelihood playing the role of $U$.
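To see the supervised-learning case concretely, here is a minimal sketch (the numbers are invented for illustration) in which the classifier's "action" is the probability it assigns to class 1 and the performance measure is the negative of the cross-entropy loss, so maximising $U$ is the same as minimising the loss.

```python
import math

def utility(predicted_prob, true_label):
    """U for a single classification 'action': the negative cross-entropy loss.
    predicted_prob is the probability the classifier assigns to class 1."""
    p = min(max(predicted_prob, 1e-12), 1 - 1e-12)  # clamp for numerical safety
    loss = -(true_label * math.log(p) + (1 - true_label) * math.log(1 - p))
    return -loss  # higher utility = lower loss

print(utility(0.9, 1))  # confident and correct: utility near zero (about -0.105)
print(utility(0.9, 0))  # confident and wrong: large negative utility (about -2.303)
```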
The framework's separation of what we want (encoded in $U$) from how to achieve it (encoded in $\pi$) is what makes it practical. It lets us specify a problem precisely without committing to any particular algorithm for solving it. A self-driving car, a chess engine, a recommender system, and a protein-folding network all live inside this template; what differs is the choice of $U$, the structure of the environment, and the algorithms used to search for $\pi^*$.
Why "rational" is not the same as "human-like"
Rational agents in this sense are not models of human beings. Humans are demonstrably not rational agents in the technical sense. The psychologists Daniel Kahneman and Amos Tversky spent the 1970s and 1980s documenting the systematic ways in which human judgement departs from the predictions of expected-utility theory. Their prospect theory, for which Kahneman won the 2002 Nobel Prize in Economics, shows that humans treat losses and gains asymmetrically, that we overweight small probabilities and underweight large ones, and that our preferences depend on how options are described (the framing effect). Anchoring, availability, representativeness, the conjunction fallacy, sunk-cost reasoning, the planning fallacy: the catalogue of human deviations from rationality is long and well-documented.
The rational-agent framework is therefore a normative ideal, not a descriptive model. It tells us how an agent ought to behave given its objectives, not how humans actually do behave. AI systems built on this framework optimise expected utility, and this is generally what users want even when humans wouldn't behave that way. We do not, as a rule, want our self-driving cars to exhibit the framing effect, or our medical diagnostic systems to fall prey to the conjunction fallacy. The whole point of automating these tasks is to do better than the human baseline.
One caveat matters. The framework assumes that the performance measure $U$ has been correctly specified, that we have written down what we actually want. This turns out to be much harder than it looks. The classic illustration is a cleaning robot rewarded each time it wipes a patch of floor, which learns to wipe the same patch over and over because nothing in its objective says the patch has to be dirty or new. A more modern illustration is a recommender system rewarded for click-through rate, which learns to recommend whatever maximises clicks, regardless of whether the content is true, healthy, or in the user's long-run interest. The technical name for this phenomenon is reward hacking, sometimes also specification gaming, and Chapter 16 treats it in detail. A misspecified $U$ produces a system that is perfectly rational in the formal sense and disastrous in practice. The four-quadrant taxonomy can suggest, misleadingly, that "acting rationally" is the safest place to be. It is not; rationality with respect to the wrong objective is exactly the failure mode the alignment literature has spent the past decade documenting.
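A toy simulation makes the failure vivid. In the sketch below (an invented example, not drawn from the alignment literature), the robot earns one point per wipe under the objective we wrote down, while the objective we actually wanted counts distinct tiles cleaned; the policy that is optimal under the first objective never leaves its starting tile.

```python
def misspecified_reward(actions):
    """What we wrote down: one point per wipe, regardless of which tile."""
    return sum(1 for a in actions if a == "wipe")

def intended_reward(actions):
    """What we actually wanted: one point per *distinct* tile cleaned."""
    tile, cleaned = 0, set()
    for a in actions:
        if a == "wipe":
            cleaned.add(tile)
        elif a == "move":
            tile += 1
    return len(cleaned)

# Two policies unrolled for a ten-step episode.
wipe_in_place  = ["wipe"] * 10
sweep_the_room = ["wipe", "move"] * 5

for name, plan in [("wipe in place", wipe_in_place), ("sweep the room", sweep_the_room)]:
    print(name, misspecified_reward(plan), intended_reward(plan))
# wipe in place  -> misspecified 10, intended 1
# sweep the room -> misspecified 5,  intended 5
```

The optimiser here is doing exactly what the formula asks of it; the fault lies entirely in the choice of $U$.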
What this book means by AI
For the rest of this textbook we adopt a single working definition. AI, for our purposes, is the design and study of systems that perceive their environment, take actions, and try to maximise some performance measure, most often by learning from data rather than being programmed by hand. This is the rational-agent framing of Russell and Norvig, restricted to the case where the policy is found by learning. It is not the only definition you will encounter in the literature, but it is the one most working researchers use.
The definition is broad enough to cover almost every system you will meet in the rest of the book. The perceptron of Chapter 9 is a rational agent that learns a linear classifier. The convolutional networks of Chapter 11 are rational agents that learn visual classifiers. The reinforcement-learning agents of Chapter 15 directly implement the optimisation $\pi^* = \arg\max_{\pi} \mathbb{E}[U]$. The transformers and large language models of Chapter 13 can be read as agents that minimise next-token cross-entropy on a vast corpus, then are aligned to human preferences by further optimisation. The medical-imaging systems of Chapter 11 act rationally with respect to a diagnostic loss while drawing their inductive bias from architectures inspired by primate visual cortex. None of these systems is conscious, none of them necessarily thinks the way you and I think, and none of them passes (or even attempts) every variant of the Turing test. All of them, however, fit the rational-agent definition cleanly.
Other definitions exist. You will encounter colleagues who insist that AI must include some form of symbolic reasoning, or biological plausibility, or general-purpose competence, or self-awareness. Each of these positions is defensible, and each excludes some of the systems we will meet. The definition we have chosen is the field's working consensus rather than its philosophical resolution. Its merit is that it is precise enough to do mathematics with, to write down loss functions, to prove convergence theorems, to design experiments, and broad enough to cover the systems that are reshaping the world outside the laboratory.
What you should take away
- AI inherits the unfinished psychometric argument over what intelligence is, and the field has chosen to sidestep that argument rather than wait for it to be resolved.
- Russell and Norvig's four quadrants (thinking humanly, thinking rationally, acting humanly, acting rationally) give four valid framings of what AI is for, each with its own intellectual lineage.
- The rational-agent framework is the modern dominant definition: an agent perceives, acts, and tries to maximise the expected value of a real-valued performance measure, with $\pi^* = \arg\max_{\pi} \mathbb{E}_{e \sim P(\cdot \mid \pi)}[U(e)]$.
- Rationality is not the same thing as humanness; humans systematically deviate from expected-utility optimisation, and we usually want our AI systems to do better than the human baseline rather than to imitate human cognitive biases.
- This book uses the rational-agent framing throughout. Almost every system we meet, from the perceptron to GPT-4, fits inside it.