- Define artificial intelligence and distinguish it from related concepts such as machine learning and deep learning
- Trace the history of AI from the 1956 Dartmouth workshop through two AI winters to the modern deep learning era
- Distinguish between narrow AI (ANI), artificial general intelligence (AGI), and artificial superintelligence (ASI)
- Identify the three classical paradigms of machine learning — supervised, unsupervised, and reinforcement learning — and the rise of self-supervised learning
- Describe the stages of the end-to-end AI pipeline, from problem definition through deployment and monitoring
Every time you ask a voice assistant for the weather, unlock your phone with your face, or get a film suggestion from Netflix, you are using artificial intelligence. AI is no longer a research curiosity. It is embedded in products used by billions of people every day.
But what exactly is AI? How did it get here? And what are the key ideas behind it? This opening chapter answers those questions. You will learn how the field is defined, trace its turbulent history, meet the main types of AI systems, get an overview of machine learning, and walk through the end-to-end pipeline for building real AI systems.
1.1 What Is Artificial Intelligence?
Defining AI is surprisingly hard, because "intelligence" itself resists easy definition. The field's founders offered a working answer at the 1956 Dartmouth workshop (McCarthy et al., 1955): the study of how to make machines do things that would require intelligence if done by a human. This practical framing focuses on behaviour — can the machine solve the problem? — and sidesteps the philosophical question of consciousness.
Four Ways to Define AI
Russell and Norvig (2020) organise definitions along two axes: thinking vs acting, and human-like vs rational. This gives four quadrants:
- Thinking humanly: model the internal processes of the human mind.
- Thinking rationally: use formal logic to derive correct conclusions.
- Acting humanly: produce behaviour indistinguishable from a human (the Turing test; Turing, 1950).
- Acting rationally: choose actions that maximise expected outcomes given available knowledge.
The acting rationally view dominates today. An agent perceives its environment and takes actions to maximise some measure of performance. This definition is precise enough to formalise mathematically and broad enough to cover everything from a thermostat to a self-driving car.
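The agent view can be made concrete in a few lines of code. The sketch below is purely illustrative (the class and names are ours, not from any library): even a thermostat is an agent that perceives its environment (a temperature reading) and acts to keep performance near a setpoint.

```python
# A minimal "acting rationally" agent: map percepts to actions so as to
# serve a performance measure (temperature near a setpoint).

class ThermostatAgent:
    """Keeps room temperature near a setpoint by switching a heater."""

    def __init__(self, setpoint: float, tolerance: float = 0.5):
        self.setpoint = setpoint
        self.tolerance = tolerance

    def act(self, percept: float) -> str:
        # Percept: current temperature. Action: a heater command.
        if percept < self.setpoint - self.tolerance:
            return "heat_on"
        if percept > self.setpoint + self.tolerance:
            return "heat_off"
        return "no_change"

agent = ThermostatAgent(setpoint=20.0)
print(agent.act(18.0))  # heat_on
print(agent.act(21.0))  # heat_off
```

The same percept-action skeleton scales, in principle, from this toy to a self-driving car; only the percepts, actions, and performance measure change.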
Narrow vs General AI
Narrow AI (weak AI) is designed for one task: playing chess, translating languages, detecting fraud. It can be extraordinarily good within its domain but cannot transfer its skills elsewhere. Every deployed AI system today is narrow — including large language models, which derive all their capabilities from the single objective of next-token prediction.
General AI (AGI) would match human flexibility: learning any task, reasoning about novel situations, and transferring knowledge across domains. No such system exists. Whether current deep learning can lead there, or whether entirely new ideas are needed, is an open and vigorous debate.
The AI Effect
There is an old joke: once a problem is solved, it stops being called AI. Spell-checking, route planning, and OCR were all once AI problems. Today they are just features. The frontier of what counts as AI is always moving forward.
1.2 History of AI
Origins (1956–1974)
AI as a discipline was born at the 1956 Dartmouth workshop, where McCarthy, Minsky, Rochester, and Shannon proposed (McCarthy et al., 1955): "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
The decade that followed was a golden age. Programs solved algebra problems, proved logical theorems, and held simple conversations. McCarthy invented Lisp (1958). Newell and Simon built the General Problem Solver. The mood was euphoric.
First AI Winter (1974–1980)
Reality set in. Early programs did not scale. The combinatorial explosion — the exponential growth in possibilities as problems get larger — proved devastating. Machine translation, thought to be straightforward, turned out to require deep understanding of context and world knowledge. The Lighthill Report (Lighthill, 1973) led to sharp funding cuts in the UK. AI entered its first winter.
Expert Systems and the Second Winter (1980s–1990s)
Expert systems — programs encoding specialist human knowledge — drove a resurgence. MYCIN diagnosed infections. R1/XCON configured computer orders. Japan launched the Fifth Generation project. But expert systems were brittle, expensive to maintain, and could not learn from experience. When the hype faded, a second winter followed.
The Modern Era (2010s–Present)
Three forces converged: vast digital data, GPU computing power, and breakthroughs in deep learning.
- 2011: IBM Watson wins Jeopardy!
- 2012: AlexNet (Krizhevsky et al., 2012) wins ImageNet by a huge margin, launching the deep learning revolution.
- 2016: AlphaGo (Silver et al., 2016) defeats the world Go champion.
- 2018 onward: GPT and BERT (Devlin et al., 2019) transform NLP.
- 30 November 2022: OpenAI releases ChatGPT. One million users in five days. Over a hundred million in two months — the fastest-growing consumer product in history.
By the mid-2020s, AI was embedded in products and services used by billions, reshaping industries, education, and daily work.
The Lesson
AI history is a story of cycles: grand ambitions, sobering setbacks, then renewed progress on firmer foundations. When a technique is called a breakthrough, ask: what are its limits? Where will it fail? The most durable advances are grounded in solid maths, validated by rigorous experiment, and honest about what remains unsolved.
1.3 Types of AI
The ANI–AGI–ASI Spectrum
- Artificial narrow intelligence (ANI): excels at one or a few tasks. All current AI, from chess engines to GPT-4.
- Artificial general intelligence (AGI): would match human cognitive flexibility. Does not yet exist.
- Artificial superintelligence (ASI): would greatly exceed the best human minds in every domain. Entirely speculative. Bostrom (2014) examined the risks.
This trichotomy distinguishes what we have built (ANI), what some aspire to build (AGI), and what others worry we might eventually build (ASI).
By Problem Type
- Classification: assign inputs to discrete categories (spam or not?).
- Regression: predict continuous values (tomorrow's temperature?).
- Generation: produce new content — text, images, music, code.
- Reinforcement learning: learn sequences of decisions by interacting with an environment and receiving rewards.
Symbolic vs Sub-Symbolic
Symbolic AI (1950s–1980s) represents knowledge with explicit rules and logic. Great for structured reasoning, poor at perception and learning from raw data.
Sub-symbolic AI (neural networks, statistical methods) represents knowledge as patterns of numerical weights. Great at perception and pattern recognition, but can be opaque.
Much recent excitement comes from sub-symbolic successes (deep learning), but hybrid approaches combining both are gaining interest.
AI vs ML vs Deep Learning
Think of them as concentric circles:
- AI: any technique that enables machines to mimic intelligent behaviour.
- Machine learning: a subset of AI where systems learn from data. (A hand-coded expert system is AI but not ML.)
- Deep learning: a subset of ML using neural networks with many layers. (A decision tree is ML but not DL.)
1.4 Machine Learning Overview
Machine learning is the engine of modern AI. Instead of writing rules by hand, you show the algorithm examples and let it discover the patterns.
Supervised Learning
Training data has input–output pairs. The algorithm learns a function mapping inputs to outputs. Classification (discrete outputs) and regression (continuous outputs) are both supervised. The objective: minimise a loss function measuring the gap between predictions and truth.
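The supervised recipe can be shown end to end in a few lines. The sketch below (illustrative, not production code) fits y ≈ wx + b by gradient descent on a mean squared error loss; the data and hyperparameters are invented for the example.

```python
# Toy supervised learning: fit y ≈ w*x + b by gradient descent on MSE.

def fit_linear(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the loss L = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Labelled pairs generated from y = 3x + 1; the learner should recover
# parameters close to w = 3, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))
```

Every supervised method in this book, however elaborate, follows this shape: predict, measure the loss against the known answer, adjust.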
Unsupervised Learning
No labels. The algorithm finds structure on its own: clusters, latent factors, compact representations. K-means, PCA, and variational autoencoders are all unsupervised. Especially valuable when labelled data is scarce.
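One classic unsupervised algorithm, k-means, can be sketched directly. This version assumes 1-D points for brevity and invents its own data; it alternates between assigning points to the nearest centroid and moving each centroid to its cluster mean.

```python
import random

# Bare-bones k-means on 1-D points: assign, re-centre, repeat.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from random points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (keep it if cluster empty).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious clusters, around 0 and around 10 — no labels needed.
data = [0.1, -0.2, 0.3, 9.8, 10.1, 10.3]
print(kmeans(data, 2))
```

Note that no label ever appears: the structure (two groups) is discovered from the data alone.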
Reinforcement Learning
An agent interacts with an environment, taking actions and receiving rewards. The goal: learn a policy (mapping from states to actions) that maximises cumulative reward. Natural for games, robotics, portfolio management, and traffic navigation.
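The interaction loop can be sketched with tabular Q-learning on a toy environment invented for this example: a corridor of states 0–4 where the agent moves left or right and earns a reward for reaching the end.

```python
import random

# Tabular Q-learning on a toy corridor: states 0..4, actions -1/+1,
# reward 1.0 for reaching state 4. A sketch of the RL loop, not a framework.

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != 4:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < eps:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), 4)
            r = 1.0 if s2 == 4 else 0.0
            # Q-update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            best_next = max(q[(s2, -1)], q[(s2, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    # Greedy policy per state: the agent should learn to move right (+1).
    return [max((-1, 1), key=lambda act: q[(s, act)]) for s in range(4)]

print(q_learning())  # [1, 1, 1, 1]
```

The reward arrives only at the end, yet the update rule propagates its value backward through the states — the core trick that makes RL work for long sequences of decisions.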
Self-Supervised Learning
The model creates its own labels from the data's structure. A language model predicts the next word — the "label" is just the actual next word, freely available in any text. This enables training on vast unlabelled data. Self-supervised pretraining followed by fine-tuning is now the dominant recipe in NLP and is spreading to vision and other fields.
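Here is the label-manufacturing step in miniature (the function name is ours, not from any library): every position in a text yields a (context, next word) training pair for free.

```python
# Self-supervision in miniature: the "labels" for next-word prediction
# are just the words already present in the text.

def make_examples(text, context_size=2):
    words = text.split()
    examples = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        label = words[i]  # the label is free: it is simply the next word
        examples.append((context, label))
    return examples

pairs = make_examples("the cat sat on the mat")
print(pairs[0])  # (('the', 'cat'), 'sat')
```

A six-word sentence already yields four training examples; scale this to the web and the supply of "labelled" data becomes effectively unlimited.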
The Central Challenge: Generalisation
A model must work on data it has never seen, not just on its training set. Overfitting means memorising the training data, including its noise. Underfitting means being too simple to capture real patterns. The art of ML is navigating between these extremes — using regularisation, cross-validation, early stopping, and careful architecture choices. This theme returns throughout the book.
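A toy illustration of the overfitting end of this spectrum: a 1-nearest-neighbour "memoriser" achieves zero training error on noisy data but generalises worse than a simple model matching the underlying trend. The data and models here are invented for the example, with the true rule assumed to be y = 2x.

```python
# Overfitting in miniature: memorising noisy training data vs capturing
# the real pattern (here the underlying rule is y = 2x).

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

train = [(1, 2.3), (2, 3.7), (3, 6.2), (4, 7.8)]  # y = 2x plus noise
test = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]       # unseen, noise-free

def memoriser(x):
    # 1-nearest-neighbour: predict the y of the closest training x.
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def simple_model(x):
    # The (assumed known) underlying trend.
    return 2 * x

train_xs, train_ys = zip(*train)
test_xs, test_ys = zip(*test)

print(mse([memoriser(x) for x in train_xs], train_ys))   # 0.0: memorised
print(mse([memoriser(x) for x in test_xs], test_ys))     # worse than...
print(mse([simple_model(x) for x in test_xs], test_ys))  # ...the simple model
```

Zero training error looks like success, but the test error reveals the truth: the memoriser learned the noise along with the signal.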
1.5 The AI Pipeline
Building a real AI system is more than choosing an algorithm. It is a multi-stage engineering process.
1. Problem Definition
Translate a vague question into a precise ML task. What are you predicting? What data is available? What does success look like? A hospital that wants to "use AI to improve outcomes" must decide: predict readmission risk? Detect vital sign anomalies? Recommend treatments?
2. Data Collection and Preparation
This stage typically consumes most of the project time. Gather data, clean it (remove duplicates, fix errors, handle missing values), and transform it into model-ready format. Split into training, validation, and test sets.
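The final step of this stage can be sketched directly. The 70/15/15 proportions below are a common but arbitrary choice, and the helper name is ours.

```python
import random

# Shuffle the data reproducibly, then carve off test and validation
# fractions; whatever remains is the training set.

def split_dataset(data, val_frac=0.15, test_frac=0.15, seed=42):
    data = list(data)
    random.Random(seed).shuffle(data)  # shuffle a copy, deterministically
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The shuffle matters: if the data is ordered (say, by date or by class), an unshuffled split gives the model a training set that does not resemble the test set.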
3. Model Selection and Training
Choose an architecture (logistic regression, random forest, CNN, Transformer). Train by optimising parameters to minimise the loss. Tune hyperparameters (learning rate, batch size, layers) using the validation set. Iterate.
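Hyperparameter tuning reduces, in outline, to "try candidates, keep the one with the lowest validation loss". In this sketch the validation loss is a made-up stand-in for a real training run, contrived so the best learning rate is 0.1.

```python
# Hyperparameter search in outline: evaluate each candidate on the
# validation set and keep the best.

def validation_loss(lr):
    # Hypothetical stand-in for "train a model with this learning rate,
    # then evaluate on the validation set". This invented curve happens
    # to be minimised at lr = 0.1.
    return (lr - 0.1) ** 2

candidates = [0.001, 0.01, 0.1, 1.0]
best_lr = min(candidates, key=validation_loss)
print(best_lr)  # 0.1
```

Crucially, the selection uses the validation set, never the test set: a hyperparameter tuned against the test set silently leaks information into the final evaluation.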
4. Evaluation
Assess performance on the held-out test set. Use appropriate metrics (accuracy, precision, recall, F1, AUC, MSE). Also examine behaviour qualitatively: are there systematic errors? Biases? Failures on certain subgroups?
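The threshold-style metrics named here are easy to compute from scratch for a binary task; the labels below are invented for the example.

```python
# Precision, recall, and F1 from counts of true/false positives/negatives.

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of flagged, how many real?
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real, how many caught?
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Which metric matters depends on the application: a cancer screen should favour recall (miss nothing), while a spam filter may favour precision (never bin real mail).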
5. Deployment and Monitoring
Integrate the model into a production system. Monitor for data drift (input distribution changes) and concept drift (the relationship between inputs and outputs changes). A fraud model trained on last year's patterns may fail as criminals adapt. Automated retraining, A/B testing, and human review keep models accurate over time.
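A crude sketch of drift monitoring: compare a summary statistic of live inputs against its training-time value and alert on a large shift. Production systems use proper statistical tests; the function, data, and threshold below are all invented for illustration.

```python
# Naive data-drift check: flag when the mean of live inputs moves far
# from the mean observed at training time.

def drift_alert(train_values, live_values, threshold=0.5):
    train_mean = sum(train_values) / len(train_values)
    live_mean = sum(live_values) / len(live_values)
    return abs(live_mean - train_mean) > threshold

training_amounts = [10.0, 12.0, 11.0, 9.0]  # e.g. transaction amounts seen in training
live_amounts = [25.0, 30.0, 28.0]           # much larger amounts now in production
print(drift_alert(training_amounts, live_amounts))  # True: distribution shifted
```

An alert like this does not say the model is wrong, only that it is now operating on data unlike what it was trained on — the trigger for investigation or retraining.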
The pipeline is not linear — it is a cycle. Each stage can send you back to an earlier one. Failures at any point can undermine the whole system.