1.3 Symbolic and sub-symbolic AI
If you read enough articles about artificial intelligence, you will eventually notice that the field talks about itself as if it were two different sciences pasted together. One of these sciences is concerned with rules, logic, and the kind of structured reasoning a careful philosopher might write down on paper. The other is concerned with patterns, weights, and the kind of statistical regularities a curious naturalist might notice after watching enough birds. Both call themselves AI; both are studied in the same departments and published in the same journals; and yet, for most of the field's history, their practitioners have disagreed quite sharply about what the words "knowledge" and "thinking" actually mean inside a computer.
This section is about that divide. The two traditions are usually called symbolic AI and sub-symbolic AI. Symbolic AI represents knowledge as discrete rules and symbols you could in principle write down on a whiteboard, for instance, "all birds fly, penguins are birds, but penguins do not fly". Sub-symbolic AI represents knowledge as patterns of activity in many simple processing units, most familiarly the weights in a neural network. The two traditions have different strengths, different weaknesses, and different historical moments of triumph and embarrassment. Understanding the divide is essential for reading any technical AI paper of the last seventy years.
Where §1.2 was about evaluation, this section is about representation: what is inside the machine when it is being intelligent. The aim is the simplest possible statement of what the two camps are, what they are good at, and how, in 2026, they have largely stopped quarrelling and started cooperating.
What "symbolic" means
A symbolic system is, at its root, a machine that pushes around tokens. The tokens are discrete: each one is either present or absent, and one token is clearly different from another. They are usually arranged into structures such as strings, lists, or tree-shaped expressions, and the system processes those structures by means of formal rules that respect their shape. If you have ever written a computer program, debugged a piece of arithmetic, or worked through a logic puzzle, you have done symbolic processing. The discipline of computer science, in fact, was born out of mathematicians' attempts to formalise exactly this kind of activity.
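To see what "rules that respect their shape" means in practice, here is a minimal sketch in Python, rather than Lisp, of a tree-shaped expression built from nested tuples and a single rewrite rule, "anything plus zero simplifies to itself", which fires purely on the structure of the expression. The representation and the rule are invented for illustration and stand in for no particular historical system.

```python
# A tree-shaped expression: ("+", x, y) means x + y and ("*", x, y) means x * y.
# The rewrite rule below cares only about the shape of the expression,
# never about what the symbols 'mean' to us.

def simplify(expr):
    """Recursively apply the rule (+ e 0) -> e to a nested tuple expression."""
    if not isinstance(expr, tuple):           # a bare symbol or number: nothing to do
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "+" and right == 0:              # the rule fires on shape alone
        return left
    return (op, left, right)

# (x * (y + 0)) + 0 simplifies to x * y
print(simplify(("+", ("*", "x", ("+", "y", 0)), 0)))   # ('*', 'x', 'y')
```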
The classic examples of symbolic AI systems are programs written in Lisp (a language whose very name, "List Processing", advertises its symbolic ambitions), programs written in Prolog (a language built around logical inference), automated theorem-provers, expert-system shells, and the procedural representations of cognitive architectures such as ACT-R and SOAR. The mature philosophical statement of the symbolic position is the Physical Symbol System Hypothesis, due to Allen Newell and Herbert Simon in 1976: "a physical symbol system has the necessary and sufficient means for general intelligent action". In plainer English: if you have a machine that manipulates symbols according to formal rules, you have everything you need for general intelligence; and conversely, if you have general intelligence, somewhere inside there must be such a machine.
A worked example will make the idea concrete. Suppose we want to build a small medical-diagnosis program for pneumonia. In the symbolic style, we sit down with a chest physician and ask her to list the rules she actually uses. After a long conversation we end up with rules of the form:
IF the patient has fever AND the patient has cough AND the chest X-ray shows infiltrate THEN diagnose pneumonia (probability 0.85).
This is a single rule. We can write down dozens or hundreds more. Crucially, every component of this rule is human-readable. We can argue with the probability. We can ask, "what about a patient on immunosuppressants whose temperature is normal?", and we can write a new rule to handle that case. We can audit the system: given any patient record, the program can produce a step-by-step explanation of why it reached its conclusion. We can prove that, on a known case base, the rules give the correct answer for every recorded patient. None of these properties is automatic in a sub-symbolic system.
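Here is a minimal sketch, in Python, of what the symbolic style looks like as running code. The findings, the rules, and the strengths are all invented for illustration rather than taken from any clinical source; the point is that every rule is readable, every number is arguable, and the explanation falls straight out of the match.

```python
# A toy rule base in the symbolic style. Hypothetical rules, invented strengths:
# not clinical guidance. Each rule is readable, auditable, and easy to argue with.

RULES = [
    # (findings that must all be present, conclusion, strength)
    ({"fever", "cough", "xray_infiltrate"}, "pneumonia", 0.85),
    ({"fever", "cough"}, "pneumonia", 0.40),
    ({"immunosuppressed", "cough", "xray_infiltrate"}, "pneumonia", 0.80),  # the afebrile exception
]

def diagnose(findings):
    """Return the strongest matching conclusion plus a step-by-step explanation."""
    best = None
    for conditions, conclusion, strength in RULES:
        if conditions <= findings:                     # every condition is present
            if best is None or strength > best[1]:
                best = (conclusion, strength, conditions)
    if best is None:
        return None, ["no rule matched the recorded findings"]
    conclusion, strength, used = best
    explanation = [
        f"findings {sorted(used)} satisfy a rule",
        f"conclude {conclusion} with strength {strength}",
    ]
    return (conclusion, strength), explanation

result, why = diagnose({"fever", "cough", "xray_infiltrate", "age_over_65"})
print(result)                  # ('pneumonia', 0.85)
for step in why:
    print(" -", step)
```

The immunosuppressed-patient exception from the previous paragraph is already there as a third rule: adding it changed nothing else in the system, which is the compositional virtue the symbolic camp prizes.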
The tradition has a long and respectable lineage. It draws on Aristotle's syllogisms, on Boole's algebra of thought, on Frege's predicate logic, on the model theory developed by Tarski. It treats intelligence as something that, at its best, is transparent and defensible: something that can be written down, communicated, and criticised.
What "sub-symbolic" means
A sub-symbolic system stores knowledge in a different way: not as discrete rules but as continuous-valued patterns over many simple processing units. The unit is typically a small computational element loosely modelled on a biological neuron, adding up its inputs, applying a non-linear function, and passing the result on. The first such model, the McCulloch–Pitts neuron of 1943, was a deliberate attempt to formalise neural activity in mathematical language. The next big step, Frank Rosenblatt's perceptron of 1958, was implemented as a physical machine, the Mark I Perceptron, with banks of motorised potentiometers as its adjustable weights. The artificial neurons in a modern deep network are direct descendants of these early units, although the commitment is methodological rather than biological: nobody seriously claims that a transformer thinks the way a brain does.
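A few lines of code may help fix the idea of a single unit. The sketch below implements a thresholded unit in the spirit of the McCulloch–Pitts neuron, together with a Rosenblatt-style learning rule applied to a toy, linearly separable task (logical AND). The data, the learning rate, and the number of sweeps are illustrative choices, not anything taken from the historical machines.

```python
import numpy as np

def unit(weights, bias, inputs):
    """One artificial unit: a weighted sum of inputs followed by a hard threshold."""
    return 1 if np.dot(weights, inputs) + bias > 0 else 0

# A Rosenblatt-style learning rule on a linearly separable toy task: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                      # target outputs for AND
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                              # a few sweeps are enough here
    for inputs, target in zip(X, y):
        error = target - unit(w, b, inputs)
        w += lr * error * inputs                 # nudge the weights toward the target
        b += lr * error

print([unit(w, b, x) for x in X])                # [0, 0, 0, 1]
```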
In a sub-symbolic system, knowledge is not stored as a rule that you could read off the page. It is stored implicitly, in the values of the weights connecting one unit to another. Computation is not symbol manipulation; it is the distributed transformation of high-dimensional vectors. If you ask the system "what do you know about giraffes?", you cannot point to a particular weight and say, "that one is the giraffe weight". The knowledge is spread out, distributed, across many weights at once, and the same weights are simultaneously contributing to what the system knows about zebras, oak trees, and the letter G.
Take the same medical-diagnosis problem and rebuild it in the sub-symbolic style. Now the program is a deep neural network. We feed it a list of numbers describing the patient: temperature in degrees Celsius, an indicator of cough (1 if present, 0 if not), the patient's age, blood-test values, and so on, together with the raw pixels of a chest X-ray. The network passes these numbers through many layers of artificial neurons, each layer mixing and re-weighting the previous one, and produces at the end a single number between 0 and 1: the estimated probability that this patient has pneumonia. The "knowledge" of medicine inside this network lives in millions or billions of real-valued weights. We did not write those weights. They were learned, by gradient descent, from a large dataset of historical patient records each labelled with the eventual diagnosis.
You cannot inspect a single weight and say what it means. You cannot easily explain to a sceptical clinician why the network reached its answer. But you can do something the symbolic system struggled with: you can take a chest X-ray that no rule-writer ever anticipated, and the network will return a sensible probability anyway, because it has learned, from many thousands of examples, the visual signature of pneumonic infiltrate.
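For contrast with the rule-based version, here is a minimal sub-symbolic sketch: a one-hidden-layer network trained by gradient descent, written in plain NumPy. The "patients" are synthetic, the four features are invented stand-ins, and the network is far smaller than anything clinical, but the essential point survives: whatever the network ends up knowing lives in weight matrices that nobody wrote by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for patient records: four invented features per "patient"
# (say temperature, cough indicator, age, one lab value). Nothing here is real data.
X = rng.normal(size=(200, 4))
hidden_rule = np.array([1.5, 2.0, 0.5, -1.0])                  # the unknown regularity
y = ((X @ hidden_rule + 0.3 * rng.normal(size=200)) > 0).astype(float)

# One hidden layer of 8 units. Whatever the network learns lives in these arrays.
W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=8), 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(3000):                            # plain full-batch gradient descent
    h = np.tanh(X @ W1 + b1)                     # hidden activations
    p = sigmoid(h @ W2 + b2)                     # predicted probability of the label
    grad_out = (p - y) / len(y)                  # cross-entropy gradient at the output
    grad_h = np.outer(grad_out, W2) * (1 - h ** 2)   # backpropagate through tanh
    W2 -= lr * (h.T @ grad_out)
    b2 -= lr * grad_out.sum()
    W1 -= lr * (X.T @ grad_h)
    b1 -= lr * grad_h.sum(axis=0)

p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print("training accuracy:", float(((p > 0.5).astype(float) == y).mean()))
```

Nothing in the loop mentions fever or infiltrates by name; the regularity the network recovers is spread across W1 and W2, which is exactly the sense in which the knowledge is distributed rather than written down.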
Strengths and weaknesses, side by side
The two traditions have, broadly speaking, complementary strengths.
Symbolic AI is interpretable: a human can read each rule and decide whether it is correct. It is compositional: rules can be combined to derive new conclusions in ways that respect the meaning of the parts, so a small set of rules can in principle cover a vast space of cases. It is sample-efficient: a single rule, written down once, applies to infinitely many situations; you do not need to show the system a thousand examples of "if it is raining then the ground is wet" before it gets the idea. And it is verifiable: in principle, the soundness of a deductive system can be proven mathematically.
These advantages come with painful failure modes. Knowledge engineering, sitting down with experts and extracting their rules, turns out to be brutal in practice. The rules an expert says she follows are rarely the rules she actually uses. Much of human expertise is tacit: a senior radiologist can spot a subtle infiltrate in a fraction of a second but cannot articulate, in rule form, exactly what cue she is responding to. Common sense, the body of background assumptions a five-year-old shares with an adult, has resisted decades of explicit encoding. Perception, motor control, and language understanding all involve gradations of similarity, exception, and noise that crisp logical rules struggle to capture. And reasoning under uncertainty, while it can be done within a logical framework using probabilistic logic or Markov logic networks, tends to become messy quickly.
Sub-symbolic AI has the opposite shape. It handles perception well: a convolutional network can learn to recognise a cat in a photograph without anyone having to write down rules for what cat-pictures look like. It learns directly from data: given enough labelled examples, it generalises gracefully to new instances of the same kind. It handles similarity natively, because nearby points in the high-dimensional vector space that the network builds tend to share semantic content: pictures of two different cats land closer together than a picture of a cat and a picture of a kettle. And it has scaled, since 2012, in ways that no other branch of AI has managed: more data and more computing power produce reliably better systems.
The sub-symbolic weaknesses are also real. The system is opaque: a bare matrix of weights does not yield, on inspection, a human-readable account of why the network classified a particular X-ray as pneumonia. This matters in safety-critical fields like medicine and law, where knowing that an answer was wrong is not enough; we want to know why it was wrong. Sub-symbolic systems are sample-inefficient compared to humans: a child learns to recognise a giraffe from a handful of pictures, where a deep network may need thousands. They can be confidently wrong in ways their architecture does not flag: the now-classic adversarial examples, in which a few imperceptibly altered pixels change a confident "panda" into a confident "gibbon", are a vivid illustration. And the act of debugging such a system is closer to experimental science than to ordinary programming: you cannot just step through the logic.
A useful slogan, due to many people independently, is that symbolic AI is good at the things you can write down and bad at the things you cannot, while sub-symbolic AI is the reverse.
Symbolic in detail: classical successes
The symbolic tradition has produced a long roll-call of impressive systems. A non-exhaustive tour:
- Logic Theorist (Newell, Shaw, and Simon 1956) is widely cited as the first program that could be called artificial intelligence. It proved theorems from Whitehead and Russell's Principia Mathematica, in some cases producing proofs the authors of the Principia judged more elegant than their own.
- Advice Taker (McCarthy 1959) was a proposal, partly visionary, partly worked-out, for a program that would represent its world in formal logic and accept new advice in the same form.
- Resolution (Robinson 1965) gave automated theorem-provers a single inference rule, resolution refutation, sufficient for first-order logic, and made automated deduction a practical engineering activity.
- MYCIN (Shortliffe 1976, Stanford) was an expert system for diagnosing blood infections and recommending antibiotic therapy. In evaluations it performed at the level of senior infectious-disease specialists. It was never deployed clinically, partly for institutional reasons and partly because keeping its rule base current proved expensive.
- R1/XCON (McDermott 1980, Carnegie Mellon and Digital Equipment Corporation) configured customer orders for VAX computers, a task that human technicians frequently got wrong, with expensive consequences. By 1986 R1/XCON was processing the bulk of DEC's orders and was widely credited with savings on the order of $25 million per year. It was the first expert system to pay for itself many times over in commercial use.
- SHRDLU (Winograd 1972) was a typed-dialogue system that conversed in restricted English about a virtual world of coloured blocks. You could ask it to "find a block which is taller than the one you are holding and put it into the box", and it would answer correctly and execute the request. SHRDLU was famous in its time as a demonstration of what symbolic natural-language understanding could achieve in a deliberately closed world.
- MACSYMA (MIT, 1960s onwards) was a symbolic-mathematics system that did algebra, calculus, and equation-solving on expressions rather than numbers. Its direct descendant, Maxima, is still in active use, and later computer-algebra systems such as Mathematica and Maple work in the same tradition.
- Cyc (Lenat 1984–) is the longest-running attempt to encode common-sense knowledge in formal logic. It now contains millions of assertions about ordinary objects and events and continues to be developed.
Each of these systems worked in the sense that mattered to its designers. The lesson, taken in aggregate, is that symbolic AI succeeds when knowledge is bounded, well-defined, and slow-changing, when we know in advance what kinds of object the system will encounter, and when human experts can articulate, with patience, what they want the machine to do.
Sub-symbolic in detail: the connectionist programme
The sub-symbolic side has its own long roll-call.
- The McCulloch–Pitts neuron (1943) gave a mathematical model of a single neuron: a thresholded sum of weighted inputs.
- Rosenblatt's perceptron (1958) extended the model with a learning rule, and the Mark I Perceptron machine that implemented it became, briefly, the most-discussed AI device in the world.
- Hopfield networks (Hopfield 1982) showed that a recurrent network of binary units could store patterns as the stable states of a dynamical system, and revived theoretical interest in connectionism.
- Boltzmann machines (Ackley, Hinton, Sejnowski 1985) added a stochastic, statistical-mechanics flavour to the same idea.
- The 1986 Parallel Distributed Processing volumes by Rumelhart, McClelland, and the PDP Research Group, together with the popularisation of backpropagation in the same year, gave the field its standard learning algorithm and a manifesto.
- Convolutional networks (LeCun 1989, 1998) brought neural networks to the recognition of handwritten digits and to commercial cheque-reading.
- LSTMs (Hochreiter and Schmidhuber 1997) gave recurrent networks a way to retain information over long sequences without the gradient vanishing entirely.
- AlexNet (Krizhevsky, Sutskever, Hinton 2012) won the ImageNet large-scale visual recognition challenge by a huge margin, and is conventionally taken as the moment the deep-learning revolution began in earnest.
- Transformers (Vaswani et al. 2017) replaced recurrence with self-attention and made language modelling at scale tractable.
- GPT-3 (2020) demonstrated that a large transformer trained on a large slice of the internet could perform many language tasks without task-specific training.
- The frontier models of 2024–2026 (the successors of GPT-3, the Claude family, the Gemini family, and the open-weight Llama and DeepSeek families) are direct descendants of this same line.
The lesson here is the mirror image of the symbolic lesson: sub-symbolic AI succeeds when data is abundant and the structure of the problem is hard to formalise. Most of the visible AI progress of the last decade has been sub-symbolic, and most of the public's mental image of "AI" in 2026 is a sub-symbolic image, a chatbot, an image generator, a self-driving car.
The 1969 Perceptrons episode and the first AI winter
It is part of the field's folklore that connectionism died for a decade because of a single book. The episode bears retelling because it is widely misremembered.
In 1969, Marvin Minsky and Seymour Papert, both at MIT, both at that time more sympathetic to symbolic AI, published Perceptrons. The book gave a careful mathematical analysis of single-layer perceptrons, the network architecture Rosenblatt had championed. Among its results was the now-famous demonstration that a single-layer perceptron cannot represent simple non-linearly-separable functions: the canonical example is XOR (the function that returns 1 if exactly one of its two inputs is 1, and 0 otherwise). Geometrically, XOR cannot be separated by a single straight line, and a single-layer perceptron is, in essence, a single straight line.
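The point is easy to check directly. The sketch below searches a grid of weights for a single threshold unit that computes XOR and finds none (no such weights exist, on or off the grid, because the two classes are not linearly separable), and then shows a hand-wired two-layer network that computes XOR exactly. That is precisely the distinction Minsky and Papert were drawing: the limitation is a limitation of depth, not of the connectionist idea as such.

```python
import itertools

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]

def threshold_unit(w1, w2, b, x):
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

# (a) Brute-force search over a grid of single-unit weights: nothing computes XOR.
grid = [i / 2 for i in range(-8, 9)]                 # weights and biases in [-4, 4]
hits = sum(
    1 for w1, w2, b in itertools.product(grid, repeat=3)
    if [threshold_unit(w1, w2, b, x) for x in X] == XOR
)
print("single-layer solutions found:", hits)         # 0

# (b) A hand-wired two-layer network: XOR(x) is OR(x) and not AND(x).
def two_layer(x):
    h_or = threshold_unit(1, 1, -0.5, x)             # hidden unit 1: OR
    h_and = threshold_unit(1, 1, -1.5, x)            # hidden unit 2: AND
    return threshold_unit(1, -2, -0.5, (h_or, h_and))
print([two_layer(x) for x in X])                     # [0, 1, 1, 0]
```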
The book was widely, and somewhat unfairly, read as a refutation of the connectionist programme as a whole. It was not. Minsky and Papert were entirely clear that multi-layer networks could in principle represent any Boolean function; their concern was that no good learning algorithm for multi-layer networks was then known. But the polemical reading prevailed in the public conversation.
Around the same time, AI funding was being scrutinised on both sides of the Atlantic. The US Advanced Research Projects Agency (ARPA, later DARPA) was tightening its purse strings after a period of generous funding. In the UK, the Lighthill Report of 1973, a review of AI commissioned by the Science Research Council and authored by the applied mathematician James Lighthill, was deeply pessimistic about the field's progress on real-world problems. Lighthill argued that AI was suffering from "combinatorial explosion": its successes scaled badly. Funding was cut sharply, particularly in the UK. The conjunction of these events, together with the Perceptrons reception, produced what is now called the first AI winter: a stretch of perhaps a decade in which AI was neither fashionable nor easy to fund.
It is important not to overstate what was lost. Multi-layer extensions of perceptrons existed: Alexey Ivakhnenko and his colleagues in the Soviet Union had been studying multi-layer feedforward networks under the name "group method of data handling" since 1965. Theoretical work on associative memories continued at John Hopfield's group at Caltech in the early 1980s, culminating in the Hopfield-network paper of 1982. Backpropagation was rediscovered, in something close to its modern form, by several researchers across the 1970s and 1980s. The 1986 PDP volumes did not invent backpropagation; they popularised it and embedded it in a coherent intellectual programme. After 1986 the connectionist programme had its credibility back, but it did not displace symbolic AI: through the 1990s and early 2000s, the two traditions ran side by side, each with its own conferences, journals, and pet problems.
The synthesis: hybrid neuro-symbolic systems
The contemporary picture is more interleaved than either side imagined in the 1980s. Hybrid systems that combine neural pattern recognition with symbolic reasoning, sometimes labelled neuro-symbolic, produce some of the most impressive recent results.
- AlphaGo (DeepMind 2016) and its successor AlphaGo Zero (2017) combined deep convolutional value and policy networks (sub-symbolic) with classical Monte Carlo tree search (symbolic). The neural network suggested promising moves and estimated the value of board positions; the search procedure expanded those suggestions into the deep look-ahead that finally beat the world champion at Go, a game whose combinatorial size had been thought to put it beyond machine play for decades.
- AlphaProof (DeepMind 2024) interleaved a Gemini-trained language model with the Lean 4 theorem prover, achieving silver-medal performance at the International Mathematical Olympiad. The language model proposed candidate proof steps; Lean 4, a formal verification system, mechanically checked whether each step was valid. The combination did something neither piece could do alone. An advanced Gemini Deep Think variant officially achieved gold-medal standard at IMO 2025.
- AlphaGeometry and AlphaGeometry 2 (DeepMind 2024) combined neural conjecture generators with symbolic deduction engines for Euclidean geometry, again at IMO standard.
- Modern code agents combine large language models (sub-symbolic) with type checkers, test runners, debuggers, and formal verifiers (symbolic). The LLM proposes code; the symbolic infrastructure tells it whether the proposed code compiles, type-checks, and passes its tests; the LLM revises and tries again. A minimal sketch of this loop appears after the list.
- Tool-using LLMs that call calculators, web searches, SQL databases, and other external services are similarly hybrid. The language model is the supple, fluent perceptual surface; the external tools are precise, rule-bound, and reliable in ways the LLM is not.
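Here is the promised sketch of the propose-and-check loop behind the last two items. The call to the language model is a stub, and the `ask_model` function with its canned candidates is invented purely for illustration; the symbolic half, compiling the candidate and running a concrete test, is ordinary Python.

```python
# Hybrid loop, sketched: a (stubbed) neural proposer plus a real symbolic checker.

def ask_model(task, feedback):
    """Placeholder for a language-model call: returns candidate source code."""
    # A real agent would send `task` and the accumulated `feedback` to a model API.
    # Here we just hand back progressively better hand-written candidates.
    candidates = [
        "def add(a, b): return a - b",        # first attempt: wrong operator
        "def add(a, b): return a + b",        # revised after the test failure
    ]
    return candidates[min(len(feedback), len(candidates) - 1)]

def symbolic_check(source):
    """The rule-bound half: compile the code and run a concrete test against it."""
    namespace = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        assert namespace["add"](2, 3) == 5
        return True, "all checks passed"
    except Exception as exc:
        return False, f"check failed: {exc!r}"

feedback = []
for attempt in range(5):
    code = ask_model("write add(a, b)", feedback)     # neural: propose
    ok, message = symbolic_check(code)                # symbolic: verify
    print(f"attempt {attempt}: {message}")
    if ok:
        break
    feedback.append(message)
```

The division of labour is the one described above: the proposer is free to be fluent and fallible, because the checker is precise and unforgiving.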
The neat dichotomy of 1986 is gone. In 2026 the question for any practical AI system is no longer "symbolic or connectionist?" but "in what proportions, on what tasks, with what interfaces?". Almost every serious modern AI system is hybrid, in that it has a neural pattern-matcher somewhere and a symbolic component, even if that component is only an arithmetic check or a database lookup, somewhere else. The two great traditions have not so much merged as learned to talk to each other.
The deeper philosophical question, which of the two is doing the "real" thinking, is, in 2026, much less interesting than it was in 1986. The pragmatic answer is: both, in different proportions, depending on the job.
What you should take away
- Symbolic AI represents knowledge as discrete rules and tokens that can be written down, read, audited, and combined. It is interpretable, compositional, and sample-efficient, but knowledge engineering is hard and crisp logical rules struggle with perception, exceptions, and gradations of similarity.
- Sub-symbolic AI represents knowledge as continuous-valued patterns over many simple processing units, most familiarly the weights in a neural network. It learns directly from data, handles perception and similarity natively, and scales with compute, but is opaque, sample-inefficient, and can be confidently wrong.
- The two traditions are not really competitors; they are complementary, and the same problem often admits both a symbolic and a sub-symbolic treatment. The medical-diagnosis vignette earlier in the section can be built either way, and a serious clinical system would build it both ways and combine the answers.
- The story that Perceptrons killed connectionism for a decade is part of the field's folklore but is overstated. Funding cuts on both sides of the Atlantic, an over-strong polemical reading of Minsky and Papert, and the genuine difficulty of training multi-layer networks all played a role. Multi-layer work continued in less fashionable corners throughout.
- The most impressive AI systems of the 2020s, AlphaGo, AlphaProof, AlphaGeometry, modern code agents, tool-using LLMs, are hybrid. They use neural networks for the parts of the problem that are hard to formalise, and symbolic machinery for the parts that need to be precise, auditable, or provably correct. Reading any modern AI paper, you should expect to see both kinds of component, and the interesting design decisions are usually about the interface between them.