1.4 Narrow, general, and super: a spectrum and its discontents

If you read anything about artificial intelligence in a newspaper, a policy paper, or a research blog, you will quickly meet three labels. The first is narrow AI, sometimes written ANI (artificial narrow intelligence): a system that does one thing, or a small cluster of related things, and nothing else. The second is AGI (artificial general intelligence): a system that does roughly everything a human adult can do with their mind. The third is ASI (artificial superintelligence): a system that does everything better than any human, including the work of designing better AI systems.

The labels are useful as a first cut. They give you three boxes to drop a system into when somebody describes it. They are dangerous if you mistake the boxes for sharp categories with crisp boundaries, because they are not. The boundary between narrow and general is fuzzy, the definition of general is contested, and the existence of super is speculative. This section explains what each term means in careful detail, where the live disagreements are, and how to read claims of the form "AGI by year X" without being misled.

The narrow-general-super axis is largely orthogonal to the symbolic-subsymbolic axis of §1.3: a narrow symbolic system (a 1980s expert system for diagnosing infections) and a narrow subsymbolic system (a chest X-ray classifier) are both narrow, and claimed paths to AGI run through either tradition.

Artificial narrow intelligence (ANI)

ANI denotes a system designed and competent for a single task or a narrow set of tasks. The clearest illustrations come from systems whose remit is unambiguously bounded.

Stockfish, the open-source chess engine, is the canonical example. It plays chess at a level no human has ever reached, beating reigning world champions without breaking a digital sweat. Hand it a position from a different board game, ask it to summarise a paragraph, or invite it to compose a limerick, and you get nothing useful: the system has no representation of those tasks at all. Its competence is deep but vertical.

A pneumonia-detection model trained on chest radiographs is similarly narrow. It takes an image as input, returns a probability that the patient has pneumonia, and does precisely that. It has no opinion on whether the patient also has tuberculosis, no ability to read the patient notes, and no concept of what an X-ray is for. A fraud-detection model on credit-card transactions takes a row of features about a transaction (amount, merchant, location, time, customer history) and emits a fraud score. It cannot tell you why a particular transaction looks suspicious in plain English; it cannot generalise to fraud in cryptocurrency wallets without retraining. AlphaFold 2 (DeepMind, 2020), which predicts the three-dimensional structure of a protein from its amino-acid sequence at near-experimental accuracy, transformed structural biology more or less overnight. AlphaFold cannot write a sonnet, play Go, or schedule your week. The system is highly competent on one input-output mapping and silent on every other.
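To make "one input-output mapping" concrete, the sketch below shows the whole interface such a fraud-scoring system exposes. Everything in it is invented for illustration: the feature names, the thresholds, and the hand-written rules, which stand in for a mapping a deployed system would learn from data. The shape is the point: structured transaction features in, a single score out, and nothing else the system can be asked.

```python
# Illustrative sketch only: features, thresholds, and rules are invented,
# standing in for a mapping a real system would learn from labelled data.
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float               # purchase amount
    merchant_category: str      # e.g. "electronics", "groceries"
    country: str                # where the card was used
    hour_of_day: int            # 0-23, local time
    customer_mean_spend: float  # rolling average spend for this customer


def fraud_score(tx: Transaction) -> float:
    """Return a score in [0, 1]; higher means more suspicious."""
    score = 0.0
    if tx.amount > 10 * max(tx.customer_mean_spend, 1.0):
        score += 0.5    # spend far above this customer's norm
    if tx.hour_of_day < 5:
        score += 0.2    # unusual late-night activity
    if tx.merchant_category == "electronics" and tx.amount > 2000:
        score += 0.3    # high-value, easily resold goods
    return min(score, 1.0)


print(fraud_score(Transaction(4999.0, "electronics", "US", 3, 80.0)))  # 1.0
```

Nothing in this object can summarise a paragraph, read an X-ray, or explain its verdict in plain English; those tasks are simply not in its interface, which is what the strict reading of "narrow" means.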

These examples make narrow AI sound clean and easy to recognise: a single, well-defined task with a single, well-defined output. Most AI in actual deployment in 2026 fits this pattern: the recommender that picks your next video, the speech-to-text on your phone, the spam filter, the autopilot in a modern car, the routing algorithm at a logistics company. It is, by a large margin, the dominant kind of AI in commercial use, and it is the kind that creates most of the visible economic value of the field today.

It is sometimes claimed, as a sort of slogan, that "all current AI is narrow." This claim depends entirely on how strictly you read "narrow", and once you press on it, it begins to wobble. Consider a modern large language model: ChatGPT, Claude, Gemini, or the open-weight DeepSeek and Llama variants. From one angle the system was trained on a single objective (predict the next token in a stream of text) and is, in that strict technical sense, a single-task system. From another angle the resulting capability surface is enormous: the same underlying weights can be prompted to write Python, summarise a research paper, translate Mandarin, draft a legal contract, walk through a calculus problem, role-play a sceptical reviewer, plan a research agenda, and recall a great deal of factual material. Whether to call this configuration "narrow", or to introduce a new category such as "broad-but-specific", "broad domain", or simply "frontier", is largely a terminological choice rather than a deep claim about the system's nature.

The cleanest way to use the term is to ask: when this system fails on a task it was not specifically trained for, does it fail gracefully, retain partial competence, and improve quickly with a few examples? If yes, the system is exhibiting more generality than the strict narrow label suggests. If it fails in catastrophic, surprising, brittle ways the moment you step outside its training distribution, you are looking at a fundamentally narrow system regardless of how broad its training set was. By that test, current frontier models sit awkwardly between the two boxes, better described as broad-but-jagged than as either narrow or general.
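The test in the previous paragraph can be written down as a rough evaluation recipe. The sketch below is a hedged illustration, not an established benchmark protocol: query_model is a hypothetical stand-in for however you call the system under test, and the 0.10 improvement margin is an arbitrary choice.

```python
# Hedged sketch of the graceful-failure / fast-improvement probe described above.
# query_model is a hypothetical stand-in; the 0.10 margin is an arbitrary choice.
from typing import Callable, Sequence

Example = tuple[str, str]  # (question, reference answer)


def generality_probe(
    query_model: Callable[[str, Sequence[Example]], str],
    held_out_items: Sequence[Example],   # tasks the system was not trained for
    demonstrations: Sequence[Example],   # a handful of worked examples
    chance_level: float,                 # accuracy of random guessing on this task
) -> dict:
    def accuracy(examples: Sequence[Example]) -> float:
        hits = sum(query_model(q, examples).strip() == a for q, a in held_out_items)
        return hits / len(held_out_items)

    zero_shot = accuracy([])
    few_shot = accuracy(demonstrations)
    return {
        "zero_shot": zero_shot,
        "few_shot": few_shot,
        "fails_gracefully": zero_shot > chance_level,     # retains partial competence
        "improves_quickly": few_shot - zero_shot > 0.10,  # benefits from a few examples
    }
```

A broadly capable system returns something above chance in the zero-shot condition and a large jump in the few-shot one; a strictly narrow system sits at chance, or breaks outright, in both.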

Artificial general intelligence (AGI)

AGI traditionally denotes a system that matches or exceeds human performance across the full range of cognitive tasks a healthy adult can undertake. The phrase rolls off the tongue. Each of its key terms turns out to be contested.

Begin with performance. Match human performance at what level? At the level of the median adult on a typical task, who can mostly hold down a conversation, plan a journey, and read a newspaper? At the level of a typical professional in their own field, a competent solicitor doing solicitor work, a competent engineer doing engineering work? At the level of the world's best in each field, Magnus Carlsen at chess, Lee Sedol at Go, Terence Tao at mathematics? Or at the level of an autonomous research scientist who can identify their own problems, design experiments, and produce novel results? Each of these definitions points to a system whose existence would mean dramatically different things for the economy, for science, and for safety.

Now consider generality. Must AGI match the full breadth of human cognitive performance, every kind of task, including embodied skills, social intuition, taste in art, moral reasoning, and the sort of common sense a five-year-old uses to navigate a kitchen? Or is "AGI on most economically valuable tasks" sufficient? The first reading is strictly stronger; the second is closer to how the term is used in practice in industry.

Then robustness. Consider a system that scores brilliantly on a set of test problems written in standard form but collapses when those problems are rephrased, when distractors are added, or when the inputs drift slightly outside the training distribution: is it AGI? Most working researchers would say no. AGI, on any serious reading, requires the kind of stability under perturbation that human cognition exhibits, not just peak performance on clean inputs.

Finally autonomy. Must AGI act on its own initiative, choosing what to do, recovering from setbacks, and pursuing long-running goals? Or is it enough that, when asked, the system can do what an expert human would do? A capable but reactive system that needs constant prompting is qualitatively different from a system that decides for itself what is worth doing.

There is no consensus across these axes, and there is no single agreed definition of AGI in the field. Several major positions are worth knowing.

OpenAI's charter (2018) defines AGI as "highly autonomous systems that outperform humans at most economically valuable work." This definition is explicitly performance-and-economy-flavoured: it anchors the concept to labour-market substitutability rather than to philosophical breadth.

Google DeepMind (2023) proposed a more nuanced five-level taxonomy, ranging from "emerging" through "competent", "expert", and "virtuoso" to "superhuman", indexed by two axes: how general the system is (a single narrow task vs. a wide range of tasks) and how strong its performance is (the percentile of skilled adults it outperforms). Under this scheme one can speak of "competent AGI" or "expert AGI" without claiming the system is universally superhuman; the word AGI becomes a calibrated continuum rather than a binary threshold.

Anthropic's position has been more cautious about the term AGI itself, preferring to speak of "transformative AI" or "powerful AI", defined operationally by the ability to substantially accelerate scientific research, including AI research, and by the social and economic consequences that would follow. This framing brackets the metaphysical question of whether the system is "really" general and focuses instead on what it can change in the world.

Yann LeCun, who founded FAIR at Meta and left in late 2025 to launch a new lab focused on world models, has argued repeatedly and publicly that current large language models are not on a path to AGI. His position is that text-trained autoregressive models lack a world model rich enough for grounded reasoning and long-horizon planning in physical environments, and that a fundamentally different architecture, featuring learned predictive models of the environment, hierarchical planning, and richer non-language inputs, will be required.

Demis Hassabis, the chief executive of Google DeepMind, has by contrast suggested in interviews from 2024 onward that AGI is plausibly within roughly five to ten years of the early-2025 frontier, on the assumption that scaling, better training methods, and the integration of reasoning, memory, and agency continue at recent rates.

The LeCun-Hassabis disagreement is a useful one to keep in mind when reading any claim about AGI timelines. Two senior, technically distinguished researchers, with deep knowledge of the field and serious access to the frontier, hold sharply different views about whether the current paradigm leads anywhere near AGI on a timescale of a decade. The disagreements at this level are partly empirical (about scaling, capability emergence, and the trajectory of evaluations), partly definitional (what counts as AGI), and partly strategic (timelines have implications for funding, regulation, and corporate positioning).

What we can confidently say in early 2026

The empirical state of the frontier has been moving quickly, but several observations are robust.

Frontier systems pass professional-level examinations in law, medicine, accountancy, and engineering at percentile ranks that would have astonished almost any AI researcher in 2020. They score at or above the median human test-taker on the United States Medical Licensing Examination, on the bar exam, on chartered accountancy qualifying tests, and on numerous engineering certifications. They do this without being explicitly trained on those examinations as a primary task; the capability emerges from broad pretraining and is sharpened by reinforcement learning on reasoning traces.

They achieve medal-level performance on International Mathematical Olympiad problems: DeepMind's AlphaGeometry and AlphaProof produced solutions that human IMO graders scored at the silver-medal standard in 2024, and general-purpose reasoning models reached the gold-medal standard the following year. They solve real-world software-engineering benchmarks at rates above 80 per cent on SWE-Bench Verified (Claude Opus 4.7 87.6 per cent, GPT-5.5 ~83 per cent), a corpus of real GitHub issues drawn from real Python projects, on which the strongest systems of early 2024 resolved only a small minority of issues.

At the same time, they fail in characteristic ways. They struggle with multi-step coherent planning over horizons of hours: an agent that performs well on a single sub-task often loses the thread when the task spans many such sub-tasks linked by dependencies, with state that needs to persist and be revised. They are weak on robust visual-spatial reasoning in novel environments: a robotics task in a kitchen the system has not seen before, requiring physical common sense and adaptation to unexpected obstacles, remains very hard. They confabulate when asked for factual recall on rare entities: if a person, place, or paper is sufficiently obscure, the model will often invent a confident-sounding but false answer rather than admitting ignorance.

The shape of all this is what researchers have begun to call a jagged frontier. The system is superhuman on some axes, genuinely useful on many others, and surprisingly poor on still others, with the boundaries between these regions shifting from month to month as new training runs and new scaffolding techniques land. This is not the smooth approach to a single human-level threshold that the older AGI framing suggested. Whether the jagged frontier eventually flattens into general competence by, say, 2030, or stalls indefinitely on the harder tasks, is a central empirical question of the next several years. The leaders of the major labs do not agree on the answer.

Artificial superintelligence (ASI)

ASI is the speculative case of an intelligence substantially exceeding the best human in every domain, including the domain of AI research itself. The case for taking the possibility seriously rests on two arguments.

The first argument is that intelligence is a property of physical systems, and there is no obvious a priori reason why human-level performance should be a maximum. Human brains were shaped by evolution under tight constraints (energy budgets, birth-canal width, available raw materials) that an engineered system need not respect. If there is no in-principle ceiling, then a system substantially above human level is a coherent possibility and the question is empirical: how hard is it to build, and along what path?

The second argument is the intelligence explosion scenario, articulated by I. J. Good in 1965. If you build an AI system competent enough to do AI research at human level, and if that system can be applied to designing or training its successor, then you may have a recursive feedback loop: each generation produces a slightly more capable next generation, faster than the previous step. Many years of research progress could be compressed into a much shorter wall-clock time. The endpoint, if there is one, would be a system far above human capability across the board.
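A toy calculation makes the wall-clock compression concrete. Assume, purely for illustration, that each generation is 20 per cent more capable than the last and that the time needed to build the next generation shrinks in proportion to the builder's capability; neither number is an estimate from the literature.

```python
# Toy illustration of Good's feedback loop; both parameters are invented assumptions.
capability = 1.0             # 1.0 = "does AI research at human level"
elapsed_years = 0.0
gain_per_generation = 0.20   # assumed relative improvement per generation
base_generation_time = 2.0   # assumed years for a human-level team to build the next system

for generation in range(1, 21):
    elapsed_years += base_generation_time / capability   # more capable -> faster next step
    capability *= 1 + gain_per_generation
    if generation in (1, 5, 10, 20):
        print(f"generation {generation:2d}: {capability:6.2f}x human, "
              f"{elapsed_years:5.1f} years elapsed")
```

Under these assumptions the elapsed time converges towards twelve years (a geometric series) even as capability grows without bound, which is the precise sense in which many years of progress can be compressed into a short wall-clock interval. Change the assumptions, for example by making each generation harder to improve on than the last, and the loop fizzles rather than explodes; that is exactly where the empirical disagreement lies.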

Two books have shaped the modern conversation on these scenarios.

Nick Bostrom's Superintelligence (2014) systematically developed the implications of a recursively self-improving system: how a superintelligent agent might form instrumental goals (resource acquisition, self-preservation, goal preservation), how its values would relate to ours, how it might or might not be controlled, and what kinds of preparatory work might reduce existential risk. The book put the topic on the agenda of governments, foundations, and laboratories, for better or worse.

Stuart Russell's Human Compatible (2019) reframed the central issue. Rather than asking "how do we make a superintelligent system safe?", Russell asked: how do we specify, in advance, what we want an autonomous optimiser to optimise? His answer leans on uncertainty about human preferences as a structural feature of the AI's objective, rather than on bolt-on safety constraints applied after the fact.

Whether and when ASI is a serious near-term prospect is contested. The contest divides roughly along three lines.

The path runs through scaling current methods: a position associated, in different forms, with Ilya Sutskever, Dario Amodei, and Demis Hassabis. The argument is that the present recipe (large transformer-based models, vast pre-training, reinforcement learning on reasoning traces, multimodal grounding, agent scaffolding) has produced a smooth trajectory of capability improvement that shows no clear ceiling, and that it is reasonable to expect it to continue beyond human level into the superhuman regime in some range of domains within a decade or so.

The path requires substantial methodological innovations not yet identified: a position associated, in different ways, with Yann LeCun, Gary Marcus, and Melanie Mitchell. The argument is that current systems are missing crucial ingredients (robust world models, causal reasoning, grounded perception, persistent memory, genuine planning), and that adding them is not a matter of more compute and data but of new architectures we have not yet invented.

The framing is misconceived: a position associated, again in different ways, with Emily Bender, Kate Crawford, and Melanie Mitchell. The argument here is that "intelligence" is not a single ladder up which systems can climb, that the unidimensional ASI framing imports assumptions from psychometrics and science fiction that do not survive scrutiny, and that focusing on hypothetical superintelligence diverts attention from concrete present-day harms and from the social context in which any deployed AI system operates.

The student new to the field is best advised to take all three positions seriously. None of them is silly. Each has serious empirical, philosophical, and political content. A reader who can articulate all three from the inside is much better placed to evaluate concrete claims than one who has signed up to a single school.

Why the labels mislead

The narrow-general-super spectrum is a useful first vocabulary, but it has four known failure modes.

(1) The labels treat capability as one-dimensional. They imply that there is a single quantity, "intelligence", along which a system has a single position. Real systems have jagged capability profiles: superhuman at chess but baby-level at shoe-tying, superhuman at protein-folding but mute on poetry, brilliant at code generation but unreliable at pinpointing where a bug is in a long stack trace. Calling a system "AGI" implies that a single threshold has been crossed, when reality is messier and more interesting: capability profiles look more like spiky three-dimensional surfaces than like a number on a ruler.

(2) The labels imply discrete transitions. They suggest that at some moment a system that was ANI yesterday becomes AGI today. In practice, capabilities scale continuously with compute, data, and architecture choices. Each new model is somewhat better at some things, similar at others, sometimes worse on a few. There is no sharp moment when the system "becomes" AGI in the way that water sharply boils. Talk of an AGI threshold is more like talk of when a child becomes "a reader": the line is real but smudged.

(3) The labels conflate generality with autonomy. A system that can do many tasks when prompted by a human is different in kind from a system that autonomously decides what tasks to pursue, allocates its own time, manages its own resources, and answers to no immediate human supervisor. The latter raises a different and harder set of safety questions, and it is the autonomy that drives most of the worry in the safety literature, not the generality. A maximally capable but strictly non-agentic system is a tool; a moderately capable but autonomous one is an agent. Chapter 16 returns to the distinction in detail.

(4) The labels have policy and commercial consequences. Calling a system "AGI" or claiming "AGI within five years" affects regulation, public investment, military planning, share prices, and the personal reputations of the speakers. Vendors have commercial incentives to claim AGI is near (it raises capital and recruits talent). Critics have incentives to deny it is near (it preserves their reputation as careful sceptics). Governments have incentives to overstate or understate the prospect according to the policy they wish to advance. None of this means the underlying claims are wrong, only that you should read them with awareness of the speakers' incentives, just as you would read claims about any other contested technology.

A reasonable reading practice, accordingly, is to translate every public claim about AGI into more precise vocabulary. When you read "this system is AGI", ask: AGI in whose definition? At what level of performance, on what range of tasks, under what robustness conditions, with what degree of autonomy? When you read "AGI by 2027", ask: based on what assumed scaling trajectory, what definition of the threshold, and what unfalsifiable retreat positions? When you read "AGI is impossible with current methods", ask: which methods, which definitions, and what would change the speaker's mind? The labels are starting points for these more precise questions, not substitutes for them.

What you should take away

  1. ANI denotes a system competent at one task, or a narrow cluster of tasks, and silent or unreliable on everything else; most deployed AI today is ANI in this strict sense, though modern frontier models complicate the boundary.
  2. AGI denotes a system matching or exceeding human cognitive performance across the full range, but every term in that definition is contested, with major positions ranging from OpenAI's "most economically valuable work" through DeepMind's five-level taxonomy to Anthropic's preferred "transformative AI" framing, and with senior researchers including Yann LeCun and Demis Hassabis disagreeing publicly about whether current methods point that way.
  3. ASI denotes the speculative case of substantially superhuman performance in every domain, including AI research itself, with the possibility resting on the intelligence-explosion argument articulated by I. J. Good and developed by Bostrom and Russell.
  4. Capability profiles are jagged, not unidimensional: real systems are superhuman on some tasks, useful on many, surprisingly poor on others, with the boundaries shifting fast, and so the very idea of a single AGI threshold is misleading.
  5. Read AGI claims with awareness of the speakers' incentives: vendors, critics, and governments all have reasons to overstate or understate the prospect, and translating every claim into precise sub-questions about performance, generality, robustness, and autonomy is the cure for being led by labels.
