1.6 The structure of AI as a field
The AI field has changed shape several times in the past decade. The organisations that mattered in 2014 (Google Brain, the early DeepMind, a clutch of vision start-ups in San Francisco) are not quite the ones that matter in 2026. The kinds of work that count as a research contribution have shifted, the bar for entering frontier research has risen by orders of magnitude, and a parallel ecosystem of open-weight models has emerged where none existed five years ago. Knowing who does what, and where the boundaries are, is essential for reading any technical paper, news article, or job listing in AI today. Without a map, the names blur together: a press release from OpenAI sounds like one from Anthropic, a Mistral model sounds like a DeepSeek model, and the difference between a frontier lab and a research group at a good university is invisible.
This section takes a snapshot of the field as it stands in early 2026: the institutions, the splits, and the levers of progress.
Research, engineering, and the frontier
The shape of the field is best understood through two splits. They cut across each other, and a given researcher or engineer may sit on either side of either split at different points in their career.
The first split is between frontier work and long-tail work.
Frontier work is the work that produces the next generation of the largest, most capable models. It includes model scaling (training models with more parameters, more data, and more compute than the previous best); alignment (making the resulting models behave in ways that are honest, helpful, and harmless); agent design (giving models tools, memory, and the ability to take actions in the world); and capability evaluation (measuring what the new models can and cannot do). Frontier work happens disproportionately at well-capitalised industrial laboratories, for a simple reason: the cost of training a frontier model is now in the hundreds of millions of dollars, and the cost of operating one at scale is comparable. No university can train a GPT-4-class model. A single training run on a frontier model may consume more electricity than a small town does in a month, and the cluster of accelerators required is itself a multi-billion-dollar capital asset.
A few well-funded national efforts can match the smaller frontier labs: Saudi Arabia's HUMAIN, the UAE's Falcon programme, China's various national champions, and, in Europe, France's Mistral (a private company that functions as a de facto national champion). But the capital intensity is real, and only a handful of organisations worldwide can credibly attempt a frontier training run.
Long-tail work, by contrast, encompasses everything else: algorithmic foundations, theoretical analysis of why certain methods work, applications to specific scientific or industrial domains, robustness studies, pedagogy, the social-science questions about AI deployment, the engineering of efficient inference, the design of evaluation benchmarks, and the slow accumulation of empirical knowledge about how the systems behave. Long-tail work happens across hundreds of academic groups worldwide, in consulting firms, in hospital research units, in government laboratories, in independent research collectives. It does not produce headline-grabbing capability jumps, but it is where most of the field's accumulated knowledge actually lives, and it is what most students of AI will end up doing.
The principal frontier labs in early 2026 are:
- United States: OpenAI (the GPT family, through GPT-5.5 in 2026, plus the o-series reasoning models), Anthropic (the Claude family, through Opus 4.7 in April 2026, plus Constitutional AI research), Google DeepMind (the Gemini family, through 3.1 Pro in February 2026, plus AlphaFold, AlphaProof, AlphaGeometry), Meta (the open-weight Llama family, plus FAIR research), xAI (founded by Elon Musk in 2023; the Grok family).
- China: DeepSeek (DeepSeek-V3 and R1 disrupted the closed-frontier price floor in late 2024 and early 2025), Alibaba (the Qwen family of open-weight models), Baidu (Ernie), Zhipu (the GLM family), Moonshot (Kimi), Tencent (Hunyuan).
- Europe: Mistral (open-weight and API hybrid, France), Aleph Alpha (Germany).
Both frontier and long-tail work are essential. Both contribute to the textbook you are reading. The frontier labs have a near-monopoly on the largest models, but the long tail produces the ideas, the people, and the conceptual tools that the frontier labs depend on. Almost every senior researcher at a frontier lab today was trained in an academic group, and almost every algorithmic primitive used at the frontier (attention, residual connections, layer normalisation, mixture-of-experts routing, RLHF) was first published as an academic paper. The separation is nonetheless a fact of the field's current state, and worth being aware of.
The second split is between research and engineering. Research means producing new methods or new understanding. Engineering means deploying methods reliably in production. The boundary is fuzzy. The people who train frontier models are doing both at once; a paper from OpenAI on a new training technique is simultaneously a research contribution and a piece of engineering documentation. But for most students of AI, especially those with industrial careers in view, the engineering side is where most jobs live: building data pipelines, designing prompts, deploying models to production, monitoring drift, integrating AI components into larger systems, and managing the operational burden of running services that depend on probabilistic outputs from imperfect models.
The textbook treats the engineering side from Chapter 16 onwards, and you should not interpret the early-chapter focus on theory as a signal that the theory is more important than the engineering. The opposite is closer to the truth: most economic value created by AI in the next decade will come from engineering, not from new research, and the engineers who understand the underlying methods will produce more reliable systems than those who treat the models as black boxes.
The compute-data-algorithms triangle
A useful first-order model of the field's progress is the compute-data-algorithms triangle. Capability advances come from improvements in all three corners, in proportions that have shifted over time. Each corner has its own dynamics, its own bottlenecks, and its own community of specialists.
Compute has grown enormously. Sevilla et al. (2022) estimate that training compute for the largest models doubled every six months between 2010 and 2022, several times faster than Moore's Law alone would predict. The doubling came partly from larger and faster chips (the move from CPU to GPU to TPU, and from one GPU generation to the next) and partly from larger clusters, growing from single GPUs in 2010 to clusters of $10^4$ to $10^5$ accelerators by 2024, and reaching $10^5$ to $10^6$ at the very largest scales by 2026. Much of the visible progress between AlexNet and GPT-4 reduces, on close reading, to applying more compute to existing or modestly extended algorithms. The Transformer architecture itself, which dominates modern AI, was published in 2017 and has not been fundamentally replaced; everything since is, in essence, the same architecture trained at larger and larger scale.
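As a back-of-envelope check on what that rate compounds to (the six-month doubling time is Sevilla et al.'s estimate; the rest is arithmetic):

```python
# Back-of-envelope: compound growth implied by a six-month doubling time.
# The doubling time is Sevilla et al.'s (2022) estimate; the rest is
# arithmetic, not a measurement.
years = 2022 - 2010
doublings = years * 12 / 6          # one doubling every six months
growth = 2 ** doublings
print(f"{doublings:.0f} doublings -> ~{growth:.1e}x more training compute")
# 24 doublings -> ~1.7e+07x more training compute
```

A factor of roughly ten million in twelve years is the quantitative backdrop against which every other trend in this section should be read.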
Data has also grown, but with limits. The CommonCrawl-based pre-training corpora used to train modern language models are estimated at about $10^{13}$ tokens of high-quality web text. Give or take an order of magnitude, this is the size of the high-quality publicly available text on the internet. There is no second internet to crawl, and the quality of a token from a discussion forum is not the same as the quality of a token from a peer-reviewed paper. The Chinchilla scaling laws (Hoffmann et al., 2022) showed that data, not parameters, was the binding constraint for many post-2020 models. It had been widely assumed that larger models simply needed more parameters, but Chinchilla demonstrated that the existing models were under-trained on the data they had. This triggered a great rush to filter, deduplicate, and synthesise more training data, and drove the rise of synthetic data generated by the models themselves under verification. Whether truly novel data will be the ultimate constraint, and whether synthetic data can substitute for it in the long run, is one of the live research questions.
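To make the Chinchilla result concrete, here is a minimal sketch of the compute-optimal allocation it implies. The 20-tokens-per-parameter ratio and the $C \approx 6ND$ approximation for training FLOPs are the commonly quoted rules of thumb derived from Hoffmann et al. (2022); real allocations vary.

```python
import math

def chinchilla_optimal(flops: float, tokens_per_param: float = 20.0):
    """Split a training-compute budget between parameters and tokens.

    Uses the rule-of-thumb approximations C ~= 6*N*D (training FLOPs for
    N parameters over D tokens) and D ~= 20*N (Hoffmann et al., 2022).
    """
    n_params = math.sqrt(flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly the Chinchilla training budget itself: ~5.8e23 FLOPs.
n, d = chinchilla_optimal(5.8e23)
print(f"~{n:.1e} parameters trained on ~{d:.1e} tokens")
# ~7.0e+10 parameters trained on ~1.4e+12 tokens  (i.e. ~70B / ~1.4T)
```

Recovering Chinchilla's actual configuration (70B parameters, 1.4T tokens) from its compute budget is a useful sanity check on the rule of thumb, and the same arithmetic shows why a $10^{13}$-token ceiling starts to bind at larger budgets.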
Algorithms have improved more steadily but less dramatically. The Transformer architecture, the recipe of pre-training plus fine-tuning, the attention mechanism, RLHF, mixture-of-experts (MoE) routing, and recent reasoning-RL methods are all genuine algorithmic contributions. Each of them required years of work and produced clear gains over the previous best. But the marginal effect of any single algorithmic change is, on the largest models, typically in the 5-30% range, whereas the marginal effect of doubling compute is reliably 10-20%. Compute and data are the dominant levers; algorithms are the multiplier. A clever algorithmic idea that costs nothing in compute will give a few percent of capability; doubling the cluster size will give substantially more. We shall come back to this asymmetry in Chapter 15.
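One way to formalise the compute lever is the empirical power law reported by Kaplan et al. (2020), under which pre-training loss falls smoothly as a power of training compute $C$:

$$L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.05,$$

so that each doubling of compute multiplies the loss by roughly $2^{-0.05} \approx 0.97$. The exponent is Kaplan et al.'s fitted value, not a measurement of today's frontier models, and loss reductions do not translate one-to-one into the benchmark percentages quoted above; the point is only that the compute lever is smooth, predictable, and always available, where the algorithmic lever is lumpy and unpredictable.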
The triangle has a second-order property worth flagging: each corner has its own community. Hardware engineers and chip designers work on compute. Data engineers, librarians, and, increasingly, model-driven data-filtering pipelines work on data. Algorithms researchers work on algorithms. The three communities meet at the frontier labs, where a single training run requires all three to be aligned, but in the long tail they are largely separate. A paper at NeurIPS on a new attention variant is unlikely to discuss the cluster topology required to train the model that uses it, and a paper at MLSys on cluster topology is unlikely to discuss the loss function. This is a structural feature of the field, and it explains why progress sometimes feels unevenly distributed: improvements in one corner can sit unused for years until the other two catch up.
Open vs closed: the openness axis
A new structural axis in the field is the openness of model weights. The dominant frontier labs in the United States (OpenAI, Anthropic, Google DeepMind) release their strongest models only as APIs, citing safety, commercial, and competitive considerations. You can use GPT-5 or Claude or Gemini, but you cannot download them; you call them through a network endpoint, pay per token, and accept that your inputs and outputs pass through the lab's servers.
Meta, by contrast, has released its Llama family of models (Llama 1, Llama 2, Llama 3, Llama 3.1) under licences permissive enough for most research and commercial use. The weights are downloadable; with sufficient hardware you can run them on your own machine, fine-tune them on your own data, and deploy them without any external service. Mistral has done likewise for its smaller models, and the Chinese labs (DeepSeek, Alibaba's Qwen, Zhipu) released open-weight models that, by mid-2025, were within a small margin of the closed frontier on many tasks. DeepSeek-V3 (December 2024) and DeepSeek-R1 (January 2025) in particular pushed the closed labs to compete on price and inference cost; for the first time, an open-weight model from a relatively small Chinese laboratory was competitive with the largest US closed models, at a small fraction of the inference price.
Why open weights matter:
- Academic research that would otherwise be impossible: mechanistic interpretability work, robustness studies, systematic ablations, and any experiment that requires modifying the model rather than calling it. You cannot probe the internal activations of a model you can only access through an API (see the sketch after this list).
- On-premise deployment in clinical, military, and classified settings, where API calls to remote services are unacceptable for regulatory or operational reasons. A hospital typically cannot send patient data to a US-based commercial API; an open-weight model running on a local server is a workable alternative.
- A long tail of fine-tuned variants for specific languages, domains, and tasks. The Llama family has spawned thousands of community fine-tunes for everything from minority languages to legal-document analysis to specific scientific subfields.
- Reproducibility: open weights mean third parties can verify performance claims, run the same evaluations, and identify training-data contamination. The credibility of the field depends on this kind of independent checking.
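To make the first point concrete, the following is a minimal sketch of activation probing with the Hugging Face transformers library. The checkpoint name is illustrative (any open-weight model you have access to will do); the point is simply that nothing like this is possible over a closed API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute any open-weight model you can access.
name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One tensor per layer (plus the embedding output), each of shape
# (batch, sequence_length, hidden_size) -- the raw material of
# interpretability research, unavailable through an API.
print(len(out.hidden_states), out.hidden_states[-1].shape)
```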
Open weights also raise legitimate dual-use questions: once weights are released, capabilities cannot be revoked. A model that was acceptable in 2024 may, after a community fine-tune, become a tool for bio-weapons assistance, mass surveillance, or coordinated manipulation campaigns. The closed labs cite these risks to justify API-only release; their critics argue that closed weights primarily protect commercial moats while the marginal safety benefit is modest. The argument is not settled.
The student of AI in 2026 should expect both ecosystems to coexist. Much of the work in this textbook can be reproduced on open-weight models running on a single workstation; some of it (training a frontier model from scratch) cannot. A working researcher in the long tail will spend most of their time with open weights, occasionally calling closed APIs for benchmarking or for tasks that the open models cannot yet handle. A working engineer at a frontier lab will spend most of their time with closed weights (their employer's own), and use open weights mainly as a comparison point.
Compute hardware: who makes what
NVIDIA dominates AI accelerators. The H100 (released 2023) was the workhorse for frontier training during the GPT-4 era, and the B200 / Blackwell generation (2024-2025) is the next step. Google designs its own TPUs, used internally for Gemini training and offered through Google Cloud. AWS designs Trainium and Inferentia for training and inference respectively. AMD has the MI300 series, gaining traction as a second source. A cluster of specialist start-ups offers architectures targeted at specific workloads: Cerebras (wafer-scale chips), Graphcore (UK; IPU), SambaNova, and Groq (very-low-latency inference). China's Huawei Ascend is the dominant domestic option after US export controls cut off Chinese access to the highest-end NVIDIA parts.
Network fabric matters as much as the chips. Training a model on tens of thousands of accelerators requires moving gradients and activations between them at very high bandwidth and very low latency. NVLink and NVSwitch (NVIDIA proprietary) connect GPUs within a node; InfiniBand (originally Mellanox, now part of NVIDIA) connects nodes within a cluster; emerging Ethernet-based fabrics aim to do the same at lower cost. The cluster as a whole behaves as a single coherent training machine only if the fabric does its job; a slow link anywhere in the topology becomes the bottleneck for the whole run.
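A back-of-envelope calculation shows the scale of the problem. In a standard ring all-reduce, synchronising gradients requires each accelerator to send and receive roughly $2(n-1)/n$ times the gradient payload per step; every concrete number below is an illustrative assumption, not a measurement of any particular cluster.

```python
# Back-of-envelope: per-step gradient traffic for data-parallel training.
# A ring all-reduce moves ~2*(n-1)/n times the payload per device per step.
# All concrete numbers below are illustrative assumptions.
n_params = 70e9         # model size: 70B parameters (assumed)
bytes_per_grad = 2      # bf16 gradients (assumed)
n_devices = 1024        # data-parallel group size (assumed)
link_gb_per_s = 50      # ~400 Gb/s per device, i.e. 50 GB/s (assumed)

payload_gb = n_params * bytes_per_grad / 1e9
traffic_gb = 2 * (n_devices - 1) / n_devices * payload_gb
seconds = traffic_gb / link_gb_per_s
print(f"~{traffic_gb:.0f} GB per device per step -> ~{seconds:.1f} s "
      "if not overlapped with compute")
# ~280 GB per device per step -> ~5.6 s if not overlapped with compute
```

Numbers like these are why training frameworks overlap communication with backward-pass compute, and why they shard gradients and optimiser state across devices rather than replicating them.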
On the software side, CUDA is the dominant accelerator API; almost everything written for NVIDIA GPUs goes through it. PyTorch is the dominant framework, with the largest user base and the most third-party libraries. JAX (Google) is competitive in research, particularly within Google itself and at academic groups working on theoretical or scaling questions. TensorFlow is in maintenance mode; new projects rarely choose it. The choice of stack affects portability and performance: a model trained in PyTorch on NVIDIA hardware will, with some effort, run on AMD hardware via ROCm; a model that depends on hand-written CUDA kernels may not.
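The portability point can be made concrete in a few lines. A minimal sketch: PyTorch's ROCm builds expose AMD GPUs through the same `cuda` device string, so framework-level code like the following runs on either vendor's hardware, while custom CUDA kernels would need porting.

```python
import torch

# Device-agnostic PyTorch: on NVIDIA builds this selects a CUDA GPU, on
# ROCm builds the same "cuda" string selects an AMD GPU, and otherwise
# it falls back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)   # dispatched to cuBLAS on NVIDIA, rocBLAS on AMD
print(y.shape, "on", device)
```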
The role of universities, governments, NGOs
Universities still produce the bulk of the people who staff frontier labs. A typical senior researcher at OpenAI, Anthropic, or Google DeepMind has a PhD from a strong department (Stanford, MIT, Carnegie Mellon, Berkeley, Cambridge, ETH Zurich, Toronto, Tsinghua) and spent several years on academic research before being recruited. The labs have largely outsourced their training pipeline to the universities. Universities also produce the foundational theoretical and methodological work that the labs build on; the original Transformer paper had Google authors, but the academic community supplied the lineage of ideas (attention, sequence-to-sequence, neural machine translation) on which it depended.
Government bodies are increasingly active. The UK AI Security Institute (founded as the AI Safety Institute in 2023, renamed in February 2025) and its US counterpart, the Center for AI Standards and Innovation (CAISI, founded as the US AI Safety Institute and renamed in June 2025), evaluate frontier models for dangerous capabilities, and the EU AI Office is responsible for implementing the EU AI Act, the first major piece of cross-jurisdictional AI regulation. National research-funding agencies (the NSF in the US, UKRI in Britain, ANR in France, NSFC in China) fund a substantial fraction of academic AI research, and the proportion of funding flowing into safety, alignment, and evaluation has risen sharply since 2022.
NGOs and independent research organisations occupy a third niche: METR, Apollo Research, and Redwood Research focus on safety evaluation and dangerous-capability testing; MIRI has worked on theoretical alignment for longer than any of them. Foundation-funded efforts (Open Philanthropy, the Macroscopic programme, Conjecture) support alignment research outside the frontier labs, on the view that the labs themselves cannot be the sole evaluators of their own models.
The picture: AI in 2026 is no longer purely a computer-science discipline. It is also a regulatory question (what laws should govern model deployment?), a national-security question (which countries' labs should have access to which capabilities?), and an economic-policy question (how should the gains from AI be distributed?). A complete education in AI in 2026 includes some exposure to all three, and the engineer who understands only the technical layer will be poorly placed to navigate the field over the coming decade.
What you should take away
- The AI field in 2026 is split into a small frontier (a dozen or so heavily capitalised labs across the US, China, and Europe: OpenAI, Anthropic, Google DeepMind, Meta, xAI, DeepSeek, Mistral, and a handful of others) and a large long tail (hundreds of academic groups, engineering teams, and independent researchers); both are essential, but they do different kinds of work.
- Capability progress comes from compute, data, and algorithms, in roughly that order of importance. Compute scaling has been the dominant driver since 2018; algorithmic improvements are the multiplier, not the engine.
- The openness axis (closed-weight, API-only models versus open-weight, downloadable models) is a structural feature of the field, not a temporary state. Both ecosystems will coexist; you should expect to work with both.
- NVIDIA hardware and CUDA-plus-PyTorch software dominate, but the ecosystem is broader than any one stack, and the choice of stack has real consequences for portability and cost.
- AI in 2026 is a regulated, geopolitical, and economically consequential field, not a purely technical one; understanding its structure means understanding more than the algorithms.