15.16 The frontier as of April 2026
A snapshot of the leading-edge systems and their distinguishing features. By the time you read this, many of them will have been superseded, but the design principles tend to be sticky. Section 15.15 examined how vision, audio and text have been folded into a single token stream; this section steps back and surveys the laboratories actually shipping those models. Section 15.17 will then ask the harder question of what "open" means when weights, training data, and evaluations are all separable. Here we are concerned with who is building, what they have shipped, and where the gaps between them sit.
The texture of the frontier in early 2026 is unusual. A dozen organisations are credibly within a few months of one another on standard benchmarks; the gap between the very top and the strongest open-weight system is measured in weeks rather than years; price-per-token has fallen by roughly two orders of magnitude since GPT-4's launch in March 2023; and the geographic distribution of frontier work, which was once almost wholly American, now runs across at least three continents. The field feels crowded in a way it has not since the Cambrian moment of 2017–2018, but the frontier itself has narrowed: every leading system is a transformer with mixture-of-experts routing, extended-thinking traces, native multimodality, and a tool-use scaffold. Differentiation lives in the edges.
OpenAI
OpenAI remains the laboratory whose products defined what the public means by "AI". The lineage runs from GPT-3 (2020) through GPT-4 (March 2023), GPT-4o (May 2024), and the o-series of reasoning models (o1, September 2024; o3, December 2024; o4, mid-2025), which introduced thinking-token traces as a deployed product feature. GPT-5, released in 2025, fused the GPT and o lineages into a single adaptive model: it decides per query how long to think, drawing on an internal classifier of problem difficulty. GPT-5.1, GPT-5.2 (December 2025), and GPT-5.5 / GPT-5.5 Pro (early 2026) are the current OpenAI flagships and lead the FrontierMath, ARC-AGI, and SWE-Bench leaderboards.
Distinctive features. OpenAI shipped the first widely deployed reasoning model and continues to set the tempo on the o-series. ChatGPT, the consumer product, has well over half a billion weekly active users as of 2026 and remains the broadest distribution channel for any generative system. Sora, the video model, brought minute-long generation into the mainstream. Operator, the agentic browsing layer added in 2025, packaged tool use and computer use into a consumer-shaped product. The API ladder runs from cheap Haiku-class models through GPT-5 thinking-mode endpoints, and the developer ecosystem around the OpenAI SDK is the largest in the field.
Pricing in early 2026. ChatGPT Free retains rate-limited GPT-5 access; ChatGPT Plus is around twenty US dollars a month with priority access and longer thinking budgets; ChatGPT Pro is around two hundred dollars a month with extended Sora generation and unlimited reasoning. API prices span roughly one-hundred-fold from the smallest to the largest model on a per-token basis.
Reasoning is OpenAI's signature. The recipe (reinforcement learning with verifiable rewards on chains of thought, augmented by process-reward search at inference time) is the one we described in Section 15.7. OpenAI has not published the full procedure, but enough has been pieced together by external groups that the method is now an open-source default. Where they remain ahead is in the integration: thinking budgets, tool calls, retrieval and computer use sit inside a single model rather than in a stitched scaffold. Microsoft's Azure partnership remains the dominant cloud-distribution channel, and the fact that GPT-5 ships inside Microsoft 365 Copilot puts OpenAI's models in front of more knowledge workers each day than any other system.
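To make the recipe concrete, here is a minimal sketch of a verifiable reward of the kind Section 15.7 describes: a cheap, objective check on the final answer of a sampled chain of thought. The `ANSWER:` convention and the exact-match grading are illustrative assumptions, not OpenAI's published procedure.

```python
# Minimal sketch of a verifiable reward for chain-of-thought RL.
# The answer-tag convention and exact-match rule are illustrative.
import re

def extract_final_answer(trace: str) -> str | None:
    """Pull the final answer out of a thinking trace.

    Assumes the model ends with 'ANSWER: <value>'; real systems use
    structured output or a grader model instead.
    """
    match = re.search(r"ANSWER:\s*(.+)", trace)
    return match.group(1).strip() if match else None

def verifiable_reward(trace: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches, else 0.0.

    'Verifiable' means the check is cheap and objective, which makes
    it hard to reward-hack on style alone.
    """
    return 1.0 if extract_final_answer(trace) == ground_truth else 0.0

# An RL trainer samples several traces per problem, scores each, and
# reinforces the tokens of the high-reward traces.
traces = [
    "Let x = 3. Then 2x + 1 = 7. ANSWER: 7",
    "Guessing. ANSWER: 9",
]
print([verifiable_reward(t, "7") for t in traces])  # [1.0, 0.0]
```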
Anthropic
Anthropic was founded in 2021 by alumni of OpenAI's GPT-3 effort, and its Claude family, currently in its fourth generation, has steadily climbed from polite challenger to frequent benchmark leader. The Claude 4 family launched in May 2025 with three tiers: Sonnet (workhorse, 200K context, native tool use), Opus (frontier reasoning) and Haiku (small, fast, near-frontier on standard benchmarks). Claude Haiku 4.5 (October 2025) collapsed much of the distance between the small and the large tiers, putting genuine reasoning into a model cheap enough for high-throughput agentic workflows. Claude Sonnet 4.5 (September 2025), Opus 4.5 (November 2025), Opus 4.6, and Opus 4.7 (16 April 2026) round out the generation; the current flagships ship with a one-million-token context option at standard pricing, extended thinking by default, and full computer use.
Distinctive features. The signature architectural choice is extended thinking: the model decides per query how long to think and reports its trace transparently. Constitutional AI, described in Section 15.12, remains Anthropic's alignment recipe and is now augmented with deliberative alignment, a layer that has the model reason explicitly about whether a response complies with its policies before emitting it. Among practitioners, Claude is most often praised for code (usually at or near the top of the SWE-Bench leaderboard: on the April 2026 board Claude Opus 4.7 scores around 82 per cent on SWE-Bench Verified, against roughly 83 for GPT-5.5 and 79 for Gemini 3.1 Pro, with preview models reaching the high eighties to low nineties), long-form writing, and what users describe vaguely but consistently as "vibes", a steadiness of voice that survives long contexts.
Pricing in 2026 mirrors OpenAI's: Claude Pro around twenty US dollars a month, Claude Max several hundred, API per-token rates that vary roughly fifty-fold across the tier ladder. The Anthropic developer surface also exposes the longest-running formal alignment programme in industry, which matters for regulated buyers in healthcare, finance, and government.
Google DeepMind
Google DeepMind, formed when DeepMind merged with Google Brain in 2023, fields the Gemini family. Gemini 1.5 Pro (February 2024) was the first frontier model with a one-million-plus-token context. Gemini 2 (December 2024) and Gemini 2.5 (early 2025) tightened the multimodal recipe and integrated more deeply with Google's product surface. Gemini 3 Pro launched on 18 November 2025; Gemini 3.1 Pro (19 February 2026) is the current flagship, with a 1M-token context window (a 2M tier remains in preview), a Deep Think mode, and end-to-end native multimodality across text, vision, audio and video.
Distinctive features. The Google integration is unrivalled: Gemini sits inside Search, Workspace, Android, YouTube transcription, and the Pixel device line. For an Android user with a Workspace account, Gemini is a default rather than an app. Beyond the consumer product, DeepMind continues to ship domain-specialised systems built on Gemini foundations. AlphaProof and AlphaGeometry 2 (mid-2024) together solved four of six problems at the 2024 International Mathematical Olympiad at silver-medal level, the first frontier-scale demonstration of formal mathematical reasoning. AlphaFold 3 (May 2024) extended structure prediction from proteins to nucleic acids, ligands and ions. Genie 2 and Genie 3 are interactive world models.
Pricing tracks the field, but the Google bundle changes the calculus. A Workspace subscriber gets Gemini access folded into a wider productivity suite, which makes per-token comparisons misleading. The API surface, served through Vertex AI and AI Studio, is the most enterprise-shaped of the major labs, and TPUs, Google's in-house accelerators, give DeepMind a hardware-software co-design advantage that no competitor can fully match without leasing the same silicon back from Google Cloud.
Meta
Meta's contribution is the Llama family, the longest-running open-weights frontier programme. Llama 2 (July 2023) made open weights respectable; Llama 3 (April 2024) and Llama 3.1 (July 2024, including a 405-billion-parameter dense model) made them competitive with closed-weight contemporaries; Llama 3.3 and Llama 4 (2025) continued the cadence and introduced mixture-of-experts. Llama 4's largest variant trails the leading closed models on the hardest reasoning benchmarks but remains the strongest fully open-weight option for most tasks, and its smaller Llama 4 Scout and Maverick variants are widely deployed. Llama 4 Behemoth (roughly two trillion parameters) was still in training as of late 2025, with no public release.
Distinctive features. Open weights at frontier scale, with a license that permits commercial use up to a usage threshold. Meta releases tokeniser, base and instruct weights, evaluations, and reasonably detailed technical reports, though training data lists are partial. Llama anchors an enormous derivative ecosystem: by early 2026 Hugging Face hosts tens of thousands of Llama-based fine-tunes, distillations and merges, and most enterprise deployments of "open" AI sit on a Llama base. The strategic logic, commoditising the model layer to protect the social-platform layer above it, is now a textbook case in business school courses, and it is the single biggest reason that the closed-weight premium has shrunk to weeks rather than years. Meta's research culture also leaks usefully into the academic record: FAIR continues to publish work on data curation, scaling laws and architectural ablations that would otherwise stay inside competitor walls.
xAI
xAI, founded by Elon Musk in 2023, has moved with unusual speed. Grok 1 (November 2023) was unremarkable; Grok 2 closed much of the gap in 2024; Grok 3 (early 2025) was credibly frontier-class on selected benchmarks; Grok 4 (late 2025) is the current flagship. The Memphis Colossus cluster, reportedly the largest single training cluster in the world by node count when it came online in 2024, gave xAI raw compute parity with the older labs in a fraction of the time those labs took to get there. Grok ships inside the X platform and as a standalone product.
Distinctive features. The positioning on safety and tone is deliberately distinct: Grok is marketed as less filtered and more contrarian than its competitors, and it has a tighter integration with real-time social-feed data than any other frontier system. Iteration cadence is fast and the team's appetite for compute is conspicuous; the willingness to ship a model with a rougher safety surface is itself a market position, and one that some enterprise buyers explicitly seek out for research and red-teaming work. Whether xAI sustains its current trajectory depends in large part on whether the Memphis cluster scales as advertised and whether the X distribution channel proves to be a genuine moat rather than a captive audience.
Chinese frontier
The Chinese frontier in 2026 is plural and serious. DeepSeek-V3 (December 2024) and DeepSeek-R1 (January 2025) reset open-weight expectations: V3 was a 671-billion-parameter MoE with 37 billion active parameters, trained for a reported 5.6 million US dollars on FP8 mixed precision; R1 added the GRPO-plus-verifiable-reward reasoning recipe and was released openly with a full methods paper. The "DeepSeek moment" of January 2025, when the cost-efficiency gap became public, briefly moved markets and forced a reappraisal of the assumed cost floor for frontier training. DeepSeek-V3.1 (August 2025), V3.2 (December 2025), and V3.2-Speciale (December 2025) followed; R2 was repeatedly delayed, reportedly over Huawei Ascend training issues, and as of late February 2026 had not officially shipped.
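The group-relative core of GRPO is simple enough to sketch. The fragment below shows the advantage computation: rewards are normalised within a group of completions sampled for the same prompt, so no learned critic is needed. The reward values are toy numbers and the surrounding policy-gradient machinery is omitted.

```python
# Sketch of GRPO's group-relative advantage: sample a group of
# completions per prompt, score each with a verifiable reward, and
# normalise within the group. No value network (critic) is required.
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each completion relative to its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # every completion scored the same: no signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Eight completions for one maths problem, rewarded 1.0 when the
# final answer verified and 0.0 otherwise (toy values).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(group_advantages(rewards))
# Correct completions get positive advantage, incorrect negative; the
# policy gradient then upweights tokens from the positive traces.
```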
Alibaba's Qwen line is the other Chinese pillar. Qwen2.5 (late 2024) and Qwen3 (2025) are open-weight and span dense models from half a billion to seventy-two billion parameters, plus Qwen-VL for vision and Qwen-Coder for code. Qwen has become the default base model for fine-tuners in Asia and increasingly in Europe.
Beyond DeepSeek and Qwen the field includes Baidu Ernie 4.5, Zhipu's GLM-4 series, Moonshot's Kimi (notable for early aggressive context-window scaling), Tencent's Hunyuan and 01.AI's Yi family. Several of these match closed Western models on Chinese-language benchmarks and many are competitive on English ones. The composite picture is of a frontier roughly six to nine months behind the very best closed Western labs on the hardest reasoning tasks, level on most everyday workloads, and substantially ahead on price.
European frontier
Europe runs at a smaller scale. Mistral (Paris, founded 2023) is the strongest player, with Mistral Large, the Mixtral MoE family, and the Codestral code model; its open-weight releases have been an important corrective to a market dominated by US and Chinese labs. Aleph Alpha (Heidelberg) targets European-language and regulated-sector deployments. Stability AI continues in the image space. Beyond these, the European story is mostly about academic groups, sovereign-cloud partnerships, and the regulatory environment: the EU AI Act, in force from 2025, shapes deployment more than it does training.
Europe is several billion dollars of compute behind the US and Chinese leaders, and the gap is widening at the very top, though Mistral's open-weight cadence keeps it relevant in the broader ecosystem. The continent's strategic bet, in so far as it has one, is that regulation, sovereignty requirements and a deep base of public-sector and healthcare buyers will create a durable demand for European-built and European-hosted systems even if those systems trail the absolute frontier by a generation.
Differentiation
By 2026 the leading models are similar enough on aggregate benchmarks that comparison shopping happens on edges rather than averages. Five axes matter most.
Reasoning depth. The o-series, Claude Opus with extended thinking, Gemini Deep Think, and DeepSeek-R1/R2 all support long chains of thought; differences lie in trace quality, calibration, and whether the model knows when to stop thinking.
Native multimodality. GPT-5, Gemini 3 and Claude 4 are end-to-end multimodal across text, vision and audio; video generation is shipped separately (Sora, Veo, Genie). Latency on real-time audio still varies by an order of magnitude across vendors.
Tool use and agency. Claude's computer use is the most polished consumer-shaped agentic surface; OpenAI's Operator and Gemini's tool-call layer trail by months rather than years. The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and widely adopted through 2025, standardises how tools advertise themselves to models; the first sketch after this list shows the shape of such a declaration.
Code. Claude has led the SWE-Bench leaderboard most months, with Anysphere's Cursor tab-model and other specialised coding models close behind on certain workflows.
Cost. Per-token prices vary roughly one-hundred-fold across the tier ladder. Haiku-class models at sub-dollar-per-million-token rates make agentic loops affordable; frontier reasoning models at twenty to two hundred dollars per million output tokens make individual prompts strategic decisions rather than throwaway calls. The second sketch below works the arithmetic.
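As a concrete picture of what MCP standardises, here is the shape of the declaration a server returns when a client asks it to list its tools. The field names (`name`, `description`, `inputSchema`) follow the published spec; the weather tool itself, with its parameters, is a hypothetical example.

```python
# The shape of an MCP tool advertisement: a server answering a
# tools/list request describes each tool with a name, a human-readable
# description, and a JSON Schema for its arguments. The get_forecast
# tool below is hypothetical; the field names follow the MCP spec.
tools_list_response = {
    "tools": [
        {
            "name": "get_forecast",
            "description": "Return a short weather forecast for a city.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "days": {"type": "integer", "minimum": 1, "maximum": 7},
                },
                "required": ["city"],
            },
        }
    ]
}
# The model sees only this declaration, never the server's code; when
# it decides the tool is needed it emits a tools/call request and the
# client executes the call on its behalf.
```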
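And to make the cost axis concrete, a back-of-envelope calculation for a fifty-step agentic loop at the two ends of the tier ladder. The prices and token counts are illustrative round numbers, not any vendor's rate card.

```python
# Back-of-envelope cost of a 50-step agentic loop at the two ends of
# the 2026 tier ladder. Prices are illustrative round numbers in USD
# per million tokens; token counts are typical of tool-use loops.
STEPS = 50
IN_TOKENS_PER_STEP = 4_000   # context re-sent to the model each step
OUT_TOKENS_PER_STEP = 500    # a tool call or short reply per step

def loop_cost(price_in: float, price_out: float) -> float:
    """Total USD for the loop at the given per-million-token prices."""
    tokens_in = STEPS * IN_TOKENS_PER_STEP
    tokens_out = STEPS * OUT_TOKENS_PER_STEP
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

print(f"Haiku-class: ${loop_cost(0.80, 4.00):.2f}")     # $0.26
print(f"Frontier:    ${loop_cost(20.00, 100.00):.2f}")  # $6.50
```

At Haiku-class rates the whole loop costs about a quarter; at frontier-reasoning rates the same loop costs dollars, which is why the tier ladder, not the headline flagship price, determines which agentic workloads are economical.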
What you should take away
- The frontier is crowded but narrow: a dozen organisations are within a season of one another, all building on transformers with MoE routing, extended-thinking traces, native multimodality and tool-use scaffolds.
- Open weights now sit weeks behind closed weights, not years; DeepSeek-R1 and Qwen3 changed the default assumption that frontier capability requires closed weights.
- Price-per-token has fallen roughly one hundred-fold since GPT-4 launched in 2023, and tier laddering (Haiku-class, Sonnet-class, Opus-class) is the new shape of the API surface.
- Geographic distribution has widened: US, China and Europe each have at least one credible frontier programme, and the cost-efficiency gap demonstrated by DeepSeek has reset assumptions about what a single-digit-million-dollar training run can produce.
- Differentiation lives at the edges (reasoning depth, agency, code quality, multimodal latency, and price), not in headline benchmark averages, which converge faster than the underlying capabilities diverge.