17.1 The application landscape in 2026
Chapter 16 covered the safety and alignment questions that hang over the technology in the abstract. Chapter 17 grounds those abstractions in practice: what AI is being used for today, in which sectors, and to what effect. The book ends here because every other thread in the previous sixteen chapters terminates in some application, and the state of the art in 2026 is best summarised by walking through those terminations one at a time. We close with a short list of takeaways that ties this chapter back to the foundations laid earlier.
Healthcare
Medicine was the first field outside computer science to have its working day visibly altered by deep learning, and it remains the most carefully studied application area. Three classes of tool now operate in routine clinical workflows.
The first is medical imaging. Esteva and colleagues' 2017 demonstration that a fine-tuned Inception-v3 network matched twenty-one board-certified dermatologists at distinguishing melanoma from benign naevi, and Gulshan and colleagues' 2016 retinal fundus result with an area under the ROC curve of 0.991 for referable diabetic retinopathy, established a template that hundreds of subsequent papers followed. By the end of 2025 the FDA had cleared 1,451 AI-enabled medical devices (295 added in 2025 alone), the great majority through the 510(k) substantial-equivalence pathway. The most thoroughly validated deployment is breast screening: the MASAI trial, reported in Lancet Oncology in 2023, randomised eighty thousand Swedish women to AI-supported single-reader mammography against conventional double-reader screening and found equivalent cancer detection with a forty-four per cent reduction in radiologist workload. Diabetic retinopathy screening with IDx-DR runs autonomously in primary care; large-vessel-occlusion stroke triage with Viz LVO routes patients to thrombectomy faster; nnU-Net has won more than a hundred segmentation challenges and is the default baseline. Pathology lags imaging by about five years but is following the same arc, with FDA-cleared prostate cancer detection from Paige and active deployments at Memorial Sloan Kettering and Mass General Brigham.
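The area-under-the-ROC-curve figures quoted for screening models have a simple probabilistic reading: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch, with invented scores rather than any real model's output:

```python
# Toy illustration of the area under the ROC curve (AUC).
# AUC equals the probability that a randomly chosen positive case
# scores higher than a randomly chosen negative one; ties count half.

def roc_auc(labels, scores):
    """Rank-statistic AUC over all positive/negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]                 # 1 = referable disease
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]   # invented model outputs
print(roc_auc(labels, scores))                 # one mis-ranked pair out of twelve
```

An AUC of 0.991 therefore means the model ranks a diseased case above a healthy one in ninety-nine of a hundred such comparisons, which is a different (and weaker) guarantee than 99 per cent accuracy at any fixed operating threshold.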
The second class is structural biology and drug discovery. AlphaFold 2, reported in Nature in 2021, achieved a median GDT of 92.4 across the CASP14 targets, with 87.0 on the hardest free-modelling class, and was widely described as having essentially solved single-domain protein structure prediction. AlphaFold 3 generalised the architecture to complexes including nucleic acids, ligands and post-translational modifications. The AlphaFold Protein Structure Database now contains predictions for over two hundred million proteins. Beyond folding, RFDiffusion from the Baker laboratory generates novel protein backbones for desired binding interfaces, ProteinMPNN designs sequences to fit those backbones, and ESM-2 provides residue-level embeddings that double as fast structure predictors. David Baker shared the 2024 Nobel Prize in Chemistry with John Jumper and Demis Hassabis for this body of work. Insilico Medicine's INS018-055, an idiopathic pulmonary fibrosis candidate generated in part by their AI platform, entered Phase II trials in 2023, the first AI-designed drug to do so. Phase IIa results reported in 2024-25 showed safety and tolerability.
The third class is clinical reasoning and documentation. Med-PaLM 2 reached 86.5% on the MedQA US Medical Licensing Examination benchmark in 2023, and frontier models routinely score above ninety per cent on USMLE-style questions. Benchmark performance, however, overstates real-world utility, and randomised studies of LLM-assisted differential diagnosis have produced mixed results: Goh and colleagues' JAMA Network Open paper in 2024 found that physicians using GPT-4 did not significantly outperform controls, although GPT-4 alone outperformed both groups. The clearer wins have been administrative. Ambient AI scribes (Abridge, Nuance DAX, Suki, Nabla) reduce physician documentation time by thirty to seventy minutes per day in deployment cohorts; Kaiser Permanente, Mass General Brigham and the NHS have all conducted large rollouts. Patient-message triage drafts replies for clinician review; coding and billing automation produces a clear short-payback return on investment.
The standing limitations are well known: distribution shift across hospitals, demographic bias in training data, and the gap between benchmark accuracy and prospective patient benefit. The deployments that have worked have augmented clinicians rather than replaced them, and have passed through the same evidentiary gates as any other clinical intervention.
Science
The natural sciences are the field where AI's claim to have changed the conduct of research is strongest. Four lines of evidence anchor the claim.
Structural biology. AlphaFold 3 (Abramson and colleagues, Nature 2024) extended the AlphaFold 2 framework from single-chain proteins to biomolecular complexes including DNA, RNA, ligands, ions and covalent modifications. Ligand pose prediction on the PoseBusters benchmark was substantially better than dedicated docking tools. The ESM Metagenomic Atlas added more than six hundred million predicted structures from sequences that had no experimentally determined homologues, opening up the so-called dark proteome to systematic study.
Weather forecasting. GraphCast, published by DeepMind in Science in 2023, is a graph neural network that produces ten-day global weather forecasts at 0.25-degree resolution in under a minute on a single TPU. It outperforms the operational gold standard, the European Centre for Medium-Range Weather Forecasts (ECMWF) HRES system, on roughly ninety per cent of verification targets. Pangu-Weather from Huawei achieved similar results with a vision-transformer architecture. By 2026 ECMWF, the Met Office and the US National Weather Service had all incorporated machine-learning-based nowcasting into their operational pipelines.
Materials discovery. GNoME (Graph Networks for Materials Exploration), reported by Merchant and colleagues in Nature in 2023, used active learning over graph neural networks to predict 2.2 million new stable crystal structures, of which about 380,000 were judged sufficiently promising for synthesis. The autonomous A-Lab at Lawrence Berkeley used robotic synthesis routes to attempt fifty-eight of the proposed compounds and successfully produced forty-one. The MACE family of equivariant interatomic potentials brought first-principles accuracy to molecular dynamics simulations at a fraction of the cost of density functional theory.
Mathematics. AlphaProof and AlphaGeometry 2, reported by DeepMind in 2024, achieved silver-medal performance on the International Mathematical Olympiad, solving four of six problems in the formal Lean 4 setting. Independently, large language models have become standard tools for accelerating literature review, generating candidate hypotheses and assisting with proof drafting; the LLM4Science movement formalises these uses. The qualifier worth attaching is that frontier AI mathematical reasoning still consists largely of recombination of training data rather than genuinely novel proof discovery, but the gap between recombination and discovery is itself a topic of active investigation.
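The "formal Lean 4 setting" means that the system must emit a proof the Lean proof assistant can check mechanically, so there is no room for a plausible-but-wrong argument. A deliberately trivial example of such a statement-and-proof pair (not an olympiad problem) looks like this:

```lean
-- A toy Lean 4 theorem with a machine-checkable proof term.
-- Olympiad statements are formalised the same way, only far larger,
-- and AlphaProof must produce a proof that the checker accepts.
theorem toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The formal setting is what distinguishes these results from benchmark question-answering: a checked Lean proof cannot be right by accident.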
Robotics and autonomy
Robotics is the field most often cited as the next frontier and the field where progress has most often disappointed. Two lines have nonetheless produced operational systems.
Self-driving. Waymo operates fully driverless ride-hailing services in Phoenix, San Francisco, Los Angeles and Austin, with millions of passenger trips and per-mile crash rates that compare favourably with human drivers on the same road segments. Tesla's Full Self-Driving suite remains a driver-assistance product rather than an autonomous one, but the underlying end-to-end neural-network policy, trained on hundreds of millions of miles of fleet data, has demonstrated the viability of the imitation-learning approach at scale. Cruise's 2023 California permit suspension, by contrast, illustrates the regulatory and safety brittleness of the sector.
Manipulation. Diffusion policies, introduced by Chi and colleagues in 2023, treat the action sequence of a robot as a sample to be denoised; they have produced state-of-the-art results on dexterous manipulation tasks where earlier reinforcement-learning approaches failed. Large behaviour models (RT-2 from Google DeepMind, the Open X-Embodiment cross-platform dataset, and Physical Intelligence's π0 in 2024) have attempted to do for robotics what GPT did for text: training a single policy across many embodiments and tasks. Generalisation across tasks remains the open problem; generalisation across embodiments is harder still.
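The denoising idea above can be made concrete with a toy sketch. A real diffusion policy trains a network to predict noise conditioned on camera and proprioceptive observations; here the "denoiser" is a stand-in that nudges samples toward a fixed target trajectory, purely to show the iterative refinement of an action chunk from pure noise:

```python
# Toy sketch of the diffusion-policy idea: the robot's next H actions
# are drawn by iteratively denoising a random sequence. The denoiser
# below is a hand-written stand-in, not a trained network.
import random

H = 8                                 # action-chunk horizon
STEPS = 50                            # reverse-diffusion steps
target = [i / H for i in range(H)]    # pretend "optimal" action sequence

def denoise_step(actions, t):
    """One reverse step: move part-way toward the target and re-inject
    a little noise, with the noise shrinking as t approaches 0."""
    alpha = 0.2
    sigma = 0.01 * t / STEPS
    return [
        a + alpha * (g - a) + random.gauss(0.0, sigma)
        for a, g in zip(actions, target)
    ]

random.seed(0)
actions = [random.gauss(0.0, 1.0) for _ in range(H)]   # start from noise
for t in range(STEPS, 0, -1):
    actions = denoise_step(actions, t)

err = max(abs(a - g) for a, g in zip(actions, target))
print(f"max deviation from target after denoising: {err:.3f}")
```

The practical attraction for manipulation is that the policy outputs a whole multi-step action chunk at once and can represent multimodal behaviour (several equally good grasps) instead of averaging them into an invalid one.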
Drones, along with agricultural and industrial robots, have moved into operational use in narrower settings: Skydio for autonomous inspection, John Deere's See and Spray for selective herbicide application, Boston Dynamics' Spot for inspection and warehouse work. Embodied AI, the term of art for the union of perception, language and motor control in physical agents, is the headline frontier of 2024–2026 reinforcement-learning research, and most of the leading laboratories (DeepMind, Tesla, Figure, Physical Intelligence, NVIDIA, Toyota Research Institute) have humanoid programmes underway. Deployment in unstructured human environments such as homes is still, on any sober assessment, several years away.
Climate and Earth science
The climate and Earth-science applications have grown out of weather forecasting and materials discovery and increasingly form a coherent applied programme. Forecasting with GraphCast and Pangu-Weather has been discussed; their twin advantages are speed (minutes rather than hours per forecast cycle) and the ability to produce large ensembles cheaply, which feeds directly into uncertainty quantification for downstream decisions. Climate modelling uses machine-learning emulators to replace expensive subgrid parameterisations, and downscaling from global model output to local scale is now routinely done with conditional diffusion or super-resolution networks. Energy demand forecasting is an established commercial application; National Grid in the United Kingdom, ERCOT in Texas and the Australian Energy Market Operator all use ML-based load forecasting in their day-ahead planning. Wildfire and flood prediction has moved from research to operational deployment in California, Australia and Brazil; Google's flood forecasting system covers more than a hundred countries and reaches several hundred million people. Materials discovery for cleaner energy (better battery cathodes, more efficient photocatalysts, room-temperature superconductors) is one of the most active areas: GNoME's 380,000 candidate structures, the MACE potentials and the Open Catalyst Project from Meta have all targeted carbon-reduction technologies.
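The day-ahead load forecasting mentioned above often reduces, at its core, to regression on lagged load and calendar features. A minimal sketch on synthetic data, assuming nothing about any operator's actual system (which would add weather inputs, holidays and far richer models):

```python
# Minimal day-ahead load forecasting as lag regression on synthetic data.
# "Load" is a daily sinusoid plus noise; features are the load 24 h and
# 168 h earlier plus a constant, fitted by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                       # 60 days of hourly data
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

lag_day, lag_week = 24, 168
t = np.arange(lag_week, hours.size)
X = np.column_stack([load[t - lag_day], load[t - lag_week], np.ones(t.size)])
y = load[t]

# Fit on all but the final day, then predict that held-out day.
coef, *_ = np.linalg.lstsq(X[:-24], y[:-24], rcond=None)
pred = X[-24:] @ coef
mae = np.mean(np.abs(pred - y[-24:]))
print(f"mean absolute error on held-out day: {mae:.2f}")
```

Even this toy version shows why ML forecasting slots so easily into day-ahead planning: the features are cheap, the model refits nightly, and errors are measured against the next real day rather than a benchmark.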
The caveat here is the same as everywhere else: forecasting and discovery are accelerated; deployment of physical infrastructure follows the timescales of physics, supply chains and politics, none of which AI shortens.
Education, code and creative work
Education. Khan Academy's Khanmigo, launched in 2023 and built on GPT-4, brought one-to-one Socratic tutoring to several million students. Randomised evidence of learning gains is still thin, and most empirical work on AI tutoring measures engagement rather than acquisition. The author's own MathSpark research project, browser-based adaptive maths tutoring with Three.js animations and Bayesian Knowledge Tracing, sits in the same intellectual lineage. The pattern across deployed edtech AI is that it works well as a homework helper and lesson generator, less well as a pedagogical agent that genuinely diagnoses misconceptions.
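Bayesian Knowledge Tracing, mentioned above, maintains a per-skill probability that the student has mastered the skill, updated after every answer. A minimal sketch with invented parameter values (real tutors fit these per skill from data):

```python
# Minimal Bayesian Knowledge Tracing update, as used in adaptive tutors.
# Parameter values below are invented for illustration.
P_INIT, P_TRANSIT, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.25

def bkt_update(p_mastery, correct):
    """Bayes posterior over mastery given one observed answer, then the
    learning transition applied after the practice opportunity."""
    if correct:
        num = p_mastery * (1 - P_SLIP)           # mastered and didn't slip
        den = num + (1 - p_mastery) * P_GUESS    # ...or unmastered and guessed
    else:
        num = p_mastery * P_SLIP                 # mastered but slipped
        den = num + (1 - p_mastery) * (1 - P_GUESS)
    posterior = num / den
    return posterior + (1 - posterior) * P_TRANSIT

p = P_INIT
for answer in [True, True, False, True, True]:
    p = bkt_update(p, answer)
    print(f"answer {'right' if answer else 'wrong'} -> P(mastered) = {p:.3f}")
```

The tutor acts on the running estimate: once P(mastered) crosses a threshold it moves on, and a wrong answer pulls the estimate down rather than resetting it, which is what distinguishes this from simple streak counting.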
Code assistants. GitHub Copilot, Cursor and the Claude and ChatGPT chat interfaces have together changed how professional software is written. Peng and colleagues' randomised study at GitHub in 2023 measured a fifty-five per cent productivity uplift on a constrained task; subsequent observational evidence from Microsoft, Google and Stripe has been broadly consistent though with smaller effects on real workloads. By 2026 Copilot has been adopted by tens of millions of developers; Cursor's agent mode, Devin from Cognition, and the SWE-Bench Verified benchmark show that frontier models can resolve roughly half of real GitHub issues unaided.
Creative arts. Image generation with Midjourney, DALL-E 3 and Stable Diffusion has become a routine tool in advertising, illustration and concept design. Music generation with Suno and Udio produces convincing two-minute songs from text prompts, and the legal status of training on copyrighted music is unresolved as of 2026. Video generation with Sora, Veo 2 and Runway Gen-3 has progressed from a few seconds of incoherent footage in 2023 to minute-long clips with controlled cinematography in 2025. Translation through Whisper, NLLB and the major commercial APIs has reached parity with professional human translation on most language pairs that matter economically. Accessibility applications (real-time captioning, image-to-speech for blind users via Be My Eyes' GPT-4 integration) are quietly among the most clearly beneficial deployments of the technology.
Finance, fraud and security
Finance was using machine learning before deep learning was fashionable, and the sector remains a heavy and somewhat understated user. Algorithmic trading in equities, futures and foreign exchange uses a mix of classical statistical methods and modern deep models for short-horizon signal extraction; the high-frequency segment is dominated by latency rather than model sophistication. Fraud detection at the major card networks (Visa, Mastercard) and digital wallets (Stripe Radar, PayPal) uses gradient-boosted trees and sequence models to score every transaction in real time, with false-positive rates an order of magnitude below rules-based predecessors. Anti-money-laundering systems run by the global tier-one banks use graph neural networks to detect coordinated networks of accounts; the regulatory environment under FATF, FinCEN and the EU's Anti-Money Laundering Authority is pushing every major institution to deploy ML-based monitoring. KYC and identity verification through Onfido, Persona and Stripe Identity uses face matching, document analysis and liveness detection.
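The gradient-boosted trees behind transaction scoring fit a sequence of small trees, each trained on the residual errors of the ensemble so far. A toy version with one-split "stumps" and squared loss makes the mechanism visible; the features and labels are invented, and production systems (XGBoost, LightGBM) add depth, logistic loss and heavy regularisation:

```python
# Toy gradient boosting with decision stumps, squared loss.
# Each round fits a stump to the current residuals and adds a damped
# copy of its predictions to the ensemble.

def fit_stump(X, residuals):
    """Best single split: (feature index, threshold, left/right means)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def boost(X, y, rounds=20, lr=0.5):
    pred, stumps = [0.0] * len(y), []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + lr * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return stumps

def score(stumps, row, lr=0.5):
    return sum(lr * (lm if row[j] <= thr else rm)
               for j, thr, lm, rm in stumps)

# Invented features: [amount, minutes since previous transaction]; 1 = fraud.
X = [[12, 300], [950, 2], [8, 600], [700, 1], [15, 240], [820, 3]]
y = [0, 1, 0, 1, 0, 1]
model = boost(X, y)
print([round(score(model, row), 2) for row in X])
```

Scoring a new transaction is just a sum over the fitted stumps, which is why these models can run in the few milliseconds a card network allows per authorisation.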
In cybersecurity, intrusion detection systems and security-information-and-event-management platforms (Splunk, Microsoft Sentinel, CrowdStrike) use ML models to triage events. LLM-based vulnerability discovery has been a 2024–2026 development; Google's Project Naptime found dozens of real vulnerabilities in open-source code, and DARPA's AIxCC competition saw automated systems patch real-world bugs. The reverse, LLMs assisting attackers with phishing, social engineering and malware authoring, is also an active threat surface. The sector is in an arms race, and AI is on both sides.
Where AI hasn't worked
It is worth recording where the technology has failed to deliver on the expectations of its proponents.
Long-horizon agentic tasks (booking a complete trip, running a multi-week research project, operating autonomously over the lifetime of a software product) remain unreliable. SWE-Bench Verified accuracies in the fifty per cent range translate to systems that complete the easier tickets and silently fail on the harder ones; the cost of verifying and repairing the failures often exceeds the labour saved.
Genuinely novel scientific reasoning is largely absent. Frontier models can recombine published ideas with great fluency and can occasionally produce useful syntheses, but examples of an LLM proposing a hypothesis that no human had considered and that subsequently survived experimental test are rare. AlphaFold 2, the strongest counter-example, was a focused engineering achievement on a closed problem rather than open-ended discovery.
Robotics in unstructured human environments (homes, hospitals, schools) is harder than expected. The "Moravec paradox" of the 1980s has not gone away: tasks that humans find effortless (folding laundry, opening unfamiliar doors, operating bathroom taps) remain at the edge of what current systems can do reliably.
Truly personalised education has been promised since the 1960s and has not yet arrived. Most deployed AI tutoring is generic; the systems that adapt to the individual student do so over a narrow range of skills. Adaptive content generation, adaptive difficulty and adaptive scheduling are all available, but the synthesis of those into a tutor that diagnoses a child's misconceptions as a skilled teacher would is still an aspiration.
What you should take away
- AI is now a tool in nearly every domain that touches data. Medicine, biology, weather, materials, transport, finance, education, the arts and security have all incorporated machine-learning systems into operational workflows; the question is no longer whether to use AI but where, how and with what safeguards.
- The wins are concrete and measurable in some places, soft and aspirational in others. Protein structure, weather forecasting, mammography, ambient documentation and code assistance are real. Long-horizon autonomy, novel scientific reasoning, household robotics and truly personalised tutoring remain works in progress.
- Augmentation has consistently outperformed replacement. The successful deployments leave a human in the loop with final accountability; the failures have typically been attempts to remove the human prematurely.
- Distribution shift, dataset bias and prospective evaluation are the recurring practical themes. Benchmark performance is a starting point, not a deployment licence; external validation across sites and populations, with prospective and ideally randomised evidence, is what distinguishes products that improve outcomes from products that look good on paper.
- The framework you have learned in this book is general. Linear algebra, calculus, probability, statistics, classical machine learning, neural networks, training dynamics, convolutional and sequence models, attention, generative models and the safety considerations of Chapter 16 are the tools with which any of the applications above can be understood and, in due course, extended. The applications differ; the foundations do not.