17.2 Drug discovery and biology beyond AlphaFold

If §17.1 swept across the application landscape at altitude, this section drops down into the part of that landscape that has changed most deeply over the last five years: computational biology and drug discovery. The reason for stopping here, rather than at any of the other places we might have stopped, is that the modern wave of artificial intelligence has done more for the working biologist than for the working clinician, the working physicist or the working economist. AlphaFold 2 reset the field of structural biology in 2021 much as the ImageNet 2012 result reset computer vision a decade earlier; AlphaFold 3, RFDiffusion, ESM-2, scGPT, Evo and the MACE family of interatomic potentials together have moved computational biology from a discipline of bespoke pipelines built around individual datasets to one in which a small library of foundation models can be fine-tuned, prompted or queried for specific applications. The cultural change parallels the transformation that natural-language processing underwent in 2018–2020 with the arrival of BERT and GPT, and is, if anything, more far-reaching, because the underlying problems are physical rather than linguistic.

§17.1 treated AlphaFold and protein design as one slice of the AI-in-medicine story (alongside imaging, ambient documentation, EHR analysis, and clinical-decision-support tools such as Med-PaLM 2). Treated on its own terms, computational biology has been transformed more deeply and more broadly than any of those neighbouring areas. We now look at the five lines along which that transformation has happened (protein design, machine-learning interatomic potentials, single-cell transcriptomics, genomics and antibody engineering), and at the cross-cutting questions of evaluation, deployment and risk that these advances raise.

De novo protein design

The Baker laboratory at the University of Washington has, over the last five years, made it routine to design proteins for tasks that previously required either evolutionary borrowing or laboratory-scale directed evolution. The trick is to compose two complementary tools. RFDiffusion, introduced by Watson and colleagues in Nature in 2023, treats the protein backbone as a sample to be denoised; trained on the Protein Data Bank, it can generate three-dimensional backbones conditioned on a target binding interface, a desired symmetry, or a fragment of an existing structure. ProteinMPNN then designs an amino-acid sequence that will fold into that backbone, inverting the structure-prediction problem that AlphaFold solves. Together they make a controllable generative pipeline: specify what the protein should bind to, draw a backbone that fits the binding interface, fill in the sequence, and order the gene for synthesis.
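The shape of that pipeline can be sketched in a few lines of code. Everything below is a schematic stand-in, not the real RFDiffusion or ProteinMPNN API: the function names, the confidence filter and the cutoff value are all illustrative assumptions, with random stubs in place of the actual models.

```python
# Schematic sketch of the backbone-then-sequence design loop. The
# functions are hypothetical stand-ins for RFDiffusion (backbone
# generation), ProteinMPNN (inverse folding) and an AlphaFold-style
# self-consistency filter; real runs use the published codebases.
import random

def sample_backbone(target_interface, n_residues, seed):
    """Stand-in for RFDiffusion: denoise noise into a backbone,
    represented here as one C-alpha (x, y, z) coordinate per residue."""
    rng = random.Random((target_interface, seed))
    return [(rng.gauss(0, 10), rng.gauss(0, 10), rng.gauss(0, 10))
            for _ in range(n_residues)]

def design_sequence(backbone, seed):
    """Stand-in for ProteinMPNN: backbone in, sequence out."""
    rng = random.Random(seed)
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    return "".join(rng.choice(amino_acids) for _ in backbone)

def predicted_confidence(sequence):
    """Stand-in for a structure-prediction confidence score in [0, 1]."""
    return random.Random(sequence).random()

def design_binder(target_interface, n_residues=80, n_samples=50, cutoff=0.9):
    """Sample many designs; keep those that pass the in-silico filter."""
    passing = []
    for seed in range(n_samples):
        backbone = sample_backbone(target_interface, n_residues, seed)
        sequence = design_sequence(backbone, seed)
        if predicted_confidence(sequence) >= cutoff:
            passing.append(sequence)
    return passing  # survivors go to gene synthesis and wet-lab assay

candidates = design_binder("target interface", n_samples=200)
print(f"{len(candidates)} of 200 designs pass the in-silico filter")
```

The structural point survives the toy stubs: the generative models propose cheaply and in bulk, an in-silico filter discards most proposals, and only the survivors incur the cost of synthesis and experiment.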

The 2023 Nature paper from the Baker group described the design and experimental validation of high-affinity binders to twelve different targets in a workflow that took weeks rather than years. Cytokine mimics, receptor binders, symmetric assemblies and mini-proteins targeting viral surface proteins are now routine teaching projects rather than career-defining achievements. Several biotechnology startups, among them Generate Biomedicines, Cradle and Xaira Therapeutics, have been founded explicitly around generative protein design; Xaira raised one billion US dollars in a Series A in 2024 on the strength of the underlying methods rather than an existing pipeline of validated molecules, which gives a sense of how venture capital currently prices the approach. David Baker shared the 2024 Nobel Prize in Chemistry with John Jumper and Demis Hassabis, recognising the combined contribution of folding (AlphaFold) and design (RFDiffusion / ProteinMPNN). Insilico Medicine's INS018-055, an idiopathic pulmonary fibrosis candidate generated in part by their AI platform, entered Phase II trials in 2023, the first AI-designed drug to do so. Phase IIa results reported in 2024-25 showed safety and tolerability; the programme is the most often-cited reminder that the laboratory pipeline still moves on its own time, regardless of how fast the upstream computational steps have become.

Machine-learning interatomic potentials

Classical molecular dynamics simulates atomic motion using either empirical force fields such as AMBER, CHARMM and OPLS (fast but of limited accuracy) or quantum-chemistry methods such as density functional theory (DFT), which are accurate but restricted to small systems over short times. Machine-learning interatomic potentials, neural networks trained to reproduce DFT energies and forces, promise to combine accuracy with speed. MACE, introduced by Batatia and colleagues at NeurIPS in 2022, is an equivariant message-passing network whose features transform predictably under rotations and translations of the input molecule; trained on small DFT datasets, it transfers to larger systems and longer timescales. NequIP (Batzner and colleagues, 2022) and Allegro (Musaelian and colleagues, 2023) sit in the same architectural family. By 2026, MACE-MP-0, a foundation MACE model trained on the Materials Project, can simulate organic, inorganic and biomolecular systems with near-DFT accuracy at four to five orders of magnitude greater speed than DFT itself. Microsoft's MatterSim (2024) extends the same idea to inorganic materials.
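The symmetry property that equivariant architectures build in can be checked numerically on a toy example. The pairwise potential below is a stand-in for a learned model, not MACE itself; like any energy built from interatomic distances, it is unchanged when the whole molecule is rotated and translated, which is exactly the invariance MACE guarantees for its predicted energies.

```python
# Energies built from interatomic distances are invariant under rigid
# motions; equivariant networks such as MACE enforce this by
# construction. Here we verify it for a toy Lennard-Jones-style term.
import math

def energy(coords):
    """Toy pairwise potential: sum of r^-12 - r^-6 over atom pairs."""
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            e += r**-12 - r**-6
    return e

def rotate_z(p, theta):
    """Rotate a point about the z-axis by angle theta."""
    x, y, z = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta), z)

atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.3, 1.2, 0.4)]
moved = [rotate_z(p, 0.7) for p in atoms]              # rotate the molecule
moved = [(x + 5, y - 2, z + 1) for x, y, z in moved]   # then translate it

print(abs(energy(atoms) - energy(moved)) < 1e-9)  # True: energy unchanged
```

A network without this symmetry baked in must instead learn it from data augmentation, which wastes capacity and breaks down outside the training distribution; building it into the architecture is much of why small DFT datasets suffice for training.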

The cumulative effect on materials science and computational chemistry is substantial. Ab initio molecular dynamics, previously limited to perhaps a hundred atoms for picoseconds, now scales to tens of thousands of atoms for nanoseconds with foundation-model-driven potentials. The same technology underwrites the autonomous discovery efforts described in §17.5 (GNoME's 2.2 million predicted stable crystal structures and the A-Lab's robotic synthesis of forty-one of them), because the screening step, which used to be the bottleneck, is now the cheap step. For drug discovery the implication is that binding-energy estimates, free-energy perturbation calculations and conformational sampling, which together used to consume the bulk of a medicinal chemistry programme's compute budget, now run at a fraction of the previous cost, and the results are accurate enough to drive go/no-go decisions on candidate molecules without recourse to a small army of computational chemists.

Single-cell transcriptomics

Single-cell RNA-sequencing produces gene-expression profiles for millions of individual cells, opening a window onto cellular heterogeneity that bulk sequencing averages away. Foundation models for single-cell RNA sequencing have begun to do for this data what the BERT family did for natural language. Geneformer (Theodoris and colleagues, Nature 2023, trained on thirty million cells) and scGPT (Cui and colleagues, Nature Methods 2024, trained on thirty-three million cells) tokenise the gene-expression vector for each cell, train transformers on it, and produce cell-state embeddings useful for batch correction across experiments, perturbation prediction, gene-network inference and cell-type annotation. The promise, not yet fully realised, but plausibly within reach, is foundation models that can answer questions like "what would happen to this cell if I knock out this gene?" without having to perform the experiment. Validation against held-out perturbation screens has been encouraging on cell-type changes; transferability across tissues and species remains the open empirical question. The deeper architectural question, whether the right tokenisation of an expression vector is by ranked-gene order (Geneformer) or by binned expression level (scGPT) or by something else not yet proposed, is still open, and the answer will likely shape the next generation of cell-state foundation models.
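The two tokenisation schemes can be made concrete on a toy expression vector. The gene names, expression values and bin edges below are illustrative assumptions, not the models' actual vocabularies or preprocessing; the point is only the difference in what each scheme asks the transformer to attend to.

```python
# Sketch of the two tokenisations discussed above. Geneformer-style:
# a cell becomes a sequence of gene tokens ordered by expression rank.
# scGPT-style: each gene keeps its identity token, paired with a
# discretised expression-level token. All values are illustrative.

expression = {"CD3E": 0.0, "GAPDH": 9.2, "ACTB": 7.8, "FOXP3": 1.4, "IL2": 0.3}

# Rank-order tokenisation: highest-expressed gene first;
# zero-expression genes are dropped from the sequence.
rank_tokens = [g for g, v in sorted(expression.items(),
                                    key=lambda kv: -kv[1]) if v > 0]

# Binned-value tokenisation: gene token plus expression-bin token.
def bin_level(v, edges=(0.5, 2.0, 5.0, 8.0)):
    """Map a continuous expression value to a bin index 0..len(edges)."""
    return sum(v > e for e in edges)

value_tokens = [(g, bin_level(v)) for g, v in expression.items()]

print(rank_tokens)   # ['GAPDH', 'ACTB', 'FOXP3', 'IL2']
print(value_tokens)
```

The trade-off is visible even here: rank tokens discard the magnitude of expression but are robust to depth-of-sequencing differences between experiments, while binned-value tokens retain magnitude at the cost of a binning scheme that must be calibrated.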

Genomics

Sequence-based genomics models attempt to predict molecular phenotypes directly from DNA. Enformer (Avsec and colleagues, Nature Methods 2021) used a transformer with a 200,000 base-pair receptive field to predict gene expression from sequence at substantially higher accuracy than previous convolutional baselines, and is now a standard tool in functional genomics. Its long context window allowed it to learn the kind of distal regulatory interactions, promoters and enhancers separated by tens of thousands of bases, that earlier convolutional models, with effective receptive fields of a few thousand bases, simply could not represent. The Evo family of DNA language models (Nguyen and colleagues, Science 2024 for Evo-1; the forty-billion-parameter Evo-2 in early 2025) trained long-context sequence models on eighty thousand microbial genomes and demonstrated zero-shot generation of plausible CRISPR systems and other genetic elements. The qualifier here is the same as for any large generative model: outputs can be made to look like training-distribution exemplars without that being evidence of biological function. The Evo papers acknowledge this and report wet-lab validation experiments alongside the in-silico ones, which is the right standard. The further qualifier specific to genomics is that prediction of expression is not causation: a model that predicts that a single-nucleotide variant changes expression is not yet a model that explains why, and the gap between correlative and mechanistic models is the active frontier.
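The receptive-field point lends itself to a back-of-envelope calculation. The layer counts and kernel sizes below are illustrative, not any published model's exact architecture; only the Enformer input-window size (196,608 bp) is from the literature.

```python
# Why convolutional baselines see kilobases while attention sees the
# whole window. For stacked stride-1 convolutions the receptive field
# grows additively: RF = 1 + sum over layers of (kernel - 1) * dilation.

def conv_receptive_field(layers):
    """Receptive field of stacked stride-1 convs; layers = [(kernel, dilation)]."""
    return 1 + sum((k - 1) * d for k, d in layers)

# Plain conv stack: ten layers, kernel 5, no dilation -> linear growth.
plain = conv_receptive_field([(5, 1)] * 10)

# Dilated stack: kernel 3, dilation doubling per layer -> geometric growth.
dilated = conv_receptive_field([(3, 2**i) for i in range(10)])

# Self-attention: every position attends to every other, so the
# effective receptive field is the full input window.
attention_window = 196_608  # Enformer's input window in base pairs

print(plain, dilated, attention_window)  # 41 2047 196608
```

Even the dilated stack tops out around two kilobases for these (assumed) depths, which is why enhancers tens of kilobases from their promoter were invisible to convolutional predecessors and become learnable once attention spans the window.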

Antibody design

Therapeutic antibody discovery has historically used phage display, immunisation of transgenic animals or screening of natural human repertoires. Each of these methods is an exercise in evolutionary search, with the laboratory standing in for natural selection. Generative antibody-design models such as IgLM (Shuai and colleagues, 2022) and AbLang (Olsen and colleagues, 2022), along with commercial platforms such as Absci's, have begun to enter pharmaceutical workflows by replacing the search step with a learned prior over plausible antibody sequences. Absci announced in 2023 that its de novo antibody-design platform had produced a candidate against HER2 at affinities competitive with directed-evolution outputs. Combining antibody language models with structure prediction (AlphaFold-Multimer, IgFold) and binder-design diffusion (RFAntibody from the Baker lab, 2024) is the leading research direction in 2026; the early results suggest that what was a dedicated experimental pipeline can become a substantially smaller experimental loop wrapped around a larger computational one. The economic implications, if the trajectory continues, are significant: antibody discovery is one of the highest-margin activities in the pharmaceutical industry, and a method that compresses its timeline by an order of magnitude reshapes the cost structure of biologics development.

Risks and open questions

The risks here divide into the familiar and the field-specific. The familiar set includes dataset bias (the Protein Data Bank over-represents soluble globular proteins, so generative models for membrane proteins inherit the gap), distribution shift (a perturbation model trained on cancer cell lines may say little about primary neurons), and over-trust (a polished AlphaFold prediction can be wrong in exactly the loop that matters for drug binding, and the structure displayed in the viewer gives no confidence interval that a non-specialist can read). The field-specific set includes biosecurity: a generative model that designs novel proteins is, in principle, a model that can design novel toxins or receptors for engineered pathogens. The field has begun to take this seriously: the 2023 White House executive order on AI singled out biological design tools, and consortia such as IBBIS coordinate screening of synthesised DNA against worrisome sequences. But the technical balance between openness and risk is unresolved.

The regulatory pathway is, like everything in pharmaceuticals, slow but maturing. The US Food and Drug Administration treats AI-derived molecules as molecules, not as algorithms: an AI-designed drug goes through the same investigational new drug, Phase I, II and III evaluation as any other compound. AI-derived diagnostic claims fall under the Software as a Medical Device pathway, which by 2026 has cleared more than a thousand devices, mostly through the 510(k) substantial-equivalence route. The bottleneck for AI-discovered medicines is not the regulator; it is biology, manufacturing supply chains and patient recruitment, none of which AI shortens.

What you should take away

  1. Computational biology has been changed more by the modern AI wave than any other applied science. AlphaFold, RFDiffusion, ESM-2, MACE, scGPT and Evo together constitute a foundation-model stack for biology that did not exist five years ago.
  2. The pipeline is now folding, designing, simulating and predicting: all model-driven. Folding (AlphaFold), design (RFDiffusion / ProteinMPNN), molecular dynamics (MACE family), single-cell modelling (Geneformer, scGPT) and DNA-sequence modelling (Enformer, Evo) compose into an integrated computational stack.
  3. The acceleration is on the upstream steps; the downstream remains slow. Computation has compressed from years to weeks; the laboratory, the clinic and the regulator still move on their own timescales. Insilico Medicine's INS018-055 is the canonical reminder.
  4. Validation discipline is what separates real progress from theatre. A predicted structure, a designed binder, a generated CRISPR system or a perturbation forecast is a hypothesis, not a finding. Wet-lab confirmation, prospective evaluation and replication across laboratories are what convert computational output into scientific knowledge.
  5. The risks are real but tractable. Dataset bias, distribution shift, over-trust and biosecurity each demand a specific technical and governance response. The reasonable position in 2026 is neither to celebrate nor to fear these tools wholesale, but to deploy them with the same evidentiary rigour the discipline already applies to any other powerful instrument.
