GNoME, Glossary, Textbook of AI

GNoME (Graph Networks for Materials Exploration), published by Merchant, Batzner, Schoenholz et al. (Google DeepMind, Nature 2023), is a graph neural network ensemble coupled with high-throughput density functional theory (DFT) verification. It discovered ~380 000 new thermodynamically stable inorganic crystal structures, expanding the known set of stable crystals roughly tenfold and rivalling decades of cumulative experimental crystallography in a single project.

The core problem in materials discovery is deciding thermodynamic stability, which is judged against the convex hull of formation energy: a candidate is stable if no linear combination of other known phases in the same chemical system has lower energy at its composition. Computing formation energies with DFT is accurate but expensive (~hours of CPU per structure). The GNoME pipeline replaces most of these calculations with a learned surrogate, then verifies the survivors with DFT.
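The hull criterion can be made concrete with a minimal sketch for a binary system A-B, where composition reduces to a single number (the atomic fraction of B). The function names and the toy energies below are illustrative assumptions, not part of the GNoME code; real pipelines work in many-component systems and use dedicated phase-diagram tools.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (atomic fraction of B, formation energy in eV/atom)


def lower_hull(phases: List[Point]) -> List[Point]:
    """Lower convex hull of (composition, energy) points, left to right."""
    # Keep only the lowest-energy phase at each composition.
    best: Dict[float, float] = {}
    for x, e in phases:
        best[x] = min(e, best.get(x, e))
    hull: List[Point] = []
    for p in sorted(best.items()):
        # Drop the previous vertex while it lies on or above the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x: float, energy: float, phases: List[Point]) -> float:
    """Energy of a candidate relative to the hull of competing phases (eV/atom).

    Zero or negative means the candidate is stable against `phases`.
    `phases` must include the end members at x = 0 and x = 1.
    """
    hull = lower_hull(phases)
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            # Linear interpolation of the hull energy at composition x.
            return energy - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("composition lies outside the range spanned by `phases`")


# Toy data (made up): pure A, an A3B phase, an AB phase, pure B.
known = [(0.0, 0.0), (0.25, -0.10), (0.5, -0.40), (1.0, 0.0)]
# A hypothetical AB3 candidate at -0.30 eV/atom lies about 0.10 eV/atom
# below the chord between AB and B, so it would join the hull.
print(energy_above_hull(0.75, -0.30, known))
```

In this toy example the A3B phase sits about 0.10 eV/atom above the chord between pure A and AB, so it decomposes into those two phases even though its formation energy is negative.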

The surrogate is an ensemble of graph neural networks in which each crystal is represented as a graph: nodes are atoms (with element identity as features), edges connect atoms within a cutoff radius, and edge features encode interatomic distances and lattice vectors. The networks predict the total energy per atom, $E_\theta(\mathcal{G})$, and are trained against a corpus of DFT-relaxed structures. Two architectural families are used: a NequIP-style E(3)-equivariant network for high accuracy, and a scalable graph network (GNS-style) for high throughput. Final predictions are averaged across architectures and seeds, with the ensemble disagreement serving as an uncertainty estimate.
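The graph construction described above can be sketched with NumPy alone. This is an illustrative assumption about how a cutoff graph over a periodic crystal might be built, not the GNoME implementation; `build_crystal_graph` is a hypothetical name, and only distances are returned as edge features. Node features would be the element identities, carried separately.

```python
import itertools
import numpy as np


def build_crystal_graph(lattice, frac_coords, cutoff):
    """Directed edges (i -> j) between atoms no further apart than `cutoff`.

    lattice: (3, 3) array whose rows are the lattice vectors.
    frac_coords: (n_atoms, 3) fractional coordinates.
    Returns senders, receivers, distances as 1-D arrays.
    """
    lattice = np.asarray(lattice, dtype=float)
    cart = np.asarray(frac_coords, dtype=float) @ lattice
    volume = abs(np.linalg.det(lattice))

    # Number of periodic images needed along each lattice direction so that
    # every neighbour within the cutoff is reached.
    reps = []
    for axis in range(3):
        b, c = lattice[(axis + 1) % 3], lattice[(axis + 2) % 3]
        plane_spacing = volume / np.linalg.norm(np.cross(b, c))
        reps.append(int(np.ceil(cutoff / plane_spacing)))

    senders, receivers, distances = [], [], []
    for shift in itertools.product(*(range(-n, n + 1) for n in reps)):
        offset = np.asarray(shift, dtype=float) @ lattice
        # delta[i, j] = position of the shifted image of atom j minus atom i
        delta = cart[None, :, :] + offset - cart[:, None, :]
        dist = np.linalg.norm(delta, axis=-1)
        mask = (dist <= cutoff) & (dist > 1e-8)  # skip an atom paired with itself
        i, j = np.nonzero(mask)
        senders.append(i)
        receivers.append(j)
        distances.append(dist[i, j])
    return (np.concatenate(senders), np.concatenate(receivers),
            np.concatenate(distances))


# CsCl-type cell (toy lattice parameter of 4.0 Å): each atom sees the eight
# nearest atoms of the other species, several of them in neighbouring cells.
s, r, d = build_crystal_graph(4.0 * np.eye(3), [[0, 0, 0], [0.5, 0.5, 0.5]], 3.5)
print(len(d), d.min())
```

Looping over periodic images matters because most neighbours of an atom near a cell face belong to an adjacent copy of the cell; a graph built from the atoms of one cell alone would miss them.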

The active-learning loop is GNoME's most important methodological contribution. At each round: (i) candidates are generated, both by random structural sampling (substituting elements into known crystal templates) and by compositional sampling (combining elements with no prior structural template); (ii) candidates are screened by the GNN ensemble and ranked by predicted distance to the convex hull; (iii) the top candidates are relaxed and verified by DFT; (iv) the verified energies are added to the training set and the ensemble is retrained. After six rounds the hit rate climbed from 1% (single-shot screening) to **80% for structural candidates and ~33% for compositional candidates**, indicating the GNNs had absorbed enough chemistry to predict stability with high reliability.
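Steps (i) to (iv) can be sketched as a toy loop. Everything here is a stand-in chosen so the example runs in seconds: candidates are random feature vectors instead of crystals, `toy_dft` is a made-up function playing the role of DFT, and the surrogate is a small ensemble of bootstrapped linear models in place of GNNs. The hit rates it produces say nothing about GNoME's actual numbers.

```python
import numpy as np

DIM = 8
TRUE_W = np.random.default_rng(0).normal(size=DIM)


def toy_dft(x):
    """Hypothetical stand-in for DFT: 'energy above hull' of each candidate row."""
    return x @ TRUE_W / np.sqrt(DIM) + 0.1 * np.sin(3.0 * x[:, 0]) + 0.5


def fit_ensemble(x, y, n_models, rng):
    """Fit linear models on bootstrap resamples (a stand-in for the GNN ensemble)."""
    xa = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))
        w, *_ = np.linalg.lstsq(xa[idx], y[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)


def predict(weights, x):
    """Ensemble mean and standard deviation (disagreement) per candidate."""
    xa = np.hstack([x, np.ones((len(x), 1))])
    preds = xa @ weights.T  # shape (n_candidates, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def run_active_learning(rounds=6, n_candidates=500, top_k=50, n_models=5, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.normal(size=(20, DIM))  # small initial labelled set
    y_train = toy_dft(x_train)
    hit_rates, spreads = [], []
    for _ in range(rounds):
        weights = fit_ensemble(x_train, y_train, n_models, rng)  # (iv) (re)train
        candidates = rng.normal(size=(n_candidates, DIM))        # (i) generate
        mean, std = predict(weights, candidates)                 # (ii) screen
        chosen = np.argsort(mean)[:top_k]                        # (ii) rank, keep the best
        verified = toy_dft(candidates[chosen])                   # (iii) verify
        hit_rates.append(float(np.mean(verified <= 0.0)))        # fraction truly stable
        spreads.append(float(std[chosen].mean()))                # ensemble disagreement
        x_train = np.vstack([x_train, candidates[chosen]])       # (iv) grow training set
        y_train = np.concatenate([y_train, verified])
    return hit_rates, spreads, len(y_train)


hit_rates, spreads, n_train = run_active_learning()
print(hit_rates, n_train)
```

The hit rate in this sketch is the fraction of verified candidates whose true value is at or below zero, mirroring how the rate is defined in the text: stable structures confirmed by the expensive calculation, divided by the number sent for verification.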

Of the 2.2 million candidates the team verified, ~381 000 sit on the convex hull and are predicted stable, with hundreds of thousands of additional metastable structures within 50 meV/atom. Predicted classes of new materials include Li-ion conductors (relevant to solid-state batteries), layered oxides, chalcogenides and intermetallics. A subset has since been independently synthesised by collaborator labs at LBNL's A-Lab autonomous synthesis facility, providing experimental validation.

Critics have noted that DFT predictions of stability do not guarantee synthesisability (kinetic accessibility, defect tolerance, bulk vs film growth) and that some of the "new" structures are minor perturbations of known phases. The GNoME team's own assessment acknowledges these limits but argues that even with conservative discounting the discovery rate is unprecedented and that the released dataset (available under CC BY-NC 4.0, a non-commercial licence) significantly enlarges the training corpus available to subsequent materials-ML work.

Related terms: Graph Neural Network, MACE

Discussed in:

Chapter 17: Applications, Materials Discovery
