Abstract. The DeepSeek-R1 paper, published in January 2025, demonstrates that large-scale reinforcement learning alone, without supervised fine-tuning on reasoning traces, can produce a reasoning model that matches or exceeds OpenAI's o1 on many benchmarks. It trains with GRPO (Group Relative Policy Optimisation), an efficient RL algorithm that replaces PPO's learned value baseline with a baseline computed from a group of sampled outputs. The release of DeepSeek-R1 under an open licence with full training details prompted the "DeepSeek moment": a sharp public reassessment of US AI export-control policy, frontier-lab valuations, and the US-China AI competitive landscape.
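
To make concrete what doing without PPO's value baseline can look like, the following is a minimal sketch of the group-relative advantage at the core of GRPO: each sampled output's reward is normalised against the mean and standard deviation of the rewards in its own group, so no separate value network is needed. The function name `group_relative_advantages`, the `eps` stabiliser, the choice of population standard deviation, and the example rewards are all assumptions made for illustration. The sketch leaves out the rest of the objective (the clipped policy ratio and the KL penalty).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its own group's statistics.

    `rewards` holds one scalar reward per sampled output for a single
    prompt. The group mean plays the role that a learned value baseline
    plays in PPO. `eps` is a hypothetical stabiliser that avoids division
    by zero when every output in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four sampled answers to one prompt:
# 1.0 = judged correct, 0.0 = judged incorrect (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # approximately [1, -1, -1, 1]
```

Outputs that score above their group's mean receive a positive advantage and those below receive a negative one; if every output in a group gets the same reward, all advantages are zero and that group contributes no learning signal.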