On 20 January 2025, the Chinese AI startup DeepSeek released DeepSeek-R1, a reasoning model published with full training details and open weights that matched or exceeded OpenAI's o1 on many benchmarks. The release prompted what was widely called the "DeepSeek moment":
- Stock-market reaction: Nvidia's market capitalisation fell by approximately $600 billion in a single day (27 January 2025) as investors reassessed the compute requirements for frontier AI.
- Export-control policy: US restrictions on advanced GPU sales to China came under renewed public scrutiny.
- Frontier-lab valuations: these were briefly questioned as it became clear that frontier capability could be reproduced far more cheaply than US labs had been spending.
- Open-source momentum: the release demonstrated that rigorous open-weights releases are a viable competitive strategy at the frontier, giving the open-source ecosystem a substantial boost.
DeepSeek-R1 was trained using large-scale reinforcement learning on verifiable mathematical and coding tasks; its precursor, DeepSeek-R1-Zero, used pure RL with no supervised fine-tuning on reasoning traces at all, while R1 itself added only a small "cold-start" fine-tuning stage. The recipe demonstrated that reasoning capability can be elicited directly through RL, substantially simplifying and cheapening the training pipeline.
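The key enabler of this recipe is that the reward is *verifiable*: a simple rule checks the model's final answer against a known ground truth, so no learned reward model is needed. A minimal illustrative sketch of such a reward function (not DeepSeek's actual code; the function name and answer format are assumptions) might look like:

```python
import re

def math_reward(completion: str, ground_truth: str) -> float:
    """Illustrative rule-based reward for RL on verifiable math tasks.

    Rewards 1.0 only if the last \\boxed{...} answer in the model's
    completion exactly matches the ground-truth answer, else 0.0.
    """
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0

# Usage: a completion ending in \boxed{42}, scored against "42"
print(math_reward(r"...so the answer is \boxed{42}.", "42"))  # 1.0
```

Because the signal is binary and automatically checkable, it scales to millions of problems without human labelling, which is what makes large-scale pure RL on reasoning tasks practical.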
The release also raised governance questions: openly published weights cannot be recalled, so safety modifications cannot be enforced retroactively, and the model can be deployed on commodity hardware anywhere in the world. Its effects on the international AI safety policy conversation continue to be debated as of 2026.
Related terms: Liang Wenfeng, o1 / Reasoning Models, Reinforcement Learning
Discussed in:
- Chapter 1: What Is AI?, A Brief History of AI