References

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI (2025)

arXiv:2501.12948.

URL: https://arxiv.org/abs/2501.12948

Abstract. The DeepSeek-R1 paper, published in January 2025, demonstrates that pure large-scale reinforcement learning, without supervised fine-tuning on reasoning traces, can produce a reasoning model matching or exceeding OpenAI's o1 on many benchmarks. It introduces GRPO (Group Relative Policy Optimisation), an efficient RL algorithm that replaces PPO's learned value baseline with a baseline estimated from a group of sampled responses. The release of DeepSeek-R1 under an open licence with full training details prompted the "DeepSeek moment": a sharp public reassessment of US AI export-control policy, frontier-lab valuations, and the US-China AI competitive landscape.
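The core of GRPO's value-free baseline can be sketched in a few lines: for each prompt, a group of responses is sampled and each response's advantage is its reward normalised against the group's mean and standard deviation. This is a minimal illustration, not the paper's full objective (which adds a clipped policy-ratio loss and a KL penalty on top of these advantages); the function name and the rule-based 0/1 reward in the example are illustrative assumptions.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, replacing PPO's learned value baseline.

    Each reward is normalised against the mean and standard deviation of
    the group sampled for the same prompt. Sketch only: whether sample or
    population std is used, and the surrounding clipped objective and KL
    term, follow the paper, not this snippet.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: a group of 4 sampled answers to one prompt, scored by a
# rule-based reward (assumed here: 1.0 for a correct final answer, else 0.0).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline is computed from the group itself, no separate value network needs to be trained or stored, which is the efficiency gain the abstract refers to.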

Tags: language-models reasoning history
