OpenAI (2024)
OpenAI.
URL: https://openai.com/index/openai-o1-system-card/
Abstract. OpenAI's system card for o1, the first widely deployed reasoning model, which produces a long chain of thought before answering. o1 was trained with large-scale reinforcement learning to reason using chain of thought; OpenAI has not published the details of the reward signal, although process-reward models and verifiable answers in mathematics and code are commonly assumed. OpenAI reports substantial gains on hard reasoning benchmarks (AIME, GPQA, Codeforces), and the system card documents the new safety considerations introduced by chain-of-thought reasoning, including the model's ability to reason about its own safety guidelines in context. The raw chain of thought is hidden from users, who see only a summary. o1 popularized the inference-time-compute scaling paradigm, in which performance improves as the model spends more compute reasoning at inference time.
Tags: language-models reasoning history