References

OpenAI o1 System Card

OpenAI (2024)

OpenAI.

URL: https://openai.com/index/openai-o1-system-card/

Abstract. OpenAI's system card for o1, the first widely deployed reasoning model trained to produce a long chain of thought before answering. o1 was trained with large-scale reinforcement learning to reason using chain of thought. The system card reports substantial gains on hard reasoning benchmarks (AIME, GPQA, Codeforces) and documents the new safety considerations introduced by chain-of-thought reasoning, including the model's ability to reason about its own safety guidelines in context. o1 popularized inference-time compute scaling, in which performance improves as the model spends more computation reasoning before it answers.

Tags: language-models reasoning history

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).