A Responsible Scaling Policy (RSP) is a public commitment by an AI developer to tie further scaling of model capabilities to demonstrated safety progress, with pre-specified capability thresholds that trigger pre-specified safety commitments. The framework was introduced by Anthropic in September 2023 and has since been emulated, with variations, by OpenAI (Preparedness Framework, December 2023) and Google DeepMind (Frontier Safety Framework, May 2024).
Structure
An RSP typically specifies:
- AI Safety Levels (ASL): discrete capability tiers (Anthropic) or risk levels (OpenAI's Low/Medium/High/Critical), with concrete behavioural definitions.
- Evaluation commitments: what tests must be run before each new model is trained or deployed, and how often during training.
- Containment / deployment commitments: what security and deployment measures apply at each level (e.g. pre-deployment red-teaming, model-weights protections, internal usage policies).
- Stop conditions: capability findings that trigger a pause in scaling until safety progress catches up.
Anthropic's RSP, for example, commits that an ASL-3 model (one that materially uplifts a non-state actor's CBRN capability) cannot be deployed without specified mitigations, and that training an ASL-4 model would require safety guarantees that do not yet exist.
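The trigger-and-gate structure above can be sketched as a toy decision procedure: run evaluations, check scores against thresholds, and allow deployment only if the mitigations required at every triggered level are in place. All names, capabilities, thresholds, and mitigation sets below are hypothetical illustrations, not any lab's actual policy.

```python
# Toy sketch of RSP-style gating logic.
# All thresholds, capability names, and mitigation sets are invented
# for illustration; real RSPs define these qualitatively and in detail.
from dataclasses import dataclass


@dataclass
class EvalResult:
    capability: str  # e.g. "cbrn_uplift" (hypothetical eval name)
    score: float     # evaluated capability score; higher = more capable


# Hypothetical capability thresholds mapped to the level they trigger.
THRESHOLDS = {
    "cbrn_uplift":   [(0.8, "ASL-3"), (0.95, "ASL-4")],
    "cyber_offense": [(0.7, "ASL-3")],
}

# Hypothetical mitigations required once a level is triggered.
REQUIRED_MITIGATIONS = {
    "ASL-3": {"weights_security", "deployment_redteam"},
    "ASL-4": {"weights_security", "deployment_redteam", "formal_guarantees"},
}


def gate(results, mitigations_in_place):
    """Return 'deploy' if mitigations cover every triggered level,
    else 'pause' -- the stop condition that halts further scaling."""
    required = set()
    for r in results:
        for threshold, level in THRESHOLDS.get(r.capability, []):
            if r.score >= threshold:
                required |= REQUIRED_MITIGATIONS[level]
    return "deploy" if required <= mitigations_in_place else "pause"
```

For instance, a score of 0.85 on the hypothetical "cbrn_uplift" eval triggers ASL-3: `gate([EvalResult("cbrn_uplift", 0.85)], {"weights_security"})` returns `"pause"` because red-teaming is missing, while supplying both ASL-3 mitigations returns `"deploy"`.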
Theory of change
An RSP pursues three goals:
- Internal discipline: forces the developer to think through safety commitments before it has a commercial incentive to soften them.
- Industry coordination: provides a template other labs can match without unilateral disadvantage.
- Regulatory scaffolding: gives legislators a working example of capability-threshold-based regulation, easing the path to enforceable rules (cf. the EU AI Act's general-purpose-AI systemic-risk threshold of 10²⁵ FLOPs of training compute).
Criticisms
- Self-marking: labs evaluate themselves, with weak external verification.
- Vagueness: early thresholds were qualitative and arguably under-specified.
- Voluntariness: a lab can revise its own RSP under commercial pressure.
- Race dynamics: RSPs do not address competitive pressure to scale despite uncertainty.
Status
As of 2026, RSPs (under various names) are standard practice among frontier labs. The Frontier AI Safety Commitments agreed at the Seoul Summit (May 2024) committed signatories to publish such policies. Independent national AI Safety Institutes (UK, US, EU, Japan, Singapore) increasingly conduct the underlying evaluations, partially addressing the self-marking concern.
References
Anthropic (2023). Responsible Scaling Policy v1. (Updated v2 in 2024.)
OpenAI (2023). Preparedness Framework.
Google DeepMind (2024). Frontier Safety Framework.
Related terms: AI Safety Levels (ASL), Evaluations / Capability Evaluations, Frontier AI Safety Commitments, Bletchley AI Safety Summit, Anthropic, OpenAI
Discussed in:
- Chapter 14: Generative Models, Responsible scaling