A Responsible Scaling Policy (RSP) is a public commitment by an AI developer to tie further scaling of model capabilities to demonstrated safety progress, with pre-specified capability thresholds that trigger pre-specified safety commitments. The framework was introduced by Anthropic in September 2023 and has since been emulated, with variations, by OpenAI (Preparedness Framework, December 2023) and Google DeepMind (Frontier Safety Framework, May 2024).
Structure
An RSP typically specifies:
- AI Safety Levels (ASL): discrete capability tiers (Anthropic) or risk levels (OpenAI's Low/Medium/High/Critical), with concrete behavioural definitions.
- Evaluation commitments: what tests must be run before each new model is trained or deployed, and how often during training.
- Containment / deployment commitments: what security and deployment measures apply at each level (e.g. pre-deployment red-teaming, model-weights protections, internal usage policies).
- Stop conditions: capability findings that trigger a pause in scaling until safety progress catches up.
Anthropic's RSP, for example, commits that an ASL-3 model (one that materially uplifts a non-state actor's CBRN capability) cannot be deployed without specified mitigations, and that training an ASL-4 model would require safety guarantees that do not yet exist.
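The trigger-and-gate structure above can be sketched as a toy decision procedure: run evaluations, check scores against thresholds, and allow deployment only if the mitigations required at every triggered level are in place. All names, capabilities, thresholds, and mitigation sets below are hypothetical illustrations, not any lab's actual policy.

```python
# Toy sketch of RSP-style gating logic.
# All thresholds, capability names, and mitigation sets are invented
# for illustration; real RSPs define these qualitatively and in detail.
from dataclasses import dataclass


@dataclass
class EvalResult:
    capability: str  # e.g. "cbrn_uplift" (hypothetical eval name)
    score: float     # evaluated capability score; higher = more capable


# Hypothetical capability thresholds mapped to the level they trigger.
THRESHOLDS = {
    "cbrn_uplift":   [(0.8, "ASL-3"), (0.95, "ASL-4")],
    "cyber_offense": [(0.7, "ASL-3")],
}

# Hypothetical mitigations required once a level is triggered.
REQUIRED_MITIGATIONS = {
    "ASL-3": {"weights_security", "deployment_redteam"},
    "ASL-4": {"weights_security", "deployment_redteam", "formal_guarantees"},
}


def gate(results, mitigations_in_place):
    """Return 'deploy' if mitigations cover every triggered level,
    else 'pause' -- the stop condition that halts further scaling."""
    required = set()
    for r in results:
        for threshold, level in THRESHOLDS.get(r.capability, []):
            if r.score >= threshold:
                required |= REQUIRED_MITIGATIONS[level]
    return "deploy" if required <= mitigations_in_place else "pause"
```

For instance, a score of 0.85 on the hypothetical "cbrn_uplift" eval triggers ASL-3: `gate([EvalResult("cbrn_uplift", 0.85)], {"weights_security"})` returns `"pause"` because red-teaming is missing, while supplying both ASL-3 mitigations returns `"deploy"`.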
Theory of change
An RSP pursues three goals:
- Internal discipline: forces the developer to think through safety commitments before it has a commercial incentive to soften them.
- Industry coordination: provides a template other labs can match without unilateral disadvantage.
- Regulatory scaffolding: gives legislators a working example of capability-threshold-based regulation, easing the path to enforceable rules (cf. the EU AI Act's general-purpose-AI systemic-risk threshold of 10²⁵ FLOPs of training compute).
Criticisms
- Self-marking: labs evaluate themselves, with weak external verification.
- Vagueness: early thresholds were qualitative and arguably under-specified.
- Voluntariness: a lab can revise its own RSP under commercial pressure.
- Race dynamics: RSPs do not address competitive pressure to scale despite uncertainty.
Status
As of 2026, RSPs (under various names) are standard practice among frontier labs. The Frontier AI Safety Commitments agreed at the Seoul Summit (May 2024) committed signatories to publish such policies. Independent national AI Safety Institutes (UK, US, EU, Japan, Singapore) increasingly conduct the underlying evaluations, partially addressing the self-marking concern.
References
Anthropic (2023). Responsible Scaling Policy v1. (Updated v2 in 2024.)
OpenAI (2023). Preparedness Framework.
Google DeepMind (2024). Frontier Safety Framework.
Related terms: AI Safety Levels (ASL), Evaluations / Capability Evaluations, Frontier AI Safety Commitments, Bletchley AI Safety Summit, Anthropic, OpenAI
Discussed in:
- Chapter 14: Generative Models, Responsible scaling