Glossary

AI Safety Levels (ASL)

AI Safety Levels (ASL) are Anthropic's tiered classification of frontier-model capability and the corresponding safety and security regime that applies at each tier. The scheme is patterned on the US Biosafety Level system (BSL-1 to BSL-4) used in laboratories handling pathogens of increasing danger. ASL is the operational core of Anthropic's Responsible Scaling Policy (RSP) and was introduced with the policy in 2023.

The levels

  • ASL-1: Models that pose no meaningful catastrophic risk because they lack the relevant capabilities, such as a chess engine or a small narrow-purpose model. Few special precautions apply.

  • ASL-2: Models showing early signs of dangerous capability, but not at thresholds where catastrophic misuse is plausible. Most current frontier LLMs (Claude 3, GPT-4) are categorised here. The standard safety stack applies: RLHF, refusal training, usage policies, and abuse monitoring.

  • ASL-3: Models that provide substantial uplift to non-state actors seeking to cause mass casualties (CBRN attacks at thresholds defined in the policy), or that show autonomous capability approaching dangerous levels. Reaching ASL-3 triggers stronger weight-protection security, hardened deployment, mandatory pre-deployment external evaluation, and constrained release.

  • ASL-4: Models capable of substantially uplifting state-actor CBRN programmes, or showing autonomous AI R&D capability, a threshold close to "transformative AI". Anthropic states that ASL-4 deployment requires safety techniques that do not yet exist (in particular, scalable interpretability sufficient to verify model intentions). Training must be paused absent these techniques.

  • ASL-5+: Notional higher levels, currently undefined in detail; reserved for systems whose deployment would require fundamental advances in alignment science.

Capability thresholds

Anthropic's RSP defines the ASL boundaries by operational evaluations: specific pass/fail tests (e.g. uplift on a defined biothreat task above a specified percentile) rather than abstract descriptions. This is intended to give external auditors something concrete to verify.
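The pass/fail structure can be sketched as a simple evaluation gate. All task names, thresholds, and scores below are invented for illustration; the RSP's actual thresholds are defined in the policy itself and are considerably more involved:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    task: str       # hypothetical evaluation task name
    uplift: float   # measured uplift over a no-model baseline, 0.0-1.0

# Invented illustrative thresholds: a model is treated as crossing into
# ASL-3 if it exceeds the defined uplift on ANY listed task.
ASL3_THRESHOLDS = {
    "biothreat-acquisition": 0.25,
    "autonomous-replication": 0.50,
}

def requires_asl3(results: list[EvalResult]) -> bool:
    """Pass/fail gate: True if any result crosses its ASL-3 threshold."""
    return any(
        r.uplift > ASL3_THRESHOLDS.get(r.task, float("inf"))
        for r in results
    )

results = [
    EvalResult("biothreat-acquisition", 0.10),   # below threshold
    EvalResult("autonomous-replication", 0.55),  # above threshold
]
print(requires_asl3(results))  # True: one task crossed its threshold
```

The point of the concrete gate is auditability: an external evaluator can re-run the defined tasks and check the thresholds, rather than debating a qualitative description.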

Status and adoption

OpenAI's Preparedness Framework uses the analogous tiers Low/Medium/High/Critical with similar mappings to deployment commitments. Google DeepMind's Frontier Safety Framework uses Critical Capability Levels (CCLs) for autonomy, biology, cyber, and ML R&D. The schemes are not exactly equivalent but converge in spirit.

In May 2025, Anthropic activated its ASL-3 deployment and security standards for Claude Opus 4, the first model released under those protections; the company stated it had not determined that the model definitively required ASL-3, but could not rule it out.

References

  • Anthropic (2023, 2024). Responsible Scaling Policy v1, v2.

  • OpenAI (2023). Preparedness Framework.

  • Google DeepMind (2024). Frontier Safety Framework.

Related terms: Responsible Scaling Policy (RSP), Evaluations / Capability Evaluations, Frontier AI Safety Commitments, Anthropic
