Glossary

AI Safety Levels (ASL)

AI Safety Levels (ASL) are Anthropic's tiered classification of frontier-model capability and the corresponding safety and security regime that applies at each tier. The scheme is patterned on the US Biosafety Level system (BSL-1 to BSL-4) used in laboratories handling pathogens of increasing danger. ASL is the operational core of Anthropic's Responsible Scaling Policy (RSP) and was introduced with the policy in 2023.

The levels

  • ASL-1: Models that pose no meaningful catastrophic risk because they lack the relevant capabilities, such as a chess engine or a small narrow-purpose model. Few special precautions apply.

  • ASL-2: Models showing early signs of dangerous capability, but not at thresholds where catastrophic misuse is plausible. Most current frontier LLMs (Claude 3, GPT-4) are categorised here. The standard safety stack applies: RLHF, refusal training, usage policies, and abuse monitoring.

  • ASL-3: Models that provide substantial uplift to non-state actors seeking to cause mass casualties (CBRN attacks at thresholds defined in the policy), or that show autonomous capability approaching dangerous levels. Reaching ASL-3 triggers stronger weight-protection security, hardened deployment, mandatory pre-deployment external evaluation, and constrained release.

  • ASL-4: Models capable of substantially uplifting state-actor CBRN programmes, or showing autonomous AI R&D capability, a threshold close to "transformative AI". Anthropic states that ASL-4 deployment requires safety techniques that do not yet exist (in particular, scalable interpretability sufficient to verify model intentions). Training must be paused absent these techniques.

  • ASL-5+: Notional higher levels, currently undefined in detail; reserved for systems whose deployment would require fundamental advances in alignment science.

Capability thresholds

Anthropic's RSP defines the ASL boundaries by operational evaluations: specific pass/fail tests (e.g. uplift on a defined biothreat task above a specified percentile) rather than abstract descriptions. This is intended to give external auditors something concrete to verify.
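The pass/fail structure can be sketched as a simple evaluation gate. All task names, thresholds, and scores below are invented for illustration; the RSP's actual thresholds are defined in the policy itself and are considerably more involved:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    task: str       # hypothetical evaluation task name
    uplift: float   # measured uplift over a no-model baseline, 0.0-1.0

# Invented illustrative thresholds: a model is treated as crossing into
# ASL-3 if it exceeds the defined uplift on ANY listed task.
ASL3_THRESHOLDS = {
    "biothreat-acquisition": 0.25,
    "autonomous-replication": 0.50,
}

def requires_asl3(results: list[EvalResult]) -> bool:
    """Pass/fail gate: True if any result crosses its ASL-3 threshold."""
    return any(
        r.uplift > ASL3_THRESHOLDS.get(r.task, float("inf"))
        for r in results
    )

results = [
    EvalResult("biothreat-acquisition", 0.10),   # below threshold
    EvalResult("autonomous-replication", 0.55),  # above threshold
]
print(requires_asl3(results))  # True: one task crossed its threshold
```

The point of the concrete gate is auditability: an external evaluator can re-run the defined tasks and check the thresholds, rather than debating a qualitative description.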

Status and adoption

OpenAI's Preparedness Framework uses the analogous tiers Low/Medium/High/Critical with similar mappings to deployment commitments. Google DeepMind's Frontier Safety Framework uses Critical Capability Levels (CCLs) for autonomy, biology, cyber, and ML R&D. The schemes are not exactly equivalent but converge in spirit.

In May 2025, Anthropic activated its ASL-3 deployment and security standards for Claude Opus 4, the first model released under those protections; the company stated it had not determined that the model definitively required ASL-3, but could not rule it out.

References

  • Anthropic (2023, 2024). Responsible Scaling Policy v1, v2.

  • OpenAI (2023). Preparedness Framework.

  • Google DeepMind (2024). Frontier Safety Framework.

Related terms: Responsible Scaling Policy (RSP), Evaluations / Capability Evaluations, Frontier AI Safety Commitments, Anthropic
