References

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

Mark Russinovich, Ahmed Salem, & Ronen Eldan (2024)

arXiv:2404.01833.

URL: https://arxiv.org/abs/2404.01833

Abstract. Introduces the Crescendo multi-turn jailbreak. Rather than attempting to elicit a harmful response in a single turn (which is typically refused), Crescendo opens with an entirely benign question on the surrounding topic, then escalates incrementally over a few turns so that each turn appears to be a small extension of the previous one. The model, anchored to its own prior cooperative responses, eventually produces content it would have refused had it been asked in the first turn. The paper reports that Crescendo is highly effective across frontier closed and open models, and argues that single-turn safety training does not reliably transfer to multi-turn dialogue.

Tags: safety adversarial jailbreak
