Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, & Mario Fritz (2023), References, Textbook of AI

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, & Mario Fritz (2023)

ACM Workshop on Artificial Intelligence and Security.

URL: https://arxiv.org/abs/2302.12173

Abstract. The first systematic treatment of indirect prompt injection, adversarial instructions hidden in third-party content (web pages, documents, emails) that an LLM-integrated application later retrieves and processes. Demonstrates working attacks against real systems including Bing Chat and a ChatGPT plugin: a malicious web page can exfiltrate user data, manipulate downstream tool calls, or hijack the conversation when the LLM treats retrieved content as instructions. The paper established indirect prompt injection as the central security failure mode of agentic LLM systems and shaped subsequent research and industry mitigations.

Tags: safety adversarial prompt-injection language-models

Cited in:

Chapter 16: Ethics & Safety

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection