Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, & Mario Fritz (2023)
ACM Workshop on Artificial Intelligence and Security.
URL: https://arxiv.org/abs/2302.12173
Abstract. The first systematic treatment of indirect prompt injection, adversarial instructions hidden in third-party content (web pages, documents, emails) that an LLM-integrated application later retrieves and processes. Demonstrates working attacks against real systems including Bing Chat and a ChatGPT plugin: a malicious web page can exfiltrate user data, manipulate downstream tool calls, or hijack the conversation when the LLM treats retrieved content as instructions. The paper established indirect prompt injection as the central security failure mode of agentic LLM systems and shaped subsequent research and industry mitigations.
Tags: safety adversarial prompt-injection language-models
Cited in: