Glossary

Greshake's Indirect Prompt Injection

Indirect prompt injection is the variant of prompt injection in which the adversarial instructions are not typed by the user but arrive in retrieved or third-party content the LLM ingests during its task. The seminal description is Kai Greshake et al. (2023), Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, which demonstrated working attacks against Bing Chat, Microsoft Copilot and several open-source agents.

The threat model

A user trusts an LLM agent. The agent reads external content, search results, PDFs, calendar invites, GitHub README files, customer emails , and treats it as data. An attacker who controls any of that content can plant instructions that the model will execute on the user's behalf. The user typed nothing malicious; the attacker never touched the model directly; the breach is mediated entirely through the channel the agent uses to perceive the world.

Demonstrated attack vectors

Greshake and follow-on work have shown:

  • Web search, poisoned pages instruct Bing Chat to ask the user for their password.

  • Email summarisation, an attacker emails the victim; Gmail's AI summary follows instructions in the body to forward the inbox to the attacker.

  • Document parsing, a CV uploaded to an HR-screening agent instructs the model to recommend the candidate.

  • Image alt-text and EXIF, invisible text in an image hijacks a multimodal model.

  • Code review agents, a pull request includes adversarial comments that instruct the agent to merge.

Defences

The same defences as for direct prompt injection (delimiting, classifiers, capability gating, human-in-the-loop) but with an additional category: provenance tracking, ensuring that the system can audit which content was the source of any tool call. Anthropic's Model Context Protocol (MCP) and similar agent frameworks now standardise these audit trails.

Status

As of 2026, indirect prompt injection is the dominant adversarial concern for agentic AI deployments. Researchers continue to publish novel vectors, through PDFs with invisible text, watermarked images, and even ultrasonic audio for voice assistants. No defence is bulletproof; the defensive posture is one of layered mitigations and monitored deployment.

References

  • Greshake, Abdelnabi, Mishra, Endres, Holz, Fritz (2023). arXiv:2302.12173.

  • OWASP LLM Top 10 (2024).

  • Anthropic (2024). Model Context Protocol specification.

Related terms: Prompt Injection, Jailbreak, Retrieval-Augmented Generation, Tool Use

Discussed in:

This site is currently in Beta. Contact: Chris Paton

Textbook of Usability · Textbook of Digital Health

Auckland Maths and Science Tutoring

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).