
Stop Indirect Prompt Injection with Tool Graphs

Defenses
Published: Fri, Aug 22, 2025 • By Lydia Stratus
New research shows an architectural fix that blocks a sneaky attack in which external tool outputs covertly hijack LLM agents. IPIGuard plans tool use as a dependency graph and separates planning from data fetches. That reduces unintended tool calls and tightens control over GPUs, vector DBs, and secrets, so production agents handle untrusted inputs more safely.

A practical new pattern, IPIGuard, gives ops teams a path out of a gnarly problem: indirect prompt injection (IPI). IPI occurs when a tool your agent calls (a scraper, a knowledge store, a connector) returns data that smuggles in instructions, tricking the agent into calling the wrong tools or leaking secrets. That matters because model endpoints, GPU pipelines, vector DBs, and secret stores are all in the blast radius.

Core idea in plain ops terms: plan first, fetch later. Build a Tool Dependency Graph (TDG) that maps which tools a task legitimately needs, then force actual data access to follow that planned graph. Concept diagram-in-words: Agent Planner -> TDG (approved nodes) -> Controlled Fetch Nodes -> Execution Sandbox -> Result.
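
To make that concrete, here is a minimal Python sketch of what a planned TDG could look like. It is our illustration under assumed tool names (fetch_page, summarize, post_report), not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ToolNode:
    name: str                                        # tool the plan may call
    depends_on: list = field(default_factory=list)   # upstream nodes feeding this call

# The planner (trusted phase) emits the whole graph BEFORE any untrusted fetch.
plan = {
    "fetch_page":  ToolNode("fetch_page"),                  # controlled fetch node
    "summarize":   ToolNode("summarize", ["fetch_page"]),   # consumes fetched data
    "post_report": ToolNode("post_report", ["summarize"]),  # final, pre-approved action
}

def is_allowed(tool_name: str, completed: set) -> bool:
    """A tool call is legal only if it is in the plan and all of its
    declared dependencies have already run."""
    node = plan.get(tool_name)
    return node is not None and all(dep in completed for dep in node.depends_on)
```

The point is that the graph exists before any untrusted byte arrives, so fetched text cannot add nodes to it.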

Quick risk checklist for SREs and security teams:

  • Model endpoints: Are endpoints allowed to trigger arbitrary tools? Check invocation policies.
  • GPUs: Is inference isolated from untrusted tool code or drivers?
  • Vectors: Can external text overwrite or alter indexed vectors without review?
  • Secrets: Are credentials or tokens exposed to tool outputs?
  • Data paths: Do fetches go straight into the agent context, or do they pass a gate? (See the gate sketch after this list.)
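
One cheap gate, as a minimal Python sketch: quarantine fetched text before it reaches the agent context. The wrapper format, length cap, and function name are our illustrative assumptions, not part of IPIGuard, and this complements rather than replaces the TDG:

```python
import json

def gate_tool_output(tool_name: str, raw: str) -> str:
    """Quarantine untrusted tool output before it enters the agent context:
    truncate, JSON-escape, and label it so prompts never interpolate it as
    bare instructions. Limits and marker format are illustrative."""
    MAX_LEN = 8000  # assumption: cap payloads to bound prompt stuffing
    payload = raw[:MAX_LEN]
    # JSON-encoding keeps injected role markers and delimiters as literal text.
    return f"<tool_data name={json.dumps(tool_name)}>{json.dumps(payload)}</tool_data>"
```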

Stepwise mitigations (run-book friendly):

  1. Implement a TDG policy layer: require explicit tool dependencies before any fetch.
  2. Decouple planning from execution: the planner produces the graph; an executor follows it without re-planning from fetched text (see the executor sketch after this list).
  3. Whitelist and sandbox tools; enforce RBAC and ephemeral credentials for each tool call.
  4. Isolate vector DB writes; require signed updates and validation hooks.
  5. Lock down GPU tenancy and audit driver calls during tool-driven jobs.
  6. Enable verbose audit logging, synthetic canaries, and quick kill-switches for agent runs.
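
To show how steps 1 to 3 fit together, here is a hedged executor sketch that walks the planner's graph (the `plan` dict from the earlier sketch) and rejects anything off-plan; `run_tool` and `mint_ephemeral_token` are hypothetical hooks you would wire to your own stack:

```python
def execute_plan(plan, run_tool, mint_ephemeral_token):
    """Walk the planner's TDG in its planned order and refuse any call the
    plan did not approve. Fetched text lands in `results` as data only; the
    loop never re-plans based on what a fetch returned."""
    completed, results = set(), {}
    for name, node in plan.items():  # assumes the dict preserves planned order
        if not all(dep in completed for dep in node.depends_on):
            raise RuntimeError(f"plan violation: {name} ran before its dependencies")
        token = mint_ephemeral_token(name)  # per-call, least-privilege credential
        inputs = {dep: results[dep] for dep in node.depends_on}
        results[name] = run_tool(name, inputs, token)
        completed.add(name)
    return results
```

Because the executor only reads the graph, a hostile page can at worst corrupt the data flowing through a planned call; it cannot mint a new one.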

If you have 30 minutes: add a policy gate that blocks new tool invocations unless the TDG approves them. If you have a week: build the planner/executor split and add canary inputs to detect IPI attempts; a sample canary is sketched below. Not a silver bullet, but IPIGuard gives you an architectural lever: fewer surprise tool calls, fewer secrets spilled, and much less late-night debugging.
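
For the canary piece, one cheap pattern (our sketch, not from the paper) is a honeypot tool that no legitimate plan ever includes; any call to it means injected text steered the agent. `alert` and `kill_run` are hypothetical hooks into your alerting and run-control stack:

```python
def make_honeypot(run_id: str, alert, kill_run):
    """Return a decoy tool that no legitimate TDG ever includes. Any call
    means injected text steered the agent, so alert and stop the run."""
    def send_credentials(*args, **kwargs):  # tempting name, never planned
        alert(f"IPI canary tripped in run {run_id}: agent called send_credentials")
        kill_run(run_id)
        return "unavailable"
    return send_credentials
```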

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks. However, when interacting with untrusted data sources (e.g., fetching information from public websites), tool responses may contain injected instructions that covertly influence agent behaviors and lead to malicious outcomes, a threat referred to as Indirect Prompt Injection (IPI). Existing defenses typically rely on advanced prompting strategies or auxiliary detection models. While these methods have demonstrated some effectiveness, they fundamentally rely on assumptions about the model's inherent security, which lacks structural constraints on agent behaviors. As a result, agents still retain unrestricted access to tool invocations, leaving them vulnerable to stronger attack vectors that can bypass the security guardrails of the model. To prevent malicious tool invocations at the source, we propose a novel defensive task execution paradigm, called IPIGuard, which models the agents' task execution process as a traversal over a planned Tool Dependency Graph (TDG). By explicitly decoupling action planning from interaction with external data, IPIGuard significantly reduces unintended tool invocations triggered by injected instructions, thereby enhancing robustness against IPI attacks. Experiments on the AgentDojo benchmark show that IPIGuard achieves a superior balance between effectiveness and robustness, paving the way for the development of safer agentic systems in dynamic environments.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies Indirect Prompt Injection (IPI), a threat where tool outputs fetched from untrusted sources covertly inject instructions that alter large language model (LLM) agent behaviour and produce malicious or unintended outcomes. The problem matters because modern agents routinely call external tools and lack structural constraints on when and how tools are invoked, so prompt-only or detector-based defences can be bypassed and agents retain unrestricted access to tool invocations.

Approach

The authors propose IPIGuard, a defensive task-execution paradigm that models an agent's workflow as a traversal over a planned Tool Dependency Graph (TDG). IPIGuard explicitly decouples action planning from interactions that fetch external data, constraining tool invocation to the planned graph and preventing malicious tool calls at the source. Experiments are reported on the AgentDojo benchmark. Specifics such as the LLM families, tool implementations, training or evaluation metrics, and runtime overhead are not reported.

Key Findings

  • IPIGuard substantially reduces unintended tool invocations that arise from injected instructions.
  • By separating planning from data access, IPIGuard improves robustness against IPI attacks compared with prompt-based or auxiliary-detection defences.
  • On the AgentDojo benchmark, IPIGuard achieves a superior balance between task effectiveness and security robustness.

Limitations

Many evaluation details are not reported, including quantitative attack success rates, performance overhead, generalisability across agent architectures, and implementation complexity. Threats to validity and deployment trade-offs are not reported.

Why It Matters

Modelling tool usage as a Tool Dependency Graph provides a concrete architectural defence that lowers the attack surface for IPI, making injected instructions less effective. This approach offers a practical route to harden agent pipelines used in critical or data-sensitive environments and helps reduce risks such as data leakage, manipulation of tool outputs, or unintended autonomous actions.

