Adaptive tools amplify agent prompt-injection risk

Agents

Published: Wed, Feb 25, 2026 • By Lydia Stratus

Adaptive tools amplify agent prompt-injection risk

New research on agentic Large Language Models shows adaptive indirect prompt injection can steer tool-using agents past task-relevance checks. By choosing plausible tools and iterating attack prompts, success rates more than double and utility drops notably. Defences help but leave large gaps, especially on local open-source models.

Agent frameworks keep wiring Large Language Models, or LLMs, into external tools through protocols such as the Model Context Protocol. That is useful for getting real work done. It also broadens the blast radius when an attacker slips hostile instructions into the data those tools return. The latest work on adaptive indirect prompt injection shows how easily that gap between a tidy research demo and a 3am incident can close.

What the research shows

The AdapTools framework targets agentic LLMs that fetch data or call tools. It builds two advantages for the attacker. First, it iteratively refines prompts using agent reasoning traces, then distils those strategies into a reusable library. Second, it selects a tool that looks like the agent’s next sensible move, using a simple transition model and semantic similarity, so task-relevance checks are less likely to block it.

Across evaluated models, this approach roughly doubles attack success. The authors report a 2.13× increase in success rate and a 1.78× degradation in utility. On commercial systems, attack success rises into the teens and twenties in percentage terms depending on the model. Open-source local models fare worse in places, with some groups above fifty per cent. The stealthy tool chooser adds several percentage points on top of the base attack. Runtime defences reduce success for many baselines, but the adaptive method remains effective, cutting less deeply into its rates. Iteration matters: one round delivers modest gains, while multiple rounds push success much higher before levelling off; five iterations are used as a practical balance.

The team also releases a benchmark of thousands of benign agent trajectories and hundreds of high-authority tools to test against, and they consider both grey-box attackers who can influence a server-side controller and black-box third parties who only control content.

Where this bites in real environments

This is not an abstract language game. It maps neatly to production stacks where an agent reads from endpoints and then fires tools: browse a vendor page, parse an API response, edit a document, file a ticket, run a query, send an email. The attack nests in the returned content and nudges the agent to pick a powerful tool at exactly the moment it seems reasonable. Your comforting system prompt that says use only relevant tools does little when the attacker makes their tool choice look relevant.

At the orchestration layer, think about the broker that maps model intent to tools. This paper’s trick of modelling the next-likely tool for stealth can be inverted defensively. Learn expected tool transitions from your own benign traffic and flag improbable jumps, especially into high-authority tools like payment, code execution or mass data export. You already do something similar for API abuse; do it here.

For endpoints and data pipelines, treat every external response as untrusted. Sanity-check and segment what you inject into the model context. Do not put raw HTML, unvetted JSON, or third-party text straight into the same context window as your privileged instructions. If you must, annotate and isolate it so the agent cannot easily confuse user content with policy.

On model serving, log every tool call with arguments and outcome, and bind those calls to a service that enforces policy. Give each tool its own identity. Keep credentials short-lived and tightly scoped. The dataset used in the paper includes hundreds of high-authority tools; assume attackers will aim there first. Do not let the model see raw secrets; the broker should handle tokens and sign requests on the model’s behalf.

On GPU clusters and inference nodes, the model is not the crown jewel; the tools are. Limit egress from model pods to approved tool endpoints, and block direct Internet access where that is operationally feasible. If you use Model Context Protocol servers or custom connectors, put them on their own auth boundary and apply the same controls you would to any integration tier.

Finally, be cautious with reasoning traces. The attack method learns from them to get better. If you store traces for observability, treat them as sensitive and avoid spraying them into third-party analytics.

There are limits. The results depend on the chosen models, the new benchmark and the attacker model that crafts prompts. Building an adaptive strategy library is costlier than past one-shot attacks. Still, the direction of travel is clear: adaptive, tool-aware injection survives current detectors and makes agents misbehave in ways that look plausible. If you run tool-using agents in production, move your controls to the tool boundary, watch the sequences, and stop assuming a polite system prompt will save you. It will not.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

Authors: Che Wang, Jiaming Zhang, Ziqi Zhang, Zijie Wang, Yinghui Wang, Jianbo Gao, Tao Wei, Zhong Chen, and Wei Yang Bryan Lim

The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks. Existing attack methods are limited by their reliance on static patterns and evaluation on simple language models, failing to address the fast-evolving nature of modern AI agents. We introduce AdapTools, a novel adaptive IPI attack framework that selects stealthier attack tools and generates adaptive attack prompts to create a rigorous security evaluation environment. Our approach comprises two key components: (1) Adaptive Attack Strategy Construction, which develops transferable adversarial strategies for prompt optimization, and (2) Attack Enhancement, which identifies stealthy tools capable of circumventing task-relevance defenses. Comprehensive experimental evaluation shows that AdapTools achieves a 2.13 times improvement in attack success rate while degrading system utility by a factor of 1.78. Notably, the framework maintains its effectiveness even against state-of-the-art defense mechanisms. Our method advances the understanding of IPI attacks and provides a useful reference for future research.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies indirect prompt injection (IPI) attacks that exploit external data services and tool integration in agentic large language models. As agents increasingly call out to third-party services via protocols such as Model Context Protocol, attackers can embed malicious instructions in returned content. Modern reasoning-capable LLMs perform multi-step verification and use task-relevance checks, so prior static, template-based IPI attacks are often ineffective. The work examines how to construct attacks that are robust to internal reasoning, adaptable to evolving defences and stealthy by choosing semantically plausible tools, and why these attacks matter for security assessment of real-world agents.

Approach

The authors introduce AdapTools, an adaptive IPI framework composed of two main modules. The Adaptive Attack Module (Instruction Refinement) automatically generates and iteratively refines attack strategies by synthesising adversarial prompts with an attacker LLM and learning from agent reasoning traces to improve transferability. Strategy Distillation clusters and prunes fine-grained strategies into a compact, generalisable library using semantic embeddings and an ASR-based pruning criterion. The Attack Enhancement Module (Adaptive Tool Selection) increases stealth by selecting attack tools that are temporally and semantically aligned with the agent's expected next tool, modelled via a first-order Markov transition matrix and semantic similarity of tool descriptions. The paper also introduces IPI-3K, a benchmark of 3,691 benign agent trajectories and 277 high-authority attack tools, to evaluate attacks across open-source and commercial reasoning LLMs. Threat models include grey-box MCP server controllers and black-box third-party adversaries.

Key Findings

AdapTools raises attack success rates significantly, achieving on average a 2.13× improvement over the best baseline while degrading system utility by a factor of 1.78.
On commercial models, AdapTools yields concrete gains: GPT-4.1 ASR rises to 18.5% versus baseline performance below 8%; Gemini-2.5-Flash 25.9% versus 9.2% baseline; DeepSeek-R1 13.5% versus 6.7% baseline.
Open-source, locally deployed models are more vulnerable: average ASR reported up to 58.1% for some groups, with Mistral-8B reaching 49.0% ASR and LLaMA-3.1-8B 31.4%; Qwen-3-8B shows 44.5% ASR while better preserving benign utility.
Adaptive tool selection improves stealth: the mechanism increased ASR by 4.7% on GPT-4.1 and 7.9% on Qwen3-8B in experiments.
Defences reduce but do not eliminate risk: state-of-the-art detectors can halve ASR for many baseline attacks, but with AdapTools reductions are smaller (for example GPT-4.1 ASR fell from 26.1% to around 13.3% or 16.1% with defences; Qwen3-8B from 60.6% to 36.4% in one setting).
Strategy iteration matters: a single refinement iteration yielded roughly 35% ASR while multiple iterations raised ASR above 80% before convergence; five iterations were chosen as a cost–performance trade-off.

Limitations

Evaluations focus on a selected set of open-source and commercial LLMs and a new benchmark constructed by the authors; results depend on those models, dataset composition and the attack LLM used for synthesis. Some numerical comparisons vary across sections, reflecting different experimental subsets. Defence experiments use two detectors and may not represent the full range of possible mitigation strategies. Cost and scalability of offline library construction and reliance on particular attacker LLMs are additional practical constraints.

Why It Matters

AdapTools demonstrates that adaptive, tool-aware IPI attacks can bypass many static defences and meaningfully compromise agent behaviour and utility. The work highlights the enlarged attack surface introduced by tool integration and third-party data services and provides a practical framework and dataset for red-teaming and vulnerability assessment. The findings underline the need for stronger controls over tool access, improved task-relevance checks, runtime monitoring of tool calls and continued development of adaptive defences for agentic systems.

Links Original paper on arXiv

Adaptive tools amplify agent prompt-injection risk

What the research shows

Where this bites in real environments

📋 Original Paper Title and Abstract

AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

🔍 ShortSpan Analysis of the Paper

Problem

Approach

Key Findings

Limitations

Why It Matters

Related Articles

Stop Indirect Prompt Injection with Tool Graphs

Study Exposes Prompt Injection Risks for LLM Agents

Researchers Expose Stealthy Implicit Tool Poisoning in MCP

Related Research on arXiv

Get the Monthly AI Security Digest