ShortSpan.ai logo

Study maps agentic AI attack surface and risks

Agents
Published: Wed, Mar 25, 2026 • By Theo Solander
Study maps agentic AI attack surface and risks
A new systematisation maps the attack surface of agentic AI that combines Large Language Models with tools, retrieval and autonomy. It catalogues prompt injection, RAG poisoning, tool exploits and cross-agent manipulation, proposes attacker-aware metrics and a defence-in-depth playbook, and offers phased deployment checklists for design, monitoring and incident response.

Agentic AI has moved from demos to deployment, with Large Language Model (LLM) agents planning tasks, calling tools, pulling from retrieval-augmented generation (RAG) stores and looping until goals are met. Power grows with each integration point. So does the attack surface. If this all feels familiar, it should. Each time we connect new capabilities to untrusted inputs, we relearn the old lesson about trust boundaries.

This systematisation of knowledge surveys research and industry reports from 2023 to 2025 and draws a clear map of where things break. The authors lay out a reference architecture, spell out the Trusted Computing Base, and sketch a causal threat graph that shows how a prompt can become an unsafe action after a few hops through tools and data sources. They review more than twenty studies and standards to ground the taxonomy and guidance.

What the study maps

The taxonomy captures the qualitatively new attack vectors that appear once an LLM is wired to the world. Indirect prompt injection arrives via retrieved content. RAG index poisoning nudges the agent toward attacker-chosen facts. Tool and schema exploits turn innocuous function calls into code execution risk. Cross-agent manipulation and supply-chain backdoors widen the blast radius when agents coordinate or rely on third-party components. The paper also clarifies adversary classes, from external attackers to insiders and compromised services, and lists the assets at stake: sensitive data, credentials, infrastructure and the integrity of agent behaviour.

Five representative paths make the risks concrete: direct prompts that steer tools into misuse; malicious content that flows through the LLM into unsafe tool calls; pivots across tools once any foothold is gained; index poisoning that changes what the agent says and does; and multi-agent hops that propagate compromise across a workflow.

Measuring and defending

To move from anecdotes to engineering, the study proposes attacker-aware metrics and a benchmarking harness. Unsafe Action Rate and Policy Adherence Rate track how often agents go off-script. Privilege Escalation Distance estimates how close a chain of actions gets to sensitive operations. Retrieval Risk Score highlights dangerous fetches. Time-to-Contain and Out-of-Role Action Rate reflect operational containment and drift. Cost-Exploit Susceptibility and Patch Half-Life bring a lifecycle lens. The authors describe logging and statistical procedures for computing these in continuous integration.

On defences, the message is unglamorous but practical: no single control is enough. Combine content sanitisation, provenance checks and retrieval gating with typed tool schemas, sandboxed execution and least-privilege credentials. Prefer plan-then-act workflows and keep human-in-the-loop gates for high-impact actions. Enforce quotas and kill-switches. Maintain immutable audit logs and keep red-teaming continuous rather than episodic. The paper packages these into phased checklists spanning design-time hardening, runtime monitoring and incident response.

There are caveats. The focus is on agentic pipelines, not static model-only settings. Evidence is strongest for tool- and retrieval-layer attacks; evaluations of lifecycle and multi-agent defences are less mature. Open gaps remain: durable persistence and concealment, subtle retrieval perturbations that evade filters, standardised multi-agent testbeds and stronger supply-chain safeguards.

If you build or buy agents, the practical takeaway is clear enough. Treat them as hybrid systems that mix classic software risk with instruction-following quirks. Model the trust boundaries, adopt the metrics in CI, and layer controls where tools, retrieval and autonomy meet. History’s rhyme here is reassuring: we have done this before. We tightened interfaces, reduced privileges and kept better logs. The same habits will serve again, so long as we measure what matters and watch the loops where an innocuous prompt can turn into action.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy

Authors: Ali Dehghantanha and Sajad Homayoun
Recent AI systems combine large language models with tools, external knowledge via retrieval-augmented generation (RAG), and even autonomous multi-agent decision loops. This agentic AI paradigm greatly expands capabilities - but also vastly enlarges the attack surface. In this systematization, we map out the trust boundaries and security risks of agentic LLM-based systems. We develop a comprehensive taxonomy of attacks spanning prompt-level injections, knowledge-base poisoning, tool/plug-in exploits, and multi-agent emergent threats. Through a detailed literature review, we synthesize evidence from 2023-2025, including more than 20 peer-reviewed and archival studies, industry reports, and standards. We find that agentic systems introduce new vectors for indirect prompt injection, code execution exploits, RAG index poisoning, and cross-agent manipulation that go beyond traditional AI threats. We define attacker models and threat scenarios, and propose metrics (e.g., Unsafe Action Rate, Privilege Escalation Distance) to evaluate security posture. Our survey examines defenses such as input sanitization, retrieval filters, sandboxes, access control, and "AI guardrails," assessing their effectiveness and pointing out the areas where protection is still lacking. To assist practitioners, we outline defensive controls and provide a phased security checklist for deploying agentic AI (covering design-time hardening, runtime monitoring, and incident response). Finally, we outline open research challenges in secure autonomous AI (robust tool APIs, verifiable agent behavior, supply-chain safeguards) and discuss ethical and responsible disclosure practices. We systematize recent findings to help researchers and engineers understand and mitigate security risks in agentic AI.

🔍 ShortSpan Analysis of the Paper

Problem

This paper systematises the expanded attack surface created when large language models are combined with tools, retrieval-augmented generation and autonomous multi‑step agents. Agentic systems can plan, call APIs, run code, maintain persistent state and coordinate with other agents, and those capabilities introduce new integrity, confidentiality and availability risks beyond traditional model‑only failures. The work maps trust boundaries, defines a Trusted Computing Base, and identifies how retrieved content, tool integrations and multi‑agent interactions enable end‑to‑end compromises.

Approach

The authors perform a literature‑driven systematisation covering 2023–2025, synthesising peer‑reviewed papers, preprints, industry reports and standards. They present a reference architecture with explicit trust boundaries, construct a causal threat graph, define attacker classes and goals, and develop a taxonomy of attack vectors and five representative attack paths. They propose attacker‑aware metrics and a reproducible benchmarking harness, grade evidence levels, and assemble a defence‑in‑depth playbook plus phased deployment checklists for design‑time hardening, runtime monitoring and incident response.

Key Findings

  • Agentic systems expose qualitatively new vectors: indirect prompt injection via retrieved content, targeted RAG index poisoning, code‑execution and tool/schema exploits, cross‑agent manipulation and supply‑chain backdoors, not captured by model‑only surveys.
  • Adversary classes include external attackers, malicious content providers, supply‑chain adversaries, insiders and compromised services; assets at risk include sensitive data, credentials, infrastructure and agent behaviour integrity.
  • Representative attack paths (P1–P5) map common exploit chains: direct prompt→tool misuse; indirect content→LLM→tool; cross‑tool pivots; index poisoning→response; and multi‑agent hops that propagate compromise.
  • Concrete attacker‑aware metrics are proposed for evaluation and CI gating: Unsafe Action Rate (UAR), Policy Adherence Rate (PAR), Privilege Escalation Distance (PED), Retrieval Risk Score (RRS), Time‑to‑Contain (TTC), Out‑of‑Role Action Rate (OORAR), Cost‑Exploit Susceptibility and Patch Half‑Life (PHL). The paper describes logging, harnessing and statistical procedures to compute them.
  • Defences are effective only in combination: content sanitisation, provenance and retrieval gating, typed tool schemas, sandboxed execution, least‑privilege credentials, plan‑then‑act workflows, human‑in‑the‑loop gates, quotas/kill‑switches, immutable audit logs and continuous red‑teaming are recommended.

Limitations

The work focuses on agentic pipelines and omits static, model‑only settings; it is a systematisation rather than an exhaustive empirical evaluation. Evidence grades show strong support for attack techniques at tool and retrieval layers but weaker, less mature empirical evaluation for many lifecycle and multi‑agent defences. Open gaps include durable persistence and concealment, subtle retrieval perturbations, standardised multi‑agent testbeds and robust supply‑chain safeguards.

Why It Matters

As organisations deploy LLM agents into production with access to internal data and execution privileges, these systems create hybrid attack surfaces combining classic software vulnerabilities and instruction‑following failure modes. The taxonomy, threat graph and metrics provide practical inputs for threat modelling, CI‑integrated testing and risk‑based deployment controls. Defence‑in‑depth, continuous red‑teaming and governance measures (policy‑as‑code, SBOMs, incident playbooks and responsible disclosure) are necessary to reduce unsafe actions and contain breaches in real deployments.


Related Articles

Related Research on arXiv

Get the Weekly AI Security Digest

Top research and analysis delivered to your inbox every week. No spam, unsubscribe anytime.