ShortSpan distils the AI-security research that matters into practitioner takeaways, edited by Ben Williams (NCC Group).
// Analysis

Survey maps LLM vulnerabilities across the enterprise stack


A new survey reframes AI security around the full Large Language Model (LLM) lifecycle, not just model weights. It details how retrieval-augmented generation (RAG), memory and agents turn untrusted content into executable instruction, why attacks chain across layers, and why point defences rarely compose. Practical focus: provenance, authority and containment.

Large Language Model (LLM) deployments are no longer just chat windows. They sit in retrieval pipelines, enterprise assistants, coding tools and ticketing flows, with permissions to read private data and call external tools. This survey argues the risk lives in that stack, not just in the weights. When untrusted inputs become instructions and the system holds delegated authority, small errors turn into incidents.

The authors organise the attack surface into eight stages: data collection, pretraining, post‑training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool or agent execution, and deployment or maintenance. The same technique looks different by stage. Poisoning a pretraining set is one thing; poisoning a retrieval corpus so the model later “obeys” embedded instructions is another. Trust boundaries fail in different ways.
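One way to use the eight stages as a threat-modelling map is to record each technique together with the stage where it lands. The sketch below does that in Python. The stage names come from the survey; the `Threat` record, the `failed_boundary` wording and the two sample entries are illustrative assumptions, not the paper's own schema.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The eight lifecycle stages named in the survey."""
    DATA_COLLECTION = "data collection"
    PRETRAINING = "pretraining"
    POST_TRAINING_ALIGNMENT = "post-training alignment"
    PACKAGING_SUPPLY_CHAIN = "model packaging and supply chain"
    RETRIEVAL_MEMORY = "retrieval and memory"
    PROMPTING_INFERENCE = "prompting and inference"
    TOOL_AGENT_EXECUTION = "tool or agent execution"
    DEPLOYMENT_MAINTENANCE = "deployment or maintenance"


@dataclass(frozen=True)
class Threat:
    technique: str
    stage: Stage
    failed_boundary: str  # illustrative wording, not quoted from the paper


# Hypothetical entries: the same technique recorded at two different stages.
THREATS = [
    Threat("poisoning", Stage.PRETRAINING,
           "training data is treated as ground truth"),
    Threat("poisoning", Stage.RETRIEVAL_MEMORY,
           "retrieved text is treated as instruction"),
]


def stages_for(technique: str) -> list[Stage]:
    """Return every stage at which a technique has been recorded."""
    return [t.stage for t in THREATS if t.technique == technique]
```

Keeping the stage as a separate field makes it harder to file "poisoning" once and assume one mitigation covers both cases.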

RAG turns content into instructions

Retrieval‑augmented generation (RAG) and stateful memory convert document security into instruction security. Adversaries seed a knowledge base with indirect prompt injections that hitch a ride through retrieval. The model follows those embedded directives, extracting secrets, hijacking goals, or calling tools. Embedding manipulation can bias what gets retrieved in the first place, quietly steering answers and actions before any “prompt” is seen.
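A minimal sketch of two controls this paragraph points towards: filtering documents by the caller's access rights before retrieval, and carrying provenance into the prompt. The `Document` fields, the `TRUSTED_SOURCES` allowlist and the tag format are all hypothetical. Labelling text as untrusted does not stop a model from following it; only the access filter here is a hard limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    source: str                # where the document came from
    allowed_groups: frozenset  # groups permitted to read it


TRUSTED_SOURCES = {"policy-wiki"}  # hypothetical allowlist


def retrievable(docs, user_groups):
    """Pre-retrieval access control: drop documents the user may not read."""
    return [d for d in docs if d.allowed_groups & user_groups]


def build_context(docs):
    """Wrap each document as quoted data and label its provenance."""
    blocks = []
    for d in docs:
        trust = "trusted" if d.source in TRUSTED_SOURCES else "untrusted"
        blocks.append(
            f'<document id="{d.doc_id}" source="{d.source}" trust="{trust}">\n'
            f"{d.text}\n</document>"
        )
    return "\n".join(blocks)


docs = [
    Document("1", "Refund policy.", "policy-wiki", frozenset({"staff"})),
    Document("2", "Ignore previous instructions.", "inbound-email",
             frozenset({"staff", "guest"})),
    Document("3", "Payroll data.", "hr-share", frozenset({"hr"})),
]
context = build_context(retrievable(docs, {"staff"}))
```

Downstream policy can then key off the `trust` label, for example by refusing tool calls in any turn whose context includes untrusted text.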

Agents amplify the blast radius. Once a model can write files, send emails, query databases or run code, textual jailbreaks become external actions. The classic confused‑deputy pattern shows up: the LLM, holding more authority than the attacker, executes operations on the attacker's behalf. Tool manifests and adapters sit in the supply chain; tampering there changes what the agent believes it can safely do.
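One common mitigation for the confused-deputy pattern is to authorise each tool call against the requesting user's grants rather than the agent's own. The sketch below assumes a hypothetical `USER_GRANTS` table and `ToolCall` shape; a real system would source grants from its identity provider.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    args: dict


# Hypothetical per-user grants; the agent itself holds no standing authority.
USER_GRANTS = {
    "alice": {"search_docs", "send_email"},
    "guest": {"search_docs"},
}


class Denied(Exception):
    """Raised when the requesting user lacks the grant for a tool."""


def authorize(call: ToolCall, requesting_user: str) -> ToolCall:
    """Allow a call only if the user who triggered it could make it directly."""
    granted = USER_GRANTS.get(requesting_user, set())
    if call.tool not in granted:
        raise Denied(f"{requesting_user} may not call {call.tool}")
    return call
```

With this check in front of the tool layer, an injected instruction running in a guest's session can request `send_email`, but the call is rejected whatever the model was persuaded to output.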

The most interesting attacks chain stages. Poison a corpus, trigger an indirect injection, exploit lax tool schemas, then exfiltrate through an allowed channel. Point defences rarely compose. Model‑level safety and prompt hardening are probabilistic. Deterministic controls matter: least‑privilege tooling, strict typed interfaces, sandboxing, and pre‑retrieval access control provide auditable limits on what goes wrong when the model goes wrong.
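As an illustration of a strict typed interface, the sketch below validates model-proposed arguments for a hypothetical `send_email` tool before anything executes. The field names, the domain allowlist and the length cap are assumptions chosen for the example. The point is that the check runs outside the model and rejects anything that does not match the schema exactly.

```python
from dataclasses import dataclass

ALLOWED_RECIPIENT_DOMAINS = {"example.com"}  # hypothetical allowlist
MAX_BODY_CHARS = 2000                        # hypothetical cap


@dataclass(frozen=True)
class SendEmailArgs:
    to: str
    subject: str
    body: str


def parse_send_email(raw: dict) -> SendEmailArgs:
    """Reject any argument set that is not exactly the expected shape."""
    expected = {"to", "subject", "body"}
    if set(raw) != expected:
        diff = sorted(str(k) for k in set(raw) ^ expected)
        raise ValueError(f"unexpected or missing fields: {diff}")
    for key in expected:
        if not isinstance(raw[key], str):
            raise ValueError(f"{key} must be a string")
    local, sep, domain = raw["to"].rpartition("@")
    if not sep or not local or domain.lower() not in ALLOWED_RECIPIENT_DOMAINS:
        raise ValueError("recipient domain not allowed")
    if len(raw["body"]) > MAX_BODY_CHARS:
        raise ValueError("body too long")
    return SendEmailArgs(**raw)
```

A lax schema that accepted extra fields or any recipient is the kind of gap the exfiltration step in a chained attack relies on.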

Economics also bite. Availability attacks are practical: token floods, costly tool loops, and “denial‑of‑wallet” that pushes usage into expensive paths. None of this needs novel AI research to hurt.
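A simple containment for token floods and runaway tool loops is a hard per-request budget enforced outside the model. The sketch below is a minimal version; the class name and both limits are hypothetical, and a production system would also want per-user and per-day ceilings.

```python
class BudgetExceeded(Exception):
    """Raised when a request tries to spend past its allowance."""


class RequestBudget:
    """Hard caps on tokens and tool calls for a single request."""

    def __init__(self, max_tokens: int, max_tool_calls: int):
        self.tokens_left = max_tokens
        self.tool_calls_left = max_tool_calls

    def charge_tokens(self, n: int) -> None:
        # Refuse the spend entirely rather than going negative.
        if n > self.tokens_left:
            raise BudgetExceeded("token budget exhausted")
        self.tokens_left -= n

    def charge_tool_call(self) -> None:
        if self.tool_calls_left <= 0:
            raise BudgetExceeded("tool-call budget exhausted")
        self.tool_calls_left -= 1


budget = RequestBudget(max_tokens=1000, max_tool_calls=2)
budget.charge_tokens(400)
budget.charge_tool_call()
```

The agent loop charges the budget before each model call and each tool call, so a loop induced by hostile content stops at a known cost.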

Evaluation lags reality. Single‑turn success rates miss the long‑horizon, stateful failures that actually cause damage. Provenance‑aware RAG tests, agent benchmarks that track multi‑step plans, and clear reporting of attacker access and model versions would help. Closed, drifting deployments make reproducibility awkward.
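To make the gap between single-turn and long-horizon measurement concrete, the sketch below scores the same episodes two ways and keeps the reporting fields the paragraph asks for. The `Episode` record and its field names are assumptions for illustration, not a benchmark format from the survey.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    model_version: str
    attacker_access: str      # e.g. "can write to the RAG corpus"
    deployment_context: str
    unsafe_steps: list = field(default_factory=list)  # one bool per agent step


def first_turn_rate(episodes) -> float:
    """Share of episodes where the very first step was unsafe."""
    hits = sum(1 for ep in episodes if ep.unsafe_steps and ep.unsafe_steps[0])
    return hits / len(episodes)


def episode_rate(episodes) -> float:
    """Share of episodes where any step was unsafe."""
    hits = sum(1 for ep in episodes if any(ep.unsafe_steps))
    return hits / len(episodes)


episodes = [
    Episode("model-a-v1", "can write to the RAG corpus", "helpdesk agent",
            [False, False, True]),
    Episode("model-a-v1", "can write to the RAG corpus", "helpdesk agent",
            [False, False]),
]
```

On this toy data the first-turn rate is 0.0 while the episode-level rate is 0.5, which is the kind of discrepancy a single-turn metric hides.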

So what for enterprises? Treat this as a threat‑modelling map for LLM systems. The commercially relevant point is simple: provenance, authority separation and deterministic enforcement pay off sooner than another round of prompt cleverness. The research agenda on compositional security and incident response is promising, but much of it is still academic. Patience required.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

Authors: Seyed Bagher Hashemi Natanzi and Bo Tang
Large language models are no longer only text generators. They are increasingly embedded in retrieval pipelines, enterprise assistants, coding environments, robotic systems, security-operation workflows, and autonomous agents that can read private data, call tools, write files, execute code, and act across organizational boundaries. This shift changes the security problem: risks do not arise from the model weights alone, but from the full lifecycle and application stack through which data, prompts, model outputs, tools, memories, and user authority interact. This paper systematizes the literature on vulnerabilities in large language model systems through a lifecycle and application-stack lens. We organize attacks across eight stages: data collection, pretraining, post-training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, and deployment/maintenance. For each stage, we analyze attacker capabilities, affected security objectives, representative attacks, practical risks, evaluation practices, and defenses. We further map LLM-specific vulnerabilities to confidentiality, integrity, availability, safety, privacy, fairness, accountability, and agency-control objectives. Unlike taxonomies that list isolated attack names, the proposed systematization emphasizes where trust boundaries fail, how untrusted data becomes executable instruction, how delegated authority amplifies model errors, and why point defenses rarely compose. We close with a research agenda for secure LLM systems, including compositional security, provenance-aware retrieval, tool-call containment, long-horizon agent evaluation, privacy-preserving adaptation, realistic red teaming, and deployment-grade incident response.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies vulnerabilities that arise when large language models are embedded in full application stacks rather than treated as isolated text generators. It argues that risks stem not only from model weights but from interactions across a lifecycle of data, prompts, memories, retrieval, tools, packaging, deployment and maintenance. This expanded surface matters because untrusted content can become executable instruction, delegated authority can amplify model errors, and point defences often fail to compose across layers.

Approach

The authors systematise literature into an eight-stage lifecycle: data collection, pretraining, post‑training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, and deployment/maintenance. They map attacks to security objectives beyond confidentiality, integrity and availability, adding safety, privacy, fairness, accountability and agency control. The survey codes representative papers by lifecycle stage, attacker capability, attack mechanism and defence family, and synthesises a defence‑in‑depth architecture that separates deterministic controls, model robustness, monitoring and governance.

Key Findings

  • Lifecycle expands the attack surface: vulnerabilities appear at eight distinct stages and the same technique (for example poisoning or prompt injection) has different implications depending on where it occurs.
  • RAG and memory convert content security into instruction security: retrieved or persistent documents can carry instructions that induce extraction, goal hijacking, tool misuse or exfiltration, so provenance and access control are critical.
  • Agents amplify impact via delegated authority: tool‑using and stateful agents can turn textual jailbreaks into external actions (emails, database writes, code execution), with confused‑deputy patterns enabling privilege escalation.
  • Many attacks are compositional: adversaries chain poisoning, indirect prompt injection, embedding manipulation and tool‑manifest tampering to bypass isolated defences.
  • Defences rarely compose: model‑level safety and prompt hardening are probabilistic and insufficient alone; deterministic controls like least‑privilege tools, typed schemas, sandboxing, and pre‑retrieval access control are necessary for auditable guarantees.
  • Evaluation gaps: existing metrics focus on single‑turn success rates; realistic risk requires long‑horizon, stateful benchmarks, provenance‑aware RAG tests, and reporting of attacker access, model versions and deployment context.

Limitations

The paper is a systematisation rather than an exhaustive bibliometric review and relies on a curated set of seed papers and practitioner frameworks. It focuses on vulnerabilities specific to LLM integration and does not reanalyse general web or cloud bugs except where LLMs change the attack path. Reproducibility is constrained by closed and drifting model deployments.

Implications

From an offensive perspective, attackers have many pragmatic vectors available: poisoning pretraining or fine‑tuning sets, inserting poisoned documents into RAG corpora to trigger indirect prompt injection, manipulating embeddings to bias retrieval, tampering with adapters or tool manifests in the supply chain, crafting prompt suffixes and encoding tricks to evade filters, and abusing delegated agent authority to perform unauthorised external actions. Availability attacks such as token floods, costly tool loops and denial‑of‑wallet are practical at scale. The most potent attacks chain stages, so defenders should prioritise provenance, authority separation and deterministic enforcement to limit attacker leverage.
