ShortSpan.ai logo

Structured Template Injection Hijacks LLM Agents at Scale

Agents
Published: Sat, Feb 21, 2026 • By Natalie Kestrel
Structured Template Injection Hijacks LLM Agents at Scale
New research shows automated structural attacks against Large Language Model (LLM) agents beat semantic prompt hacks by targeting chat templates. Phantom induces role confusion via structured template injection, hitting 79.76% attack success across closed-source agents and over 70 vendor-confirmed product flaws. Basic defences struggle; stronger models are more vulnerable. The authors call for architectural changes.

Most prompt injection work tries to sweet-talk a Large Language Model (LLM) into misbehaving. Phantom, a new framework for automating agent hijacking, does something colder: it speaks the agent’s native protocol. By crafting structured templates that look like system, user, assistant or tool turns, it creates role confusion and steers the agent without arguing about intent.

The paper targets the chat templates that sit between an agent’s orchestration layer and the LLM. Those templates serialise roles and tool calls into a single token stream. Phantom injects optimised, structured sequences into retrieved content so the agent misreads the payload as legitimate instructions or prior tool output. The result can be privilege escalation, data exfiltration or remote code execution when tools are wired in.

What Phantom actually breaks

Instead of hand-crafted jailbreaks, Phantom automates the search for high-impact structures. It augments seed templates for diversity, trains a Template Autoencoder to embed template triplets into a continuous space, then uses Bayesian optimisation to find vectors that decode into potent adversarial templates. A lightweight proxy evaluation detects role confusion to keep query costs manageable against black-box targets.

On numbers, the framework reports an aggregate attack success rate of 79.76% across seven closed-source agents, beating single-template baselines at 54.09% and semantic injection methods near 40%. The attacks transfer across major model families, including Qwen, GPT and Gemini, and hold up on a representative open-source model. In the wild, testing against 942 commercial agents surfaced over 70 distinct vulnerabilities that vendors confirmed, including a systemic issue in open-source Model Context Protocol implementations and a privilege-escalation path in a commercial platform.

Defences do not come out well. Instruction-based safeguards and simple sanitisation around delimiters or tags made little difference in this setting. Strong semantic detectors helped, but at the cost of substantial utility loss. The attack appears to rely on structural priors rather than fragile token IDs, with success surviving HTML encoding and forced tokenisation. The authors also flag a grim trend: models that follow instructions more capably tended to be easier to bend structurally.

Where the evidence stops

The threat model is a remote, iterative adversary over text. The study does not test multimodal agents, insider access to templates, strict rate limiting, or anomaly detection. It also does not examine long-term persistence of injected state across complex conversations. The proxy evaluation depends on a reasonable query budget and assumes benign proxies correlate with target behaviour. Practitioners will ask how the method fares when templates change frequently, or when orchestration layers hard-fail on malformed turns. The paper argues transferability despite encoding perturbations, but operational churn and aggressive telemetry could still blunt real-world impact.

So what would a determined attacker do with this? Use retrieval to shuttle structured payloads into the agent, flip roles, and issue high-privilege actions as if they came from a tool or the system prompt. The authors’ vendor-confirmed findings show that is not hypothetical. At scale, any agent that stitches together retrieval and tools without isolating control tokens from untrusted content is exposed.

The direction of travel is clear. If role boundaries live only in a serialised string, attackers will learn to speak that dialect. The paper’s own implications are blunt: harden template handling, enforce strict separation between control tokens and user data, and add template-aware detection. Some form of formal or hardware-backed isolation may be needed. As agents get more capable, the structural risk goes up, not down. That is the buried lede.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Automating Agent Hijacking via Structural Template Injection

Authors: Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, and Qi Li
Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies agent hijacking in LLM-based agents, where adversaries inject malicious instructions into retrieved content to manipulate execution. Existing indirect prompt injection attacks focus on semantics and often fail to transfer to closed-source commercial agents or to bypass semantic alignment. The authors argue a deeper architectural weakness exists: chat templates and special tokens used to separate system, user, assistant and tool content are serialized into a single token stream, allowing carefully crafted token sequences to blur role boundaries and induce role confusion, with potentially severe outcomes such as privilege escalation, data exfiltration and remote code execution.

Approach

The authors present Phantom, an automated framework for Structured Template Injection that targets chat-template parsing rather than semantic persuasion. Phantom has three main components: multi-level template augmentation to produce a diverse seed pool, a Template Autoencoder (TAE) that maps discrete template triplets into a continuous low-dimensional latent space using a lightweight LLM backbone with LoRA fine-tuning, and Bayesian optimisation over that latent manifold to search for high-potency adversarial templates. To reduce query cost and risks against black-box targets, Phantom uses a lightweight proxy evaluation that detects role confusion via a round-counter protocol. The optimisation uses a Random Forest surrogate with an Expected Improvement acquisition function and deterministic decoding during search; successful latent vectors are decoded into structured templates which are injected into retrieval content.

Key Findings

  • Automated structural attacks are highly effective: Phantom achieved an aggregate Attack Success Rate (ASR) of 79.76% across seven state-of-the-art closed-source agents, outperforming Single-Template (54.09%), Semantic-Injection (39.86%) and ChatInject (38.46%).
  • Structural attacks transfer across major families: evaluations included Qwen, GPT and Gemini models and a representative open-source model, showing broad transferability.
  • Real-world impact validated: large-scale testing on 942 commercial agents found over 70 distinct vulnerabilities confirmed by vendors, including a systemic flaw in open-source Model Context Protocol implementations and a privilege-escalation exploit against a commercial platform.
  • Robustness to defences: Phantom largely bypasses instruction-based safeguards and simple sanitisation. It sustained high ASR under delimiter-focused instruction defences and tag filters, though strong semantic detectors reduce ASR at the cost of substantial utility loss.
  • Capability curse: more capable instruction-following models tended to be more susceptible to structural injection, increasing risk as agents grow stronger.
  • Structural rather than token-specific dependency: perturbations like HTML encoding or forced tokenisation preserve high success rates, indicating exploitation of general syntactic priors rather than exact token identifiers.

Limitations

The study focuses on text-only interactions and a black-box remote adversary with iterative query access. It does not address multimodal agents, insiders with template knowledge, strict rate-limiting or anomaly detection, nor long-term persistence of payloads in complex conversational state. The proxy evaluation and optimisation assume access to sufficient query budget and benign evaluation proxies.

Why It Matters

Phantom demonstrates a fundamental class of syntactic attacks that evade semantic alignment by exploiting chat-template parsing. Practical implications include a need to harden template handling, enforce strict architectural separation between control tokens and untrusted content, deploy robust sanitisation and template-aware detection, and consider formal or hardware-backed token isolation. The work highlights the urgency of redesigning agent architectures and benchmarking structural resilience as agents are deployed in security-critical environments.


Related Articles

Related Research on arXiv

Get the Monthly AI Security Digest

Top research and analysis delivered to your inbox once a month. No spam, unsubscribe anytime.

Subscribe