
LAAF automates multi-stage prompt injection against agents

Pentesting
Published: Thu, Mar 19, 2026 • By Theo Solander
New research introduces LAAF, an automated red‑teaming framework for Logic‑layer Prompt Control Injection in agentic Large Language Model systems. It couples a 49‑technique taxonomy with stage‑by‑stage payload mutation and reports an 84% average breakthrough rate across five production platforms. Layered and semantic attacks succeed most; exfiltration is confirmed in controlled tests.

Agent-style Large Language Model (LLM) systems now stitch together memory, retrieval and external tools. That convenience opens an old kind of risk in a new suit: logic-layer prompt control injection, where an attacker places instructions that live inside the system’s context rather than at the output. Filters at the final step do little if the logic upstream has already been bent.

A new framework, Logic-layer Automated Attack Framework or LAAF, targets that gap. It treats Logic-layer Prompt Control Injection, or LPCI, as a lifecycle problem and automates red-teaming accordingly. The authors assemble a 49-technique taxonomy across six categories: Encoding, Structural, Semantic reframing, Layered combinations, Trigger, and Exfiltration. Each technique has up to five variants and is exercised across six stages: reconnaissance, logic-layer injection, trigger execution, persistence or reuse, evasion or obfuscation, and trace tampering. A payload generator uses SHA-256 deduplication to avoid repeats.

The centrepiece is a Persistent Stage Breaker that mutates a winning payload to seed the next stage. If a tack fails, the system escalates via seed variation, encoding mutation or compound mutation, with some fresh samples mixed in to avoid local traps. It is a practical model of how real attackers behave: build a foothold, learn what passes, and then ratchet forward rather than starting from scratch each time.
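The escalation loop can be sketched as follows. The thresholds and concrete transforms (appending a phrase, hex-encoding) are assumptions standing in for the paper's seed-variation, encoding-mutation and compound-mutation strategies; the occasional fresh sample mirrors the described escape from local traps.

```python
import random

def mutate(seed: str, consecutive_failures: int, rng: random.Random) -> str:
    """Sketch of PSB-style escalation: mutate harder as failures accumulate.

    The strategy names follow the article (seed variation, encoding
    mutation, compound mutation); the transforms themselves are
    illustrative stand-ins, not the real payload logic."""
    if rng.random() < 0.1:
        # occasionally inject a fresh sample to avoid local traps
        return f"fresh-sample-{rng.randint(0, 999)}"
    if consecutive_failures < 3:
        return seed + " (please)"            # seed variation: light rewording
    if consecutive_failures < 6:
        return seed.encode("utf-8").hex()    # encoding mutation: obfuscate the seed
    # compound mutation: chain both transforms
    return (seed + " (please)").encode("utf-8").hex()
```

On a breakthrough, the winning payload becomes the seed for the next stage and the failure counter resets, which is the "ratchet forward" behaviour described above.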

What LAAF finds

Tested against five production LLM endpoints via the OpenRouter API, with up to 100 attempts per stage across three independent runs, LAAF reports a mean aggregate breakthrough rate of 84 percent, with a range from 83 to 86 percent. Platform-level rates were stable within 17 percentage points across runs. Two platforms reached 100 percent breakthrough across all six stages. Gemini required only nine attempts in total. Mixtral reached 100 percent, while LLaMA3 resisted the evasion stage within the attempt budget. ChatGPT showed stronger output filtering with some stages resisting within the budget.

Layered combinations and semantic reframing were the most effective technique groups, and layered payloads outperformed encoding on well-defended platforms. Exfiltration techniques produced confirmed data leakage in tests against researcher-controlled endpoints and contributed a notable share of winning vectors.

Patterns we have seen before

If this feels familiar, it is. Early macro malware lived inside documents and templates, hitching a ride through the workflows people trusted. Web injection attacks learned to survive filters through obfuscation, layering and semantic games. LAAF’s stage-seeded mutations echo the way intrusion kits would probe, adjust and advance once they discovered a workable path. The technology changes; the rhythm of escalation does not.

The limits here are clear. The evaluation did not run against live retrieval pipelines or real memory interfaces end to end, so the results demonstrate model-level susceptibility when payloads appear in context rather than full exploitability through a production vector store. Findings reflect a single evaluation window and specific model versions, and per-category effectiveness is indicative rather than independently benchmarked. The authors released the tool as dual use under the Apache 2.0 licence with responsible-use guidance.

Even so, the operational message is hard to ignore. Security work that fixates on output filters will miss logic-layer manipulation that persists across sessions. Lifecycle-aware testing should sit in continuous integration. Cross-session integrity checks, stronger safeguards around retrieval and memory, validation around tool connectors, and monitoring for staged triggers and exfiltration patterns follow directly from the data. History suggests we will adapt: once testing frameworks make the failure modes visible, defenders usually learn to close the loop higher up the stack.
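One of those controls, cross-session integrity checking, can be as simple as sealing stored memory entries with a keyed hash. This is a minimal sketch and not from the paper: the key handling and entry format are hypothetical, and it only detects post-write tampering, not a malicious entry written through a legitimate channel.

```python
import hashlib
import hmac

# Hypothetical server-side key, held outside the agent's context window.
SECRET = b"server-side-integrity-key"

def seal(entry: str) -> tuple[str, str]:
    """Store a memory entry together with an HMAC-SHA256 integrity tag."""
    tag = hmac.new(SECRET, entry.encode("utf-8"), hashlib.sha256).hexdigest()
    return entry, tag

def verify(entry: str, tag: str) -> bool:
    """Reject any entry altered after it was sealed."""
    expected = hmac.new(SECRET, entry.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

Pairing a check like this with write-time validation covers both halves of the problem: what goes into memory, and whether it stayed intact between sessions.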

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

LAAF: Logic-layer Automated Attack Framework – A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Authors: Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood, Zeeshan Baig, Mohamed Abdur Rahman, Manish Bhatt, M. Aziz Ul Haq, Muhammad Aatif, Nadeem Shahzad, Kamal Noor, Vineeth Sai Narajala, Hazem Ali, and Jamel Abed
Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks - Logic-layer Prompt Control Injection (LPCI) - for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated Attack Framework), the first automated red-teaming framework to combine an LPCI-specific technique taxonomy with stage-sequential seed escalation - two capabilities absent from existing tools: Garak lacks memory-persistence and cross-session triggering; PyRIT supports multi-turn testing but treats turns independently, without seeding each stage from the prior breakthrough. LAAF provides: (i) a 49-technique taxonomy spanning six attack categories (Encoding 11, Structural 8, Semantic 8, Layered 5, Trigger 12, Exfiltration 5; see Table 1), combinable across 5 variants per technique and 6 lifecycle stages, yielding a theoretical maximum of 2,822,400 unique payloads (49 × 5 × 1,920 × 6; SHA-256 deduplicated at generation time); and (ii) a Persistent Stage Breaker (PSB) that drives payload mutation stage-by-stage: on each breakthrough, the PSB seeds the next stage with a mutated form of the winning payload, mirroring real adversarial escalation. Evaluation on five production LLM platforms across three independent runs demonstrates that LAAF achieves higher stage-breakthrough efficiency than single-technique random testing, with a mean aggregate breakthrough rate of 84% (range 83–86%) and platform-level rates stable within 17 percentage points across runs. Layered combinations and semantic reframing are the highest-effectiveness technique categories, with layered payloads outperforming encoding on well-defended platforms.
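The abstract's payload-space arithmetic checks out directly; the 1,920 combination factor is taken as given from the paper.

```python
# Verify the theoretical payload-space size quoted in the abstract:
# 49 techniques x 5 variants x 1,920 combinations x 6 lifecycle stages.
techniques, variants, combinations, stages = 49, 5, 1_920, 6
total = techniques * variants * combinations * stages
assert total == 2_822_400
```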

🔍 ShortSpan Analysis of the Paper

Problem

Agentic large language model systems that include persistent memory, retrieval-augmented generation pipelines, and external tool connectors are vulnerable to a class of attacks called logic-layer prompt control injection (LPCI). LPCI payloads can be encoded, stored in memory or vector stores, and triggered conditionally across sessions, bypassing conventional output-layer filters. Prior automated red-teaming tools did not model LPCI’s defining properties: memory persistence, cross-session triggering, layered encoding, semantic reframing, and staged lifecycle escalation. This paper addresses the need for an automated, lifecycle-aware red-teaming instrument tailored to LPCI.

Approach

The authors present LAAF, an automated red-teaming framework combining a 49-technique taxonomy with a Persistent Stage Breaker (PSB) that seeds subsequent lifecycle stages with mutated winning payloads. The taxonomy spans six categories: Encoding (11 techniques), Structural (8), Semantic reframing (8), Layered combinations (5), Trigger (12), and Exfiltration (5). Each technique has up to five variants and is applied across six lifecycle stages: reconnaissance, logic-layer injection, trigger execution, persistence/reuse, evasion/obfuscation, and trace tampering. LAAF's architecture includes a payload generator with SHA-256 deduplication, a stage engine, an attack executor for API interaction, a deterministic response analyser classifying outcomes as EXECUTED, BLOCKED, WARNING, or UNKNOWN, a results logger, and a mutation engine that applies adaptive strategies to seed the next stage. PSB mutation strategies escalate based on consecutive failures using seed variation, encoding mutation, or compound mutation, with random fresh samples interspersed to avoid local traps. Evaluation targeted five production LLM endpoints via the OpenRouter API with up to 100 attempts per stage and three independent runs to estimate variance.
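A deterministic analyser of this kind is typically a rule table scanned in priority order. The outcome labels match the paper; the patterns themselves are assumptions for illustration, including a hypothetical canary token used to confirm that an injected instruction actually executed.

```python
import re

# Illustrative keyword rules scanned in priority order. The paper's actual
# classifier logic is not reproduced in this summary, so these patterns
# are assumptions; only the four outcome labels come from the article.
RULES = [
    ("BLOCKED",  re.compile(r"cannot|refuse|not able to", re.I)),
    ("WARNING",  re.compile(r"caution|be careful|risky", re.I)),
    ("EXECUTED", re.compile(r"CANARY-\d+")),  # hypothetical canary echoed back
]

def classify(response: str) -> str:
    """Map a raw model response to a deterministic outcome label."""
    for label, pattern in RULES:
        if pattern.search(response):
            return label
    return "UNKNOWN"
```

A deterministic table like this keeps runs reproducible across the three independent evaluations, since the same response always yields the same label.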

Key Findings

  • LAAF achieved a mean aggregate stage-breakthrough rate of 84 percent across three runs (reported range 83 to 86 percent), with platform-level rates stable within 17 percentage points across runs.
  • Layered combinations and semantic reframing were the most effective technique categories; layered payloads outperformed encoding on better defended platforms.
  • Two platforms reached 100 percent breakthrough across all six stages; platform-specific behaviour varied: Gemini required only nine attempts in total, Mixtral reached 100 percent while LLaMA3 resisted the evasion stage within the 100-attempt budget, and ChatGPT showed stronger output filtering with some stages resisting within the budget.
  • Exfiltration techniques produced confirmed data-leakage behaviour in tests against researcher-controlled endpoints and accounted for a notable share of winning vectors.
  • PSB-guided adaptive mutation achieved higher stage coverage and fewer attempts per stage than single-technique random testing, demonstrating the value of stage-sequential seeding and multi-technique combinations.

Limitations

The evaluation did not exercise end-to-end RAG retrieval pipelines or real memory interfaces, so it demonstrates model-level susceptibility when payloads appear in context but does not prove full AV-2 or AV-4 exploitability through a live vector store. Results derive from a single evaluation date and specific model versions, and per-category effectiveness estimates are indicative rather than independently benchmarked. The comparison to a prior manual baseline is limited by methodological, budgetary, and platform-version non-equivalence. The tool is dual use and was released openly under Apache 2.0 with responsible-use guidance.

Why It Matters

LAAF shows that automated, multi-stage LPCI testing uncovers high-risk vulnerabilities across production LLMs and that semantic reframing and layered obfuscation can defeat current defences. Practical implications include the need for lifecycle-aware testing in CI pipelines, cross-session integrity checks, stronger RAG and memory safeguards, tool-connector validation, and active monitoring for staged payloads and exfiltration patterns. The work informs threat modelling, defensive priorities, and the development of runtime logic validation beyond output filtering.

