Rebuilding Agent Flows to Catch LLM Attack Chains
Agents
Multi‑agent systems are fast becoming the scaffolding for long, messy jobs that a single Large Language Model (LLM) struggles to complete. The agents delegate, call tools and pass notes to one another in free text. That informality is liberating, but it also invites trouble. Indirect prompt injection, memory poisoning and quiet privilege hops do not announce themselves in any single message. They show up as a pattern across steps.
Security has been here before. Early network filters watched packets, then attackers spread themselves across sessions. Defenders responded by reconstructing conversations and following provenance rather than single events. The same rhyme is now playing in agent land.
What the researchers built
The paper introduces MAScope, an execution‑aware framework that reconstructs cross‑agent semantic flows into contiguous behavioural trajectories and audits them. It starts with telemetry collection across the application layer and the kernel, aligning the logs in time. A hierarchical model of sensitive entities extracts what matters from the chatter and tool calls, then stitches these fragments into a semantic provenance graph.
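To make the reconstruction idea concrete, here is a minimal sketch of merging application‑layer and kernel logs into one time‑ordered trace, then stitching consecutive touches of the same entity into provenance edges. The `Event` fields and the entity‑keyed stitching rule are illustrative assumptions, not the paper's actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    ts: float     # timestamp in seconds, used to align the two layers
    layer: str    # "app" or "kernel"
    agent: str    # which agent emitted or triggered the event
    action: str   # e.g. "read", "send", "exec"
    entity: str   # sensitive entity touched, if any

def align(app_events, kernel_events):
    """Merge the two log streams into a single time-ordered trace."""
    return sorted(app_events + kernel_events, key=lambda e: e.ts)

def build_provenance(trace):
    """Stitch events into provenance edges: each edge links two
    consecutive touches of the same entity, so a flow that crosses
    agents becomes visible as a path rather than isolated events."""
    touches = {}  # entity -> ordered list of (agent, action)
    for ev in trace:
        touches.setdefault(ev.entity, []).append((ev.agent, ev.action))
    edges = []
    for entity, seq in touches.items():
        for (src, _), (dst, act) in zip(seq, seq[1:]):
            edges.append((src, dst, entity, act))
    return edges
```

In this toy form, an agent that reads credentials followed by a different agent that sends them produces a single cross‑agent edge, which is exactly the kind of fragment‑spanning structure per‑message filters never see.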
On top sits a Supervisor LLM that examines candidate paths along three policy lines: data‑flow confidentiality, control‑flow integrity and intent consistency. The system ranks paths with a weighted risk score combining entity sensitivity, operation type and the trust level of targets. The aim is to move from static, per‑message filtering to a view that understands what the agents actually did.
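The ranking step can be sketched as a weighted sum over a path's hops. The weights and lookup tables below are hypothetical: the paper names the three factors (entity sensitivity, operation type, trust of the target) but this exact formula is an assumption for illustration.

```python
# Hypothetical sensitivity, operation and trust tables -- placeholders,
# not values from the paper.
SENSITIVITY = {"db_creds": 0.9, "session_token": 0.8, "public_doc": 0.1}
OPERATION   = {"exfiltrate": 1.0, "write": 0.6, "read": 0.3}
TRUST       = {"internal_tool": 0.9, "unknown_domain": 0.1}

def risk_score(path, w_sens=0.5, w_op=0.3, w_trust=0.2):
    """Score a provenance path: sensitive entities and risky operations
    raise the score; trusted targets lower it. Unknown values default
    to a neutral 0.5 so missing telemetry does not zero out a path."""
    total = 0.0
    for entity, op, target in path:
        total += (w_sens * SENSITIVITY.get(entity, 0.5)
                  + w_op * OPERATION.get(op, 0.5)
                  + w_trust * (1.0 - TRUST.get(target, 0.5)))
    return total / len(path)  # normalise so long paths are comparable
```

Paths are then sorted by score and the top candidates handed to the Supervisor LLM, so the expensive policy reasoning is spent where the graph already looks suspicious.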
In evaluation, MAScope detected compound, multi‑stage attacks that spanned agents. The authors reproduced scenarios inspired by common web risks and long‑horizon tasks, including phishing propagation, credential theft via generated scripts, privilege abuse mediated by an intermediary agent and a rogue agent exfiltrating database records. At node level, MAScope achieved precision of 96.5%, recall of 76.4% and F1 of 85.3%, far ahead of the VanillaGPT baseline's F1 of 21.9%. Path‑level detection was harder but still useful, with precision of 81.3% and F1 of 66.7%.
The sensitive‑entity extractor did a lot of the heavy lifting. With the hierarchical constraints in place, overall F1 for entity extraction rose from roughly 49% to about 76% across the two backbone models. Better entities meant cleaner graphs and more reliable path scoring.
There is a familiar historical cadence here. When operating systems and browsers faced control‑flow hijacks, defenders added integrity checks that looked beyond inputs to the sequence of actions. Later, incident responders learned to follow provenance across machines to untangle long‑dwell intrusions. MAScope applies that lesson to agents: the story is in the sequence.
Practical edges and gaps
There are caveats. MAScope’s effectiveness depends on high‑quality, cross‑layer telemetry and accurate alignment; gaps or drift in logs will blunt reconstruction. The framework relies on foundation models for extraction, privilege estimation and supervision, so it inherits their limits, including occasional misclassification or brittle reasoning. Conservative path consolidation helps precision but can trim recall. The authors also flag scalability challenges when many agents and tools operate at once.
For teams experimenting with agentic workflows, the work suggests a pragmatic direction of travel:
- Instrument both application‑level events and host telemetry so you can rebuild execution paths.
- Define a hierarchical taxonomy of sensitive entities to anchor provenance.
- Subject reconstructed trajectories to policy checks for data‑flow, control‑flow and intent, not just content filters.
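The taxonomy step above can be sketched as a small class hierarchy in which every extracted entity inherits a sensitivity level from its parent category. The categories, members and `classify` helper here are illustrative assumptions, not the paper's HSEC definition.

```python
# Illustrative taxonomy -- the categories and members are assumptions,
# chosen only to show the hierarchical-inheritance pattern.
TAXONOMY = {
    "credential": {"api_key", "db_password", "session_token"},
    "pii":        {"email_address", "phone_number"},
    "artifact":   {"generated_script", "tool_output"},
}
PARENT_SENSITIVITY = {"credential": "high", "pii": "high", "artifact": "medium"}

def classify(entity):
    """Map a raw entity mention to its taxonomy branch, so every node
    in the provenance graph carries a sensitivity level inherited from
    its parent category rather than guessed per mention."""
    for category, members in TAXONOMY.items():
        if entity in members:
            return category, PARENT_SENSITIVITY[category]
    return "unknown", "low"
```

Anchoring provenance to a fixed hierarchy like this is what keeps the graph consistent: two agents that name the same secret differently still land in the same branch, so flows through it remain joinable.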
Source code is available for evaluation, which should help practitioners test the approach against their own orchestration patterns. No silver bullet, but a solid step: treat agents as a distributed system and judge them by what they do, not only by what they say. History, once again, points to watching the flow.
Additional analysis of the original ArXiv paper
📋 Original Paper Title and Abstract
Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection
🔍 ShortSpan Analysis of the Paper
Problem
Multi-Agent Systems (MAS) increasingly orchestrate complex, long-horizon tasks but expand the attack surface because agents communicate via unstructured messages and perform autonomous execution. Conventional input guardrails and sandboxing focus on single-message filtering and fail to detect multi-step, distributed exploits such as indirect prompt injection, memory poisoning, privilege escalation and tool misuse. Attacks often manifest only when fragmented micro-operations across agents are viewed as a continuous execution trace, so a defence that reasons about end-to-end execution is required.
Approach
The paper introduces MAScope, an execution-aware framework that reconstructs Cross-Agent Semantic Flows into contiguous behavioural trajectories and uses a Supervisor LLM to audit them. MAScope has three modules: a Data Collection Module that gathers application-layer and kernel-level logs and aligns them temporally; a Semantic Extraction and Flow Reconstruction Module that applies a Hierarchical Sensitive Entity Constraint (HSEC) to extract sensitive entities and operational primitives and to build a semantic provenance graph; and a Trajectory Scrutiny Module where a foundation model evaluates candidate paths against three policy dimensions: data flow confidentiality, control flow integrity and intent consistency. The system prioritises provenance paths by a weighted risk score that combines entity sensitivity, operation type and trustworthiness of targets. Evaluation employed simulated MAS scenarios covering ten real-world use cases and reproductions of OWASP Top 10 attack surfaces, using ChatGPT-5.2 and Gemini 3 as backbone models and a VanillaGPT baseline for comparison.
Key Findings
- MAScope detects compound, multi-stage attacks spanning agents; authors reproduce over ten distinct combined attack vectors drawn from OWASP-style scenarios.
- Sensitive entity extraction improves markedly with HSEC: for GPT-5.2 the overall F1 rises to 76.8% from 49.4% baseline, and for Gemini 3 to 75.7% from 48.2%.
- End-to-end detection performance is strong at node granularity: MAScope achieves an overall precision of 96.5%, recall 76.4% and F1 of 85.3% for node-level detection, substantially outperforming the VanillaGPT baseline (precision 23.1%, recall 20.8%, F1 21.9%).
- Path-level detection is more challenging but effective: MAScope attains path-level precision 81.3% and F1 66.7%.
- Case studies demonstrate practical coverage: phishing propagation, credential theft via generated scripts, privilege-abuse via intermediary agents, and rogue-agent induced database exfiltration were all reconstructed and flagged by MAScope while the baseline missed coordinated behaviours.
Limitations
MAScope depends on the quality of flow reconstruction and cross-layer observation; incomplete or misaligned telemetry can reduce effectiveness. The approach relies on foundation models for extraction, privilege estimation and supervision, which inherit limitations such as hallucination and imperfect reasoning. The authors note trade-offs from conservative path consolidation that may lower recall and highlight scalability challenges when applying fine-grained reconstruction and verification in large-scale agentic deployments.
Why It Matters
MAScope shifts defence from static input filtering to execution-aware monitoring that links semantics, control and intent across agents. For security practitioners, this demonstrates a practical path to detect multi-stage, compositional attacks that evade per-message guardrails, improving incident detection and governance in agentic ecosystems. The design also underscores the need for rich telemetry, hierarchical entity modelling and supervised trajectory analysis as core components of MAS security.