Program Analysis Stops Prompt Injection in AI Agents

Defenses

Published: Mon, Aug 04, 2025 • By Dr. Marcus Halden

Program Analysis Stops Prompt Injection in AI Agents

AgentArmor treats an AI agent's runtime trace like a small program, analyzing data and tool calls to spot prompt injection. Tests show strong detection with high true positives and low false alarms, cutting attack success dramatically. Practical limits include dependency errors and extra runtime cost before enterprise deployment.

AI agents can be helpful and sneaky at the same time. New work called AgentArmor treats an agent's behavior like a little program and inspects it for malicious directions hidden inside data or responses. The idea is refreshingly simple: reconstruct what the agent actually did, mark which data and tools are sensitive, and check that nothing unsafe flows where it should not.

Practically, AgentArmor builds graph views of control and data flow, tags tools and data with security metadata, and runs a kind of type checking that flags suspicious flows. In tests on the AgentDojo benchmark the system reaches very high detection numbers, with reported true positive rates around 95.75 percent and false positive rates near 3.66 percent in the most balanced setting. Other runs show it can sometimes block attacks entirely, though at the cost of more false alarms.

Why this matters: prompt injection attacks try to trick agents into revealing secrets or running dangerous tools. AgentArmor gives defenders a way to enforce boundaries across tool calls and data movement rather than hoping heuristics catch a clever payload. That reduces unexpected actions and leaks, and it gives teams an auditable check on agent behavior.

Limits to watch: the system leans on model help to infer hidden dependencies, which can be wrong and double runtime cost. It does not protect against compromised tool binaries or model backdoors, and too many false positives can frustrate users.

Operational takeaways

Start by logging agent traces and tagging sensitive tools and data.
Use graph based checks to block clear policy violations before they reach tools.
Expect extra runtime cost and tune dependency rules to cut false positives.
Combine this with hardening of tools and model vetting; this is not a silver bullet.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to DefendAgainst Prompt Injection

Large Language Model (LLM) agents offer a powerful new paradigm forsolving various problems by combining natural language reasoning with theexecution of external tools. However, their dynamic and non-transparent behaviorintroduces critical security risks, particularly in the presence of promptinjection attacks. In this work, we propose a novel insight that treats theagent runtime traces as structured programs with analyzable semantics. Thus, wepresent AgentArmor, a program analysis framework that converts agent traces intograph intermediate representation-based structured program dependencyrepresentations (e.g., CFG, DFG, and PDG) and enforces security policies via atype system. AgentArmor consists of three key components: (1) a graphconstructor that reconstructs the agent's working traces as graph-basedintermediate representations with control flow and data flow described within;(2) a property registry that attaches security-relevant metadata of interactedtools & data, and (3) a type system that performs static inference and checkingover the intermediate representation. By representing agent behavior asstructured programs, AgentArmor enables program analysis over sensitive dataflow, trust boundaries, and policy violations. We evaluate AgentArmor on theAgentDojo benchmark, the results show that AgentArmor can achieve 95.75% of TPR,with only 3.66% of FPR. Our results demonstrate AgentArmor's ability to detectprompt injection vulnerabilities and enforce fine-grained security constraints.

🔍 ShortSpan Analysis of the Paper

Problem

Large language model agents combine natural‑language reasoning with external tool calls but behave dynamically and opaquely, creating acute security risks from prompt injection. Attackers can embed instructions in data or observations to make agents perform unintended actions, leak sensitive data or invoke dangerous tools. Existing heuristic defences struggle with adaptive or zero‑day attacks and lack system‑level guarantees.

Approach

The paper presents AgentArmor, which treats an agent's runtime trace as a structured program and applies program analysis. It reconstructs traces into control‑flow, data‑flow and program dependency graphs, annotates nodes via a property registry (tool and data metadata, plus runtime scanners), and enforces security via a type system that assigns, infers and checks confidentiality, integrity and trust types. A dependency analyser (LLM‑based) recovers implicit causal links; an automatic registry generator supports offline and runtime labelling; the framework is evaluated on the AgentDojo benchmark (97 tasks, 629 security test cases).

Key Findings

Evaluation on AgentDojo reduced attack success rate to 1.16% on average while utility dropped by about 6.7%.
An alternative reported run shows AgentArmor can block attacks completely (ASR 0%) with recall around 82.07%, though with a high false positive rate above 30% in that setting.
The system enables fine‑grained enforcement of dataflow and trust boundaries and detects prompt injection by checking types and policy rules over graph representations.

Limitations

AgentArmor relies on LLMs for dependency inference, which can misclassify control/data dependencies and double runtime cost. The data registry may grow large and mislabel after human interaction. It cannot defend against compromised tool binaries or model backdoors and is currently vulnerable to adaptive attacks that manipulate dependency reasoning.

Why It Matters

By applying program analysis to agent traces, AgentArmor offers a principled, verifiable path to reduce prompt injection and enforce policy at the level of tool invocations and data flows. The approach bridges dynamic LLM behaviour and static security guarantees, but practical adoption requires improved dependency accuracy and reduced false positives before deployment at scale.

Attribution Original paper on arXiv