Argument-level provenance locks down LLM agent calls

Agents
Published: Tue, May 12, 2026 • By Elise Veyron
New research tests PACT, a runtime monitor for Large Language Model (LLM) agents that enforces trust at the level of each tool argument. By tracking provenance and binding authority-bearing fields to role-specific contracts, it blocks prompt-injection hijacks while preserving utility. Results show strong security, modest overhead, and clear deployment limits.

LLM agents do useful work when they read the web, parse email, and then act via tools with real privileges. They also break in predictable ways. Most current defences gate an entire tool call, which forces a crude choice: let mixed-trust content steer the call and risk hijack, or quarantine and kill benign retrieve-then-act behaviour. The paper’s punchline is simple and uncomfortable: the danger is not untrusted text in context; it is when that text binds an authority-bearing argument like a recipient, URL, command, file path, or credential.

Why this attack works

Think of an agent that fetches a page and then sends an email. If the page quietly sets the “to” field, you have redirection. An invocation-level monitor sees a legitimate “send_email” call and waves it through, or blocks every call that touched the web and wrecks utility. The granularity mismatch hides the actual exploit surface: argument binding.
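The mismatch can be made concrete with a minimal sketch (all names and structures here are hypothetical, not from the paper): a whole-call monitor sees only the tool name, so it cannot tell whether the recipient came from the user or from the fetched page.

```python
# Hypothetical sketch: an invocation-level monitor gates the whole call,
# with no visibility into where each argument value came from.
ALLOWED_TOOLS = {"fetch_page", "send_email"}

def invocation_level_monitor(tool_name: str, args: dict) -> bool:
    # Approves any allowed tool, regardless of argument provenance.
    return tool_name in ALLOWED_TOOLS

# Untrusted page content quietly supplies the "to" field.
page = {"text": "Quarterly report...", "injected_to": "attacker@evil.example"}
call = {"tool": "send_email",
        "args": {"to": page["injected_to"], "body": page["text"]}}

print(invocation_level_monitor(call["tool"], call["args"]))  # True: hijack slips through
```

The call looks like any legitimate `send_email`; the exploit lives entirely in how the `to` argument was bound.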

What PACT changes

PACT (Provenance-Aware Capability Contracts) instruments the boundary where authority is exercised. It assigns semantic roles to each tool argument (target, command, credential, content, selector, control), tags runtime values with their provenance across replanning and tool-call chains, and checks each role against a trust contract before execution. Contracts span precision levels L0 to L3; as roles are exposed, utility rises without giving up security under oracle provenance.
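The core check can be sketched in a few lines. The role names below follow the paper; the tag structure, trust levels, and contract table are simplifying assumptions, not PACT's actual implementation.

```python
# Minimal sketch of argument-level contract checking in the spirit of PACT.
from enum import Enum

class Role(Enum):
    TARGET = "target"
    COMMAND = "command"
    CREDENTIAL = "credential"
    CONTENT = "content"
    SELECTOR = "selector"
    CONTROL = "control"

class Trust(Enum):
    UNTRUSTED = 0
    TRUSTED = 1

# Contract: authority-bearing roles require trusted provenance;
# content fields may legitimately carry untrusted text.
CONTRACT = {
    Role.TARGET: Trust.TRUSTED,
    Role.COMMAND: Trust.TRUSTED,
    Role.CREDENTIAL: Trust.TRUSTED,
    Role.CONTENT: Trust.UNTRUSTED,
}

def check_call(args: dict) -> bool:
    """args maps argument name -> (role, provenance trust).
    Fails closed when a role has no contract entry."""
    for name, (role, trust) in args.items():
        required = CONTRACT.get(role)
        if required is None or trust.value < required.value:
            return False
    return True

# Benign retrieve-then-act: user-supplied recipient, fetched body -> allowed.
print(check_call({"to": (Role.TARGET, Trust.TRUSTED),
                  "body": (Role.CONTENT, Trust.UNTRUSTED)}))  # True
# Injection binds the recipient -> blocked.
print(check_call({"to": (Role.TARGET, Trust.UNTRUSTED),
                  "body": (Role.CONTENT, Trust.UNTRUSTED)}))  # False
```

The point of the sketch is the asymmetry: untrusted text is fine in a content field but disqualifying in a target field, which is exactly what whole-call gating cannot express.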

With oracle provenance, PACT at L2 hits 100% benign utility and 100% security on a mixed-trust diagnostic suite, while whole-call monitors either over-block or miss attacks. Ablations matter: remove semantic roles and benign utility collapses; drop cross-step provenance and security falls.

Deployed in AgentDojo across five models, PACT reaches 100% security on the three strongest models and recovers 38.1–46.4% utility, outperforming a quarantining baseline (CaMeL) by 8–16 percentage points at comparable high-security settings. Runtime cost is small: across 100k checks, P99 latency stays below 272 microseconds with 13.6k–18.0k checks per second. The gap to oracle stems from upstream fidelity limits: contract-role accuracy at 87.1% and provenance inference at 77.4% on evaluated tools.

This framing clarifies the attacker’s goal: get untrusted content to bind an authority-bearing argument and you can redirect messages, exfiltrate, overwrite files, or run shell commands. The most credible routes now target provenance and contracts rather than payload cleverness: laundering through intermediate tools, over-labelling output trust, dropping malicious text inside permitted content fields, or inducing role mislabelling. PACT does not stop content-channel harm inside allowed fields, nor tool-selection attacks where the model chooses a dangerous tool with trusted constants.

Policy people should pay attention to this authority-binding lens. It suggests audits and procurement need argument-level provenance and contract definitions, not just logs of tool calls. The hard part now sits squarely in provenance inference and contract synthesis. That is a tractable place for standards work, but only if we measure and harden those components under red-team pressure.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

Authors: Linfeng Fan, Ziwei Li, Yuan Tian, Yichen Wang, Rongsheng Li, and Xiong Wang
Tool-using LLM agents must act on untrusted webpages, emails, files, and API outputs while issuing privileged tool calls. Existing defenses often mediate trust at the granularity of an entire tool invocation, forcing a brittle choice in mixed-trust workflows: allow external content to influence a call and risk hijacked destinations or commands, or quarantine the call and block benign retrieval-then-act behavior. The key observation behind this paper is that indirect prompt injection becomes dangerous not when untrusted content appears in context, but when it determines an authority-bearing argument. We present PACT (Provenance-Aware Capability Contracts), a runtime monitor that assigns semantic roles to tool arguments, tracks value provenance across replanning steps, and checks whether each argument's origin satisfies its role-specific trust contract. Under oracle provenance, PACT achieves 100% utility and 100% security on mixed-trust diagnostic suites, while flat invocation-level monitors incur false positives or false negatives. In full AgentDojo deployments across five models, PACT reaches 100% security on the three strongest models while recovering 38.1–46.4% utility, 8–16 percentage points above CaMeL at the same security level. Ablations show that both semantic roles and cross-step provenance are necessary. PACT reframes agent security as authority binding, and isolates the remaining deployment bottleneck to provenance inference and contract synthesis.

🔍 ShortSpan Analysis of the Paper

Problem

Tool-using large language model agents must read untrusted webpages, emails, files and API outputs while making privileged tool calls. Existing defences typically grant or deny entire tool invocations, which forces a brittle tradeoff in mixed-trust workflows: either allow external content to influence a call and risk hijacked destinations or commands, or quarantine the call and block benign retrieval-then-act behaviour. The paper identifies the key failure mode: indirect prompt injection is dangerous not when untrusted text merely appears in context but when it determines an authority-bearing argument such as a recipient, URL, command, file path or credential.

Approach

The authors introduce PACT (Provenance-Aware Capability Contracts), a runtime monitor that assigns semantic roles to each tool argument (for example target, command, credential, content, selector, control), tracks provenance of argument values across replanning and tool-call chains, and enforces role-specific trust contracts before each tool invocation. Contracts are organised by precision levels L0 to L3, from opaque whole-call checks to certified routing with scoped discharge procedures. Runtime values carry provenance tags recording contributing sources, trust level and unresolved obligations; checks compare these against the argument role requirements. For deployment the system synthesises contracts from tool schemas and infers provenance using structural matching, heuristics and an LLM classifier, failing closed when ambiguous.
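The cross-step tracking described above can be illustrated with a small sketch. The tag fields (contributing sources, trust level, unresolved obligations) mirror the paper's description; the merge rule, where a derived value inherits every source and the weakest trust level, is an assumed conservative propagation, not the paper's exact mechanism.

```python
# Hypothetical sketch of provenance tags flowing through a tool-call chain.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    sources: frozenset                     # contributing origins
    trust: int                             # 0 = untrusted, 1 = trusted
    obligations: frozenset = frozenset()   # unresolved discharge checks

def merge(*tags: Tag) -> Tag:
    """Conservative propagation: a derived value carries every
    contributing source and the minimum trust among its inputs."""
    return Tag(
        sources=frozenset().union(*(t.sources for t in tags)),
        trust=min(t.trust for t in tags),
        obligations=frozenset().union(*(t.obligations for t in tags)),
    )

user = Tag(frozenset({"user_prompt"}), trust=1)
web = Tag(frozenset({"https://example.com/page"}), trust=0)

# A reply drafted from the user's request plus fetched text stays untrusted,
# even if it only reaches the final tool call after several replanning steps.
summary = merge(user, web)
print(summary.trust)  # 0
print(sorted(summary.sources))
```

Under this rule, untrusted influence survives laundering through intermediate tools, which is what lets a downstream contract check reject the value when it tries to bind an authority-bearing argument.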

Key Findings

  • Under oracle provenance, PACT at L2 achieves 100% benign utility and 100% security on a 17-scenario mixed-trust diagnostic suite; whole-call monitors incur false positives or false negatives.
  • Contract precision is monotonic: security stays at 100% under oracle provenance across L0–L3 while utility rises from low at L0 to full recovery at L2/L3 as argument roles are exposed.
  • In full AgentDojo deployments across five models, PACT attains 100% security on the three strongest models and recovers 38.1–46.4% utility, outperforming the quarantining baseline CaMeL by 8–16 percentage points at comparable high-security levels.
  • Ablations show both semantic roles and cross-step provenance are necessary: removing roles collapses benign utility; removing cross-step origin tracking reduces security markedly.
  • Runtime cost is small: across 100k checks PACT has P99 latency below 272 microseconds and throughput 13.6k–18.0k checks per second. Deployment fidelity limits remain in contract-role accuracy (87.1%) and provenance inference (77.4%) on evaluated tools.

Limitations

PACT’s formal guarantee assumes conservative provenance propagation and correctly specified contracts. The main practical constraints are imperfect automatic contract synthesis and provenance inference; these upstream errors produce the oracle-to-deployment gap. PACT does not address content-channel attacks where harmful text appears inside permitted content fields, nor tool-selection attacks where the model invokes dangerous tools with trusted constants. Semantic ambiguity of argument interfaces can force conservative blocking and reduce utility. Evaluation focuses on several model families and selected benchmarks; broader validation on other agents remains future work.

Implications

Framing agent security as authority binding clarifies offensive opportunities. An attacker can aim to have untrusted content bind authority-bearing arguments to redirect messages, exfiltrate data, overwrite files, issue shell commands or schedule unintended actions. Given PACT’s deployment limitations, the most promising attack avenues are targeting provenance-inference and contract-synthesis components, exploiting provenance-laundering through intermediate tools, over-labelling of output trust, or placing malicious instructions inside otherwise permitted content fields. Adaptive surface-level payloads are less effective if provenance is preserved, but adversaries can still attempt to subvert provenance tags or induce semantic role mislabelling. These weaknesses identify concrete red team targets for improving provenance fidelity and contract design.
