Contain AI Agents with Declarative Access Controls

Defenses

Published: Mon, Oct 27, 2025 • By Theo Solander


Researchers introduce AgentBound, an access-control layer for Model Context Protocol (MCP) servers that wraps each server an AI agent calls in a least-privilege container. Automated manifests reach about 80.9% accuracy, the enforcement adds negligible latency, and the system blocks most environment-based attacks. Puppet-style manipulations of tool handling remain an unresolved vector.

We have seen this pattern before. New runtimes arrive with shiny capabilities, defaults set to convenience, and a few high-profile escapes teach the industry to lock things down. Large Language Models (LLMs) are following the same arc as web plugins and early mobile apps: they gain the ability to call tools and touch host systems, and that unrestricted reach becomes a liability.

The paper presents AgentBound, a pragmatic access-control framework for MCP, the Model Context Protocol that connects agents to external tools and environments. AgentBound pairs a declarative manifest, the AgentManifest, with a runtime enforcement container, AgentBox, so MCP servers can start with no privileges and receive only the permissions they declare. That Android-style permission model is a familiar correction: stop trusting defaults, and make the intended accesses explicit.
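To make the deny-by-default idea concrete, here is a minimal Python sketch of a manifest check. The paper describes the AgentManifest only as a JSON-style declaration, so the field names (`server`, `permissions`) and permission strings such as `network.client` are hypothetical; the category keys loosely mirror the five permission groups the paper lists.

```python
import json
from dataclasses import dataclass, field

# Category keys loosely mirror the paper's five permission groups; the
# manifest field names and permission strings are hypothetical.
CATEGORIES = {"filesystem", "system", "network", "peripherals", "other"}


@dataclass
class AgentManifest:
    server: str
    # Maps each category to the permission names declared under it.
    permissions: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "AgentManifest":
        doc = json.loads(text)
        perms = {}
        for category, names in doc.get("permissions", {}).items():
            if category not in CATEGORIES:
                raise ValueError(f"unknown permission category: {category}")
            perms[category] = frozenset(names)
        return cls(server=doc["server"], permissions=perms)

    def allows(self, category: str, name: str) -> bool:
        # Deny by default: anything not explicitly declared is refused.
        return name in self.permissions.get(category, frozenset())


manifest = AgentManifest.from_json(
    '{"server": "weather-mcp", "permissions": {"network": ["network.client"]}}'
)
print(manifest.allows("network", "network.client"))      # True
print(manifest.allows("filesystem", "filesystem.read"))  # False
```

The important property is that an empty or missing category grants nothing, which matches the "start with no privileges" stance.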

The authors built a dataset of the 296 most popular MCP servers and an automated manifest generator, AgentManifestGen. Across 96 reviewed manifests the tool achieves 80.9% accuracy and 100% recall of the permissions a server actually needs; against manually written manifests for 48 servers it scores 0.94 precision and 0.96 recall. Those numbers are not perfect, but they are good enough to be useful: the manifests provide a practical starting point rather than a final audit.
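Precision and recall here compare permission sets: precision is the share of generated permissions that the reference manifest also contains, and recall is the share of reference permissions the generator found. A minimal sketch, assuming each manifest is reduced to a set of permission strings and counts are pooled across servers (micro-averaging); how the paper aggregates its figures is not stated in this summary, and the permission names are hypothetical.

```python
def micro_precision_recall(pairs):
    """Pool true/false positives over (generated, reference) permission-set pairs."""
    tp = fp = fn = 0
    for generated, reference in pairs:
        tp += len(generated & reference)  # permissions both manifests declare
        fp += len(generated - reference)  # over-granted by the generator
        fn += len(reference - generated)  # missed by the generator
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two hypothetical servers: the generator over-grants on one and misses a permission on the other.
pairs = [
    ({"network.client", "filesystem.read", "env.read"}, {"network.client", "filesystem.read"}),
    ({"filesystem.read"}, {"filesystem.read", "filesystem.write"}),
]
print(micro_precision_recall(pairs))  # (0.75, 0.75)
```

In access-control terms, low precision means over-granting (a wider attack surface) while low recall means a server breaks because a needed permission is missing.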

On the security front, AgentBox blocks the majority of environment-oriented attacks the authors tested, including attempts at external resource abuse and data exfiltration. Some limits remain. Notably, puppet attacks that manipulate tool handling inside the model cannot be stopped by system-level containment alone. Runtime cost is negligible: representative operations add roughly 0.6 ms on macOS and 0.29 ms on Debian, while sandbox startup ranges from about 150 ms to 400 ms depending on host.

What teams should do now

History offers a clear through-line: adopt least privilege early, automate what you can, and require humans for the rest. In practice, that means integrating manifest generation into your build and CI pipelines so every MCP server has a candidate AgentManifest. Treat that manifest as both code and policy: have engineers and security reviewers sign off before any persistent privileges are granted.
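One way to enforce that sign-off in CI is a small check that fails the build when a manifest lacks reviewer approval or declares wildcard permissions. This is a hypothetical sketch: the `reviewed_by` field, the role names and the wildcard rule are a team's own conventions layered on top of the manifest, not part of AgentBound.

```python
import json
import sys

# Hypothetical review metadata a team might add to each manifest;
# AgentBound itself does not define these fields.
REQUIRED_ROLES = {"engineering", "security"}


def check_manifest(doc: dict) -> list:
    """Return human-readable problems; an empty list means the manifest passes."""
    problems = []
    roles = {reviewer.get("role") for reviewer in doc.get("reviewed_by", [])}
    missing = REQUIRED_ROLES - roles
    if missing:
        problems.append(f"missing sign-off from: {', '.join(sorted(missing))}")
    for category, names in doc.get("permissions", {}).items():
        for name in names:
            if "*" in name:  # broad grants need an explicit exception, not a default
                problems.append(f"wildcard permission in {category}: {name}")
    return problems


def main(paths: list) -> int:
    """Check every manifest path; return a process exit code for the CI job."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            problems = check_manifest(json.load(fh))
        for problem in problems:
            print(f"{path}: {problem}", file=sys.stderr)
        failed = failed or bool(problems)
    return 1 if failed else 0
```

To run it as a CI step, call `sys.exit(main(sys.argv[1:]))` from a script entry point and pass the manifest paths; a non-zero exit fails the job.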

Use runtime containment like AgentBox or equivalent container mechanisms so servers boot with minimal rights and only receive runtime refinements after explicit consent. Combine system-level controls with tool-level safeguards: validate and sanitise tool inputs, wrap sensitive tools in deterministic interfaces, and log tool usage for rapid incident triage. Finally, test for the gap the paper flags most clearly: puppet attacks. Simulate adversarial prompts that try to subvert tool handling and instrument your agent framework to detect anomalous command sequences.
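The tool-level safeguards above can be prototyped as a thin dispatcher that validates arguments, logs every call and flags suspicious call sequences. A minimal sketch, assuming each tool takes a single string argument; the tool names, regular expressions and the read-then-post rule are illustrative assumptions, and a real detector would need richer rules than one adjacent pair.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("tool_audit")

# Hypothetical rule: a file read followed directly by an outbound post is
# flagged as a possible exfiltration sequence for human review.
SUSPICIOUS_SEQUENCES = {("read_file", "http_post")}


class ToolGuard:
    def __init__(self) -> None:
        self.tools = {}    # tool name -> (callable, compiled argument pattern)
        self.history = []  # names of tools called so far, in order
        self.alerts = []   # (previous tool, current tool) pairs that matched a rule

    def register(self, name: str, fn: Callable[[str], str], arg_pattern: str) -> None:
        # The whole argument must match the pattern, not just a prefix.
        self.tools[name] = (fn, re.compile(arg_pattern))

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            raise PermissionError(f"unregistered tool: {name}")
        fn, pattern = self.tools[name]
        if not pattern.fullmatch(arg):
            logger.warning("rejected %s(%r)", name, arg)
            raise ValueError(f"argument rejected for {name}: {arg!r}")
        if self.history and (self.history[-1], name) in SUSPICIOUS_SEQUENCES:
            self.alerts.append((self.history[-1], name))
            logger.warning("suspicious sequence: %s -> %s", self.history[-1], name)
        self.history.append(name)
        logger.info("call %s(%r)", name, arg)
        return fn(arg)


guard = ToolGuard()
# Relative paths only, and no ".." segments anywhere in the argument.
guard.register("read_file", lambda path: f"<contents of {path}>", r"(?!.*\.\.)[\w./-]+")
guard.register("http_post", lambda url: "sent", r"https://api\.example\.com/[\w/]*")
guard.call("read_file", "notes/todo.txt")
guard.call("http_post", "https://api.example.com/upload")
print(guard.alerts)  # [('read_file', 'http_post')]
```

Alerting rather than blocking keeps false positives cheap while the rules are tuned; a stricter deployment could raise on a match instead.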

AgentBound does not claim to be a panacea. It reduces the attack surface significantly and fits into existing workflows without modifying servers, but it requires governance. Automated manifests speed deployment, yet they must be reviewed. Runtime approval steps protect data but add friction. Expect trade-offs between productivity and safety, and plan for them.

In short, the contribution is useful and practical: declarative manifests plus lightweight containment buys you measurable security at low cost. The next phase is organisational: bake manifest review into development practices, pair containment with tool-level checks, and treat puppet-style manipulation as a live threat requiring separate controls. That is how this familiar cycle of risk and response moves from ad hoc firefighting to durable hygiene.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Securing AI Agent Execution

Authors: Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi

Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model Context Protocol (MCP) has become the de facto standard for connecting agents with such resources, but security has lagged behind: thousands of MCP servers execute with unrestricted access to host systems, creating a broad attack surface. In this paper, we introduce AgentBound, the first access control framework for MCP servers. AgentBound combines a declarative policy mechanism, inspired by the Android permission model, with a policy enforcement engine that contains malicious behavior without requiring MCP server modifications. We build a dataset containing the 296 most popular MCP servers, and show that access control policies can be generated automatically from source code with 80.9% accuracy. We also show that AgentBound blocks the majority of security threats in several malicious MCP servers, and that policy enforcement engine introduces negligible overhead. Our contributions provide developers and project managers with a practical foundation for securing MCP servers while maintaining productivity, enabling researchers and tool builders to explore new directions for declarative access control and MCP security.

🔍 ShortSpan Analysis of the Paper

Problem

Large language models now act as AI agents that interact with external tools and environments through the Model Context Protocol (MCP). Security has lagged behind: thousands of MCP servers run with unrestricted access to their host systems, creating a broad attack surface. This paper presents AgentBound, the first access-control framework for MCP servers. It combines an Android-style declarative policy mechanism with a policy enforcement engine that can contain malicious behaviour without requiring changes to MCP servers. The work also builds a dataset of the 296 most popular MCP servers and shows that access-control policies can be generated automatically from source code with 80.9 per cent accuracy. It demonstrates that AgentBound blocks the majority of security threats in several malicious MCP servers and that the policy enforcement engine adds negligible overhead. The contributions give developers and project managers a practical basis for securing MCP servers without sacrificing productivity, and open directions for researchers in declarative access control and MCP security.

Approach

AgentBound comprises two core elements: an access-control policy mechanism that lets MCP servers declare their permissions, and a policy enforcement engine that enforces those permissions at runtime. The policy mechanism uses an AgentManifest, written in a JSON-style format, that declares the resources a server may access, such as files, networks or secrets. An enforcement container, AgentBox, wraps each MCP server in an isolated environment so servers start with no privileges and can be granted only the declared permissions. The framework does not require changes to existing MCP servers. Permissions are organised into five categories: filesystem access, system interaction, network access, peripherals, and other resources such as location, notifications and clipboard.

The work also introduces AgentManifestGen, an automated manifest generator that analyses a server's code base and documentation to propose a brief description and a minimal set of permissions with rationales; a two-stage pipeline produces the final manifests. The evaluation uses a dataset of the top 296 MCP servers listed on Pulse MCP, ranked by GitHub stars. For developer validation, the generated manifests were submitted to the servers' repositories so maintainers could judge their accuracy and completeness.

Enforcement relies on Docker-based containment, mounts for filesystem scoping, environment controls, and iptables-based network allow-listing. At runtime, generic permissions are refined into concrete runtime permissions after user consent. The process supports importing static runtime permissions and also allows dynamic permissions to be granted per execution with consent.
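The refinement and enforcement steps can be approximated as follows: generic permissions declared in a manifest are narrowed to the concrete values a user approved, then translated into `docker run` arguments and iptables rules. This is a hypothetical sketch, not AgentBox's implementation; the permission names, the consent format and the image name are assumptions, hosts are given as IP addresses (DNS is not handled), and where the iptables rules are applied (the sandbox's network namespace or the host) is a deployment choice left open.

```python
from dataclasses import dataclass, field


@dataclass
class ConcretePermissions:
    read_paths: list = field(default_factory=list)
    write_paths: list = field(default_factory=list)
    env_names: list = field(default_factory=list)      # host variables passed through
    allowed_hosts: list = field(default_factory=list)  # (ip, tcp_port) pairs


def refine(generic: set, consent: dict) -> ConcretePermissions:
    """Narrow generic manifest permissions to the concrete values the user approved."""
    def approved(key):
        # A permission absent from the manifest stays denied even if approved.
        return list(consent.get(key, [])) if key in generic else []

    return ConcretePermissions(
        read_paths=approved("filesystem.read"),
        write_paths=approved("filesystem.write"),
        env_names=approved("env.read"),
        allowed_hosts=approved("network.client"),
    )


def docker_args(image: str, perms: ConcretePermissions) -> list:
    """Build a locked-down `docker run` command line for an MCP server image."""
    args = ["docker", "run", "--rm", "-i", "--read-only",
            "--cap-drop", "ALL", "--security-opt", "no-new-privileges"]
    for path in perms.read_paths:
        args += ["-v", f"{path}:{path}:ro"]
    for path in perms.write_paths:
        args += ["-v", f"{path}:{path}:rw"]
    for name in perms.env_names:
        args += ["-e", name]  # value is taken from the host environment
    if not perms.allowed_hosts:
        args += ["--network", "none"]
    args.append(image)
    return args


def iptables_rules(perms: ConcretePermissions) -> list:
    """Egress allow-list: accept approved destinations, then drop everything else."""
    rules = [["iptables", "-A", "OUTPUT", "-d", ip, "-p", "tcp",
              "--dport", str(port), "-j", "ACCEPT"]
             for ip, port in perms.allowed_hosts]
    rules.append(["iptables", "-A", "OUTPUT", "-j", "DROP"])
    return rules


perms = refine(
    {"filesystem.read", "network.client"},
    {"filesystem.read": ["/srv/project"],
     "filesystem.write": ["/tmp"],  # dropped: not declared in the manifest
     "network.client": [("203.0.113.10", 443)]},
)
print(docker_args("example/mcp-server:latest", perms))
for rule in iptables_rules(perms):
    print(" ".join(rule))
```

Denial is the default at every step: a permission missing from the manifest is dropped even if the user approves it, a server with no approved hosts gets `--network none`, and the iptables list ends with a catch-all DROP.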

Key Findings

Policy completeness and accuracy: automated manifests aligned with real-world usage, achieving 80.9 per cent accuracy across 96 reviewed manifests with 100 per cent recall of needed permissions, and 0.94 precision and 0.96 recall against manually written manifests for 48 servers
Security effectiveness and threat coverage: AgentBox blocks the majority of environment-oriented attacks, such as external resource attacks and data exfiltration, in several tested malicious MCP servers; one attack type, puppet attacks, cannot be prevented because it targets tool handling within the LLM
Runtime efficiency: policy enforcement adds negligible overhead, about 0.6 ms on macOS and 0.29 ms on Debian for representative operations; startup overhead for sandboxed servers ranges from roughly 150 ms to 400 ms depending on the host

Limitations

Limitations include potential inaccuracies in both automatic manifest generation and the human evaluation. External validity is limited to the 296 MCP servers in the study, and the most-starred servers on GitHub may not reflect the full diversity of MCP servers. Some attack classes, such as puppet attacks that manipulate tool handling, cannot be prevented by AgentBound. Not every developer responded to the submitted manifests, leaving some manifests unvalidated.

Why It Matters

AgentBound provides a practical, portable security boundary for MCP-based AI agents, enabling least privilege and isolation without modifying MCP servers. It supports rapid manifest generation and adds strong system-level security with negligible runtime impact, so organisations can harden MCP-based workflows while preserving developer productivity. The work also points to combining manifest-driven access control with complementary defence techniques and DevOps workflows to advance MCP security and safe automation.

Attribution: Original paper on arXiv