Secure MCP Or Manage New AI Attack Surfaces

Defenses

Published: Wed, Nov 26, 2025 • By James Armitage

Secure MCP Or Manage New AI Attack Surfaces

The Model Context Protocol (MCP) swaps static API ties for dynamic agent workflows, improving automation but expanding the attack surface. New research outlines three attacker types—content injection, supply‑chain compromise, and agents that overstep—and proposes layered controls: scoped authorisation, provenance, sandboxes, inline DLP and a gateway for central governance.

The Model Context Protocol (MCP) is not a niche academic idea any more. It moves integrations away from hard-coded APIs and toward user-driven agents that assemble data, tools and services at runtime. That flexibility buys productivity, but it also reshapes the threat model in ways existing guidance does not fully cover. A recent study, ‘Securing the Model Context Protocol (MCP): Risks, Controls, and Governance’, sets out the risks and a practical defence-in-depth approach.

New threats, familiar trade-offs

The paper identifies three adversary types to watch. First, content-injection attackers hide malicious instructions inside otherwise legitimate inputs so an agent follows harmful steps. Second, supply-chain attackers distribute compromised MCP servers or registries. Third, agents become inadvertent adversaries when they execute beyond their intended scope, using tools or data in ways operators did not expect. These lead to three clear attack vectors: data-driven exfiltration, tool poisoning and cross-system privilege escalation.

The proposed response is deliberately pragmatic. Controls map to five areas: per-user authentication with scoped authorisation; provenance tracking across agent workflows; containerised sandboxing with input and output filtering; inline policy enforcement, including data loss prevention (DLP) and anomaly detection; and centralised governance using private registries or a gateway layer. A central MCP gateway acts as an enforcement point that interposes between agents and external MCP servers, giving teams the observability and policy control they lack in distributed setups.

This architecture is not a panacea. A gateway can add latency, raise operational complexity and become a single point of failure if not designed redundantly. Integrating provenance and cryptographic attestations helps auditing, but it demands new operational practices and tooling. The authors acknowledge that current standards such as the NIST AI Risk Management Framework (AI RMF) and ISO/IEC 42001 provide useful principles but leave gaps for these dynamic systems; mapping the controls to ISO/IEC 27001 is presented as a practical path for organisations that must reconcile MCP security with existing compliance programmes.

What to do next

If you run or advise on MCP-style deployments, start with threat modelling and a few small, defensive investments. Treat registries and MCP servers like software supply chain components and apply the same scrutiny you give containers or packages. Implement per-user, scoped authorisation so an agent only sees the tools and data it needs. Add lightweight provenance so you can trace which agent performed which action. Run untrusted code inside containerised sandboxes with strict input and output checks. Use inline DLP and anomaly detection to flag unusual data movements early. If you adopt a gateway, plan for redundancy and integrate it with your SIEM or SOAR for end-to-end visibility.

There are open research questions: how to build verifiable registries, how to apply formal methods to adaptive agent workflows, and how to preserve privacy while retaining auditability. For now, treat MCP as a classic engineering trade-off: measured gains in capability, matched with layered controls and disciplined governance. The paper offers a clear starting point for teams who want to move beyond fear or hype and actually secure the next generation of AI integrations (see NIST AI RMF and ISO/IEC 42001 for context). Practical security here means avoiding heroic single fixes and building predictable, auditable systems instead.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Securing the Model Context Protocol (MCP): Risks, Controls, and Governance

Authors: Herman Errico, Jiquan Ngiam, and Shanita Sojan

The Model Context Protocol (MCP) replaces static, developer-controlled API integrations with more dynamic, user-driven agent systems, which also introduces new security risks. As MCP adoption grows across community servers and major platforms, organizations encounter threats that existing AI governance frameworks (such as NIST AI RMF and ISO/IEC 42001) do not yet cover in detail. We focus on three types of adversaries that take advantage of MCP s flexibility: content-injection attackers that embed malicious instructions into otherwise legitimate data; supply-chain attackers who distribute compromised servers; and agents who become unintentional adversaries by over-stepping their role. Based on early incidents and proof-of-concept attacks, we describe how MCP can increase the attack surface through data-driven exfiltration, tool poisoning, and cross-system privilege escalation. In response, we propose a set of practical controls, including per-user authentication with scoped authorization, provenance tracking across agent workflows, containerized sandboxing with input/output checks, inline policy enforcement with DLP and anomaly detection, and centralized governance using private registries or gateway layers. The aim is to help organizations ensure that unvetted code does not run outside a sandbox, tools are not used beyond their intended scope, data exfiltration attempts are detectable, and actions can be audited end-to-end. We close by outlining open research questions around verifiable registries, formal methods for these dynamic systems, and privacy-preserving agent operations.

🔍 ShortSpan Analysis of the Paper

Problem

The Model Context Protocol MCP replaces static, developer controlled API integrations with dynamic, user driven agent systems. This shift offers productivity gains but expands the attack surface and highlights security gaps not fully addressed by existing governance frameworks such as the NIST AI Risk Management Framework and ISO IEC 42001. The work formalises three adversary types who exploit MCP s flexibility: content injection attackers who embed malicious instructions into legitimate data; supply chain attackers who distribute compromised MCP servers; and agents that become inadvertent adversaries by over stepping their role. Drawing on early incidents and proof of concept attacks, the authors describe how MCP can increase risk through data driven exfiltration, tool poisoning and cross system privilege escalation, and they propose practical controls and governance ideas to help organisations ensure unvetted code runs inside a sandbox, tools remain within their intended scope, data exfiltration attempts are detectable, and actions are auditable end to end.

Approach

The paper develops a defence in depth framework consisting of five control categories and a gateway architecture to enforce them. The controls cover authentication and authorisation with per user scoped access, provenance tracking across agent workflows, containerised sandboxing with input output checks, inline policy enforcement with data loss prevention and anomaly detection, and centralised governance using private MCP registries or gateway layers. A central MCP gateway provides a unified enforcement point that interposes between agents and MCP servers, enabling end to end observability, policy enforcement, and risk mitigation while preserving user experience. The authors map these controls to established governance frameworks and describe an implementation pathway that integrates MCP security with existing compliance programmes, illustrating how security objectives align with NIST AI RMF, ISO IEC 27001, and ISO IEC 42001. They also discuss threat modelling, attack surface analysis, and concrete attack scenarios to motivate the controls, and outline open research questions in verifiable registries, formal verification for dynamic systems, and privacy preserving agent operations.

Key Findings

The MCP security challenge arises from three adversary types: content injection, supply chain compromise of MCP servers, and inadvertent adversaries that overstep their intended role, enabled by data driven, cross system tool use and external communications.

Limitations

The proposed gateway based defence introduces architectural changes that may add latency and operational complexity, and may create a single point of failure if not redundantly deployed. Implementing centralised governance requires careful alignment with existing security policies, resource planning, and ongoing maintenance. The mapping to standards is guidance driven and may require adaptation for specific organisational contexts. The open research questions identified indicate that complete guarantees for dynamic agent systems remain an active area of study.

Why It Matters

Understanding and mitigating the security risks of dynamic AI agent ecosystems is essential as MCP style architectures become more widespread across community servers and major platforms. The framework offers practical controls that support threat modelling, risk assessment, and incident response by reducing the likelihood of unvetted code executing outside a sandbox, limiting tool access to approved contexts, detecting data exfiltration, and enabling end to end auditing. The emphasis on provenance, policy enforcement, and centralised governance addresses governance gaps in current AI risk frameworks and points to a pathway for implementing secure MCP deployments at scale while addressing data privacy, accountability and data protection concerns. The work also highlights societal security implications around data privacy, surveillance potential, and the need for auditable, verifiable, and privacy preserving agent operations as these dynamic systems mature.

Attribution Original paper on arXiv