OpenClaw agents widen attack surface with tools and memory

Published: Mon, May 25, 2026 • By Clara Nyx
A new analysis of OpenClaw, a locally run AI agent, argues the platform is primed for abuse: easy prompt injection, overpowered tools, a risky plugin supply chain, and sticky memory that spreads bad context across sessions and users. Audits cited report weak defences and sub‑60% overall pass rates.

OpenClaw is a locally run AI agent that moves from answering questions to taking actions. It plans, keeps memory, and drives tools across browsers, files, messaging and plugins. That shift from text to real state change is where things break. This paper maps the breakage with a focus on how attackers actually get in.

Prompt injection with teeth

The core problem is indirect prompt injection. The agent ingests untrusted content, is wired to sensitive systems, and can change the world outside its chat window. Holding all three properties at once violates the Rule of Two, which advises that an agent combine at most two of them in a single session, and it makes planted instructions highly practical. A user asks the agent to “summarise this” web page or document, and the poisoned content quietly instructs it to read local files or send messages. Audits cited in the paper report low robustness to these adversarial inputs. Once the model believes the context, the rest is just tooling.
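To make the Rule of Two concrete, here is a minimal Python sketch. It assumes the rule means a session should hold at most two of the three properties named above. The class and function names are hypothetical and are not part of OpenClaw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent session (not an OpenClaw API)."""
    reads_untrusted_input: bool     # e.g. web pages, documents, inbound messages
    accesses_sensitive_data: bool   # e.g. local files, credentials, private chats
    changes_external_state: bool    # e.g. writes files, sends messages, drives a browser


def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when all three risky properties are enabled together."""
    enabled = sum([
        caps.reads_untrusted_input,
        caps.accesses_sensitive_data,
        caps.changes_external_state,
    ])
    return enabled == 3


def requires_human_approval(caps: AgentCapabilities) -> bool:
    """Illustrative policy: a session holding all three properties needs a human in the loop."""
    return violates_rule_of_two(caps)
```

A deployer could run a check like this when a session starts, then either drop one capability or route actions through an approval step. The audits cited here suggest approvals help but are inconsistent, so this is a floor and not a fix.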

Tools, plugins, and sticky memory

OpenClaw’s tool layer can read and write files, steer a browser, and message across channels. Privilege separation is coarse, so misunderstanding intent is costly. One audit reported a 0% pass rate on intent misunderstanding and frequent cascades of unsafe actions: a single misread goal led to multiple irreversible side effects.
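As an illustration of what finer privilege separation could look like, the sketch below gates each tool call against a per-tool allowlist and escalates irreversible actions to a human. The tool names, paths and policy table are assumptions made for the example; the paper does not describe OpenClaw's tool layer in these terms.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    tool: str         # hypothetical tool name, e.g. "file.write"
    target: str       # path, URL or recipient the call acts on
    reversible: bool  # whether the effect can be undone


# Hypothetical policy: tool name -> target prefixes the agent may touch.
POLICY: dict[str, tuple[str, ...]] = {
    "file.read": ("/home/agent/workspace/",),
    "file.write": ("/home/agent/workspace/",),
    "message.send": (),  # no recipients pre-approved
}


def decide(call: ToolCall, policy: dict[str, tuple[str, ...]] = POLICY) -> Decision:
    """Deny unknown tools and out-of-scope targets; escalate irreversible actions.

    Assumes targets are already canonicalised. A real gate must resolve
    ".." segments and symlinks before comparing prefixes.
    """
    prefixes = policy.get(call.tool)
    if prefixes is None:
        return Decision.DENY
    if not any(call.target.startswith(prefix) for prefix in prefixes):
        return Decision.DENY
    if not call.reversible:
        return Decision.ASK_HUMAN
    return Decision.ALLOW
```

The design choice is default deny: a misread goal can only reach targets that were scoped in advance, and anything that cannot be undone waits for a person.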

The plugin and skill ecosystem is a supply chain problem. Community analysis found vulnerable tools and outright malicious or problematic skills. Extensions load native code in process without sandboxing, widening the trust boundary and turning a convenience feature into an execution path. If an attacker can slip a “skill” past weak vetting, they inherit the agent’s privileges.
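One common supply-chain control is to pin each reviewed skill to a cryptographic digest and refuse anything that does not match. The sketch below shows that idea using Python's standard library. The pin table and function names are hypothetical, and nothing in the paper indicates OpenClaw ships such a check.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a skill archive as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_pinned(name: str, archive: bytes, pins: dict[str, str]) -> bool:
    """Accept a skill only if its bytes match the digest recorded at review time.

    `pins` is a hypothetical allowlist mapping skill name -> reviewed digest.
    Unknown skills are rejected by default.
    """
    expected = pins.get(name)
    if expected is None:
        return False
    # compare_digest avoids leaking match position through timing differences.
    return hmac.compare_digest(sha256_hex(archive), expected)
```

Pinning only proves that the loaded code is the code someone reviewed. It does not sandbox the skill, so a pinned extension still runs with the agent's privileges.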

Memory persistence makes it worse. Sessions and gateways keep context around, so injected junk or sensitive data can leak across later tasks and even other users. The paper points to evidence of cross-session and cross-agent propagation. Once poisoned, the agent keeps carrying the attacker’s breadcrumbs forward.
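A rough sketch of one way to contain that spread: tag every memory entry with its session, user and trust level, and filter on recall. All names here are illustrative assumptions, not OpenClaw's session or gateway design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    session_id: str
    user_id: str
    trusted: bool  # False for anything derived from web pages, files or inbound messages


@dataclass
class ScopedMemory:
    """Hypothetical memory store that filters recall by user and provenance."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def recall(self, session_id: str, user_id: str) -> list[str]:
        """Return same-user entries; untrusted ones only within their own session."""
        visible = []
        for entry in self.entries:
            if entry.user_id != user_id:
                continue  # never cross user boundaries
            if entry.trusted or entry.session_id == session_id:
                visible.append(entry.text)
        return visible
```

Under this policy, injected text can still mislead the session that ingested it, but it is not carried into later sessions or shown to other users.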

Defences are uneven. Native safeguards scored poorly, human approvals helped but were inconsistent, and many proposed mitigations were easy to bypass. Combined audit pass rates sat under 60%. When an agent can alter external state, “mostly safe” is not safe enough; rollback is not guaranteed and attribution gets fuzzy.
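On attribution, one widely used building block is a tamper-evident action log in which every entry commits to the hash of the one before it. The following is a minimal sketch of that technique. The record fields are invented for illustration, and the paper does not prescribe this mechanism.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record (hypothetical fields) chained to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Recording which input triggered each tool call gives investigators a trail to follow. An attacker who can rewrite the whole log can also recompute the chain, so the latest hash would need to be stored somewhere the agent cannot write.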

Is any of this surprising? Not really. Give a Large Language Model (LLM) high privileges, untrusted inputs, and persistence, and you get an attractive target. What matters here is the consolidation: concrete failure modes across prompt handling, tooling, supply chain, and memory, tied to real audit results. The open questions are the hard ones: fine-grained privileges without killing utility, plugin vetting that actually bites, and traceability strong enough to assign responsibility when autonomous actions go wrong.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Security, Privacy, and Ethical Risks in OpenClaw

Authors: Yutong Jin, Zelin Zhang, Zhijin Lyu, and Jianbing Ni
This paper systematically investigates the security, privacy, and ethical risks, as well as the traceability challenges of OpenClaw, a locally executable AI agent system for natural language interaction and real-world task completion. While OpenClaw shows strong potential for personal assistance, office automation, cross-platform task management, and information integration, it also raises serious security, privacy, and ethical concerns. By analyzing its system architecture, core functionalities, deployment model, and representative application scenarios, this paper aims to reveal the risks that may arise when such a highly privileged agent is integrated into personal and organizational digital environments. We focus in particular on the challenges associated with persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and the integration of plugins and external services. We argue that these issues constitute major barriers to the trustworthy deployment and widespread adoption of this technology. Finally, we summarize the open challenges in security defenses, privacy protection, ethical governance, and traceability in agent use, and call for joint efforts from researchers, developers, deployers, and regulators to build AI agent systems that are safer, more reliable, and more trustworthy.

🔍 ShortSpan Analysis of the Paper

Problem

This paper examines security, privacy and ethical risks, plus traceability challenges, in OpenClaw, a self‑hosted autonomous AI agent platform that can plan, persist memory and invoke external tools to perform real tasks across messaging channels, browsers, files and plugins. The work matters because OpenClaw shifts language models from passive text generation to action‑capable assistants, broadening utility but also creating new vectors for data exposure, irreversible actions and diffuse responsibility when deployed in personal or organisational environments.

Approach

The authors analyse OpenClaw’s documented architecture and representative use cases, organising risks around five components: the gateway, embedded runtime, tool layer, skill mechanism and persistent sessions. They develop a threat model that considers attacker identity, capability and objectives, and integrate empirical evidence from security audits and community analyses of tool and skill ecosystems to characterise real attack scenarios and defence performance.

Key Findings

  • Indirect prompt injection is highly practical: because OpenClaw ingests untrusted content, has access to sensitive systems and can produce external state changes, it violates the recommended Rule of Two and is vulnerable to adversarial content that becomes part of the agent’s context. Reported robustness to prompt injection was low in audits.
  • Tool and browser capabilities amplify harm: the tool layer can read/write files, control browsers and send messages without fine‑grained privilege separation, producing irreversible effects; one audit reported a 0% pass rate for intent misunderstanding tests and frequent cascades of unsafe actions.
  • Skill supply chain and plugin risks are substantial: a large third‑party ecosystem expands the trust boundary; community analysis found many vulnerable tools and malicious or problematic skills in the repository, and extension loading runs native code in process without sandboxing.
  • Persistent sessions create cross‑session contagion and privacy accumulation: persisted memory and shared gateways can let poisoned or sensitive content influence later interactions and other users; literature and audits demonstrate feasible propagation across agents and instances.
  • Defences are currently weak and uneven: native defence success rates reported were low, human approval helps but is variable, and many proposed defences can be bypassed at high rates; overall audit pass rates were under 60% on combined risk dimensions.

Limitations

The analysis relies on OpenClaw documentation, community audits and recent security studies; some referenced empirical works are preprints. The assessment focuses on an exemplar platform and on documented behaviours and audits rather than on private deployments or unpublished mitigations, so actual risk in a specific deployment will depend on configuration, isolation and operational controls.

Implications

Attackers can exploit third‑party content, skills or shared sessions to hijack goals, exfiltrate data or trigger destructive tool actions. Supply‑chain and plugin vectors enable covert privilege escalation, while persistent memory and cross‑channel access allow adversarial influence to persist and spread. These capabilities mean a compromised OpenClaw instance can act as an autonomous attack vector inside personal or organisational networks with broad privacy and operational impact.

