OpenClaw agents widen attack surface with tools and memory
OpenClaw is a locally run AI agent that moves from answering questions to taking actions. It plans, keeps memory, and drives tools across browsers, files, messaging and plugins. That shift from text to real state change is where things break. This paper maps the breakage with a focus on how attackers actually get in.
Prompt injection with teeth
The core problem is indirect prompt injection. The agent ingests untrusted content, is wired to sensitive systems, and can change the world outside its chat window. Holding all three properties at once violates the Rule of Two, which recommends that an agent combine at most two of them, and it makes planted instructions highly practical. A user asks the agent to summarise a web page or document, and text hidden in that content quietly instructs it to read local files or send messages. Audits cited in the paper report low robustness to these adversarial inputs. Once the model believes the context, the rest is just tooling.
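The three properties named above can be written down as a simple configuration check. The sketch below assumes the Rule of Two is read as "hold at most two of the three properties"; `AgentProfile` and its field names are hypothetical and are not part of any OpenClaw API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    # Hypothetical capability flags, invented for this illustration.
    reads_untrusted_input: bool
    accesses_sensitive_systems: bool
    changes_external_state: bool


def rule_of_two_violated(profile: AgentProfile) -> bool:
    """Return True when all three risky properties are held at once."""
    held = sum([
        profile.reads_untrusted_input,
        profile.accesses_sensitive_systems,
        profile.changes_external_state,
    ])
    return held > 2


# An agent as described in the text: all three properties together.
full_agent = AgentProfile(True, True, True)
# The same agent with outbound actions switched off while it browses.
read_only_agent = AgentProfile(True, True, False)

print(rule_of_two_violated(full_agent))       # True
print(rule_of_two_violated(read_only_agent))  # False
```

A check like this only flags a risky configuration. It does nothing to make the model itself more robust to injected text.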
Tools, plugins, and sticky memory
OpenClaw’s tool layer can read and write files, steer a browser, and message across channels. Privilege separation is coarse, so a misunderstood intent is costly. One audit reported a 0% pass rate on intent misunderstanding tests and frequent cascades of unsafe actions, in which a single misread goal led to multiple irreversible side effects.
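One way to picture finer-grained privilege separation is a per-task grant list plus a human decision for irreversible actions. This is a minimal sketch under stated assumptions: the tool names, the `IRREVERSIBLE` set and the `run_tool` helper are hypothetical and do not describe how OpenClaw dispatches tools.

```python
from typing import Callable, Dict, Set

# Hypothetical tool names; which actions count as irreversible is an assumption.
IRREVERSIBLE: Set[str] = {"delete_file", "send_message"}


class ApprovalDenied(Exception):
    """Raised when a human declines an irreversible action."""


def run_tool(
    name: str,
    tools: Dict[str, Callable[..., str]],
    granted: Set[str],
    approve: Callable[[str, dict], bool],
    **kwargs,
) -> str:
    # Deny by default: a tool must be granted for this specific task.
    if name not in granted:
        raise PermissionError(f"tool {name!r} is not granted for this task")
    # Irreversible actions additionally need an explicit human decision.
    if name in IRREVERSIBLE and not approve(name, kwargs):
        raise ApprovalDenied(f"human approval refused for {name!r}")
    return tools[name](**kwargs)


tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_message": lambda to, body: f"sent to {to}",
}
# A summarisation task is granted read access only.
granted_for_summary = {"read_file"}
deny_all = lambda name, args: False

print(run_tool("read_file", tools, granted_for_summary, deny_all, path="notes.txt"))
```

With this shape, an injected instruction to send a message during a summarisation task fails at the grant check before any approval prompt is shown, so one misread goal cannot fan out into several side effects.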
The plugin and skill ecosystem is a supply chain problem. Community analysis found vulnerable tools and outright malicious or problematic skills. Extensions load native code in process without sandboxing, widening the trust boundary and turning a convenience feature into an execution path. If an attacker can slip a “skill” past weak vetting, they inherit the agent’s privileges.
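A basic form of vetting is to pin each reviewed skill to a content hash and refuse anything that does not match. The sketch below uses only Python's standard `hashlib` and `hmac` modules; the `pinned` allowlist and the skill name are hypothetical, and nothing here reflects an actual OpenClaw loading mechanism.

```python
import hashlib
import hmac
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a skill's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_vetted(name: str, payload: bytes, pinned: Dict[str, str]) -> bool:
    """Accept a skill only if its bytes match the digest recorded at review time."""
    expected = pinned.get(name)
    if expected is None:
        return False  # unknown skills are rejected by default
    return hmac.compare_digest(expected, sha256_hex(payload))


# Hypothetical skill content that a reviewer has inspected and pinned.
reviewed = b"def run(): return 'weather report'"
pinned = {"weather": sha256_hex(reviewed)}
# The same skill after someone appended extra code.
tampered = reviewed + b"\nimport os"

print(is_vetted("weather", reviewed, pinned))
print(is_vetted("weather", tampered, pinned))
print(is_vetted("unknown", reviewed, pinned))
```

Hash pinning only shows that the loaded bytes are the bytes someone reviewed. It does not make the review itself sound, and it is no substitute for sandboxing code that runs in process.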
Memory persistence makes it worse. Sessions and gateways keep context around, so injected junk or sensitive data can leak across later tasks and even other users. The paper points to evidence of cross-session and cross-agent propagation. Once poisoned, the agent keeps carrying the attacker’s breadcrumbs forward.
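One mitigation for this kind of carry-over is to tag every stored item with its session and its source, then filter on both at recall time. The sketch below is an assumption-laden illustration: `ScopedMemory`, `MemoryEntry` and the `"user"` / `"external"` labels are hypothetical and are not OpenClaw's memory model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    session_id: str
    source: str  # "user" for direct input, "external" for fetched content


@dataclass
class ScopedMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def remember(self, text: str, session_id: str, source: str) -> None:
        self.entries.append(MemoryEntry(text, session_id, source))

    def recall(self, session_id: str, include_external: bool = False) -> List[str]:
        # Never cross session boundaries; external content is opt-in.
        return [
            e.text
            for e in self.entries
            if e.session_id == session_id
            and (include_external or e.source == "user")
        ]


memory = ScopedMemory()
memory.remember("prefers short summaries", "alice-1", "user")
memory.remember("IGNORE PREVIOUS INSTRUCTIONS", "alice-1", "external")
memory.remember("project codename is Heron", "bob-7", "user")

print(memory.recall("alice-1"))
print(memory.recall("bob-7"))
```

The default recall keeps the injected line out of later planning context and keeps one user's data out of another user's session, at the cost of some of the convenience that persistent memory is meant to provide.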
Defences are uneven. Native safeguards scored poorly, human approvals helped but were inconsistent, and many proposed mitigations were easy to bypass. Combined audit pass rates sat under 60%. When an agent can alter external state, “mostly safe” is not safe enough; rollback is not guaranteed and attribution gets fuzzy.
Is any of this surprising? Not really. Give a Large Language Model (LLM) high privileges, untrusted inputs, and persistence, and you get an attractive target. What matters here is the consolidation: concrete failure modes across prompt handling, tooling, supply chain, and memory, tied to real audit results. The open questions are the hard ones: fine-grained privileges without killing utility, plugin vetting that actually bites, and traceability strong enough to assign responsibility when autonomous actions go wrong.
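On the traceability question, one commonly used building block is an append-only action log in which each record includes the hash of the previous one, so later edits are detectable. The sketch below is illustrative only: the record fields (`actor`, `action`, `trigger`) are assumptions, and the paper does not prescribe this design.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder digest for the first record
FIELDS = ("actor", "action", "trigger", "prev")


def _digest(body: Dict[str, str]) -> str:
    # Canonical JSON so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(log: List[dict], actor: str, action: str, trigger: str) -> dict:
    """Append an action record that is chained to the previous record's digest."""
    prev = log[-1]["digest"] if log else GENESIS
    body = {"actor": actor, "action": action, "trigger": trigger, "prev": prev}
    record = {**body, "digest": _digest(body)}
    log.append(record)
    return record


def verify(log: List[dict]) -> bool:
    """Check every digest and every link back to the preceding record."""
    prev = GENESIS
    for record in log:
        body = {key: record[key] for key in FIELDS}
        if record["prev"] != prev or record["digest"] != _digest(body):
            return False
        prev = record["digest"]
    return True


log: List[dict] = []
append_record(log, "agent", "read_file notes.txt", "user request")
append_record(log, "agent", "send_message", "content of fetched page")
print(verify(log))

log[0]["trigger"] = "user request (edited)"
print(verify(log))
```

Recording what triggered each action, and not only the action itself, is what lets a reviewer separate a user's instruction from an instruction that arrived inside fetched content.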
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
Security, Privacy, and Ethical Risks in OpenClaw
🔍 ShortSpan Analysis of the Paper
Problem
This paper examines security, privacy and ethical risks, plus traceability challenges, in OpenClaw, a self‑hosted autonomous AI agent platform that can plan, persist memory and invoke external tools to perform real tasks across messaging channels, browsers, files and plugins. The work matters because OpenClaw shifts language models from passive text generation to action‑capable assistants, broadening utility but also creating new vectors for data exposure, irreversible actions and diffuse responsibility when deployed in personal or organisational environments.
Approach
The authors analyse OpenClaw’s documented architecture and representative use cases, organising risks around five components: the gateway, embedded runtime, tool layer, skill mechanism and persistent sessions. They develop a threat model that considers attacker identity, capability and objectives, and integrate empirical evidence from security audits and community analyses of tool and skill ecosystems to characterise real attack scenarios and defence performance.
Key Findings
- Indirect prompt injection is highly practical: because OpenClaw ingests untrusted content, has access to sensitive systems and can produce external state changes, it violates the recommended Rule of Two and is vulnerable to adversarial content that becomes part of the agent’s context. Reported robustness to prompt injection was low in audits.
- Tool and browser capabilities amplify harm: the tool layer can read/write files, control browsers and send messages without fine‑grained privilege separation, producing irreversible effects; one audit reported a 0% pass rate for intent misunderstanding tests and frequent cascades of unsafe actions.
- Skill supply chain and plugin risks are substantial: a large third‑party ecosystem expands the trust boundary; community analysis found many vulnerable tools and malicious or problematic skills in the repository, and extension loading runs native code in process without sandboxing.
- Persistent sessions create cross‑session contagion and privacy accumulation: persisted memory and shared gateways can let poisoned or sensitive content influence later interactions and other users; literature and audits demonstrate feasible propagation across agents and instances.
- Defences are currently weak and uneven: reported success rates for native defences were low, human approval helps but is variable, and many proposed defences can be bypassed at high rates; overall audit pass rates were under 60% on combined risk dimensions.
Limitations
The analysis relies on OpenClaw documentation, community audits and recent security studies; some referenced empirical works are preprints. The assessment focuses on an exemplar platform and on documented behaviours and audits rather than on private deployments or unpublished mitigations, so actual risk in a specific deployment will depend on configuration, isolation and operational controls.
Implications
Attackers can exploit third‑party content, skills or shared sessions to hijack goals, exfiltrate data or trigger destructive tool actions. Supply‑chain and plugin vectors enable covert privilege escalation, while persistent memory and cross‑channel access allow adversarial influence to persist and spread. These capabilities mean a compromised OpenClaw instance can act as an autonomous attack vector inside personal or organisational networks with broad privacy and operational impact.