Survey maps security holes in OpenClaw LLM agents

Published: Tue, May 26, 2026 • By Clara Nyx
A new survey dissects security risks in OpenClaw-style autonomous agents that couple Large Language Models with persistent memory and high-privilege tools. It shows how prompt injection, memory poisoning, malicious skills and misconfigured gateways turn small mistakes into host-level compromise and cascading outages. The defences exist, but they’re partial and untested at scale.

Autonomous agents keep getting sold as the future of operations. This survey on OpenClaw-style systems is a sober reminder that wiring a Large Language Model (LLM) to long-lived memory and high-privilege tools mostly gives attackers more levers. The paper organises the mess into cognition, execution and interaction layers. That taxonomy isn’t new, but the examples show how routine quirks become system compromise when you add persistence and autonomy.

Start with cognition. Classic prompt injection still works, only now it sticks. Hide instructions in a web page or a spreadsheet, the agent reads them, and you have goal hijacking. With persistent memory, you can seed a “helpful hint” that reactivates months later when it is retrieved, nudging plans off course. The planning loop couples tightly to the executor, so once the agent internalises a bad instruction, it can repeatedly reissue it without fresh user input.
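
To make the memory angle concrete, here is a minimal sketch of provenance-tagged retrieval, one way to act on the "treat external content as untrusted" advice. The `MemoryEntry` type, the `TRUSTED_SOURCES` set and the `[UNTRUSTED ...]` label are all hypothetical and not taken from the paper or from any OpenClaw API. Labelling alone does not stop injection; it only gives the planner prompt something to key on.

```python
from dataclasses import dataclass

# Hypothetical provenance labels; a real framework would define its own.
TRUSTED_SOURCES = {"user", "operator"}


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    source: str  # e.g. "user", "web", "spreadsheet"


def build_context(entries: list[MemoryEntry]) -> str:
    """Render retrieved memory, fencing untrusted entries as inert data."""
    parts = []
    for entry in entries:
        if entry.source in TRUSTED_SOURCES:
            parts.append(entry.text)
        else:
            # Untrusted text is quoted and labelled so the planner prompt can
            # instruct the model to treat it as data, never as instructions.
            parts.append(f"[UNTRUSTED source={entry.source}] {entry.text!r}")
    return "\n".join(parts)
```

The point of the design is that provenance is recorded at write time, so a hint planted months ago still arrives with its origin attached when it is retrieved.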

Execution is where the blast radius widens. Skills behave like unvetted plugins. Publish a malicious or impersonated skill and you can siphon secrets, install payloads or exfiltrate data via a sequence of apparently legitimate tool calls. Unpinned dependencies and remote script fetching are predictable weak spots: resolve a version to attacker-controlled code and the agent politely runs it. Some designs forward model output to a shell or interpreter; craft output that looks like a command and you tip straight into local execution.
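
The shell-forwarding problem can be narrowed by parsing model output into an argument list and checking it against an allowlist instead of handing it to a shell. The sketch below assumes a hypothetical `ALLOWED_COMMANDS` table and metacharacter set chosen for illustration; it does not restrict path arguments, so sandboxing would still be needed on top.

```python
import shlex

# Hypothetical allowlist: binaries the agent may run, mapped to permitted flags.
ALLOWED_COMMANDS = {"ls": {"-l", "-a"}, "cat": set()}
SHELL_METACHARS = set(";|&$`<>\n")


def parse_tool_command(model_output: str) -> list[str]:
    """Return an argv list if the output is an allowlisted command, else raise."""
    if any(ch in SHELL_METACHARS for ch in model_output):
        raise ValueError("shell metacharacters are not permitted")
    argv = shlex.split(model_output)  # raises ValueError on unbalanced quotes
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError("command is not on the allowlist")
    for arg in argv[1:]:
        if arg.startswith("-") and arg not in ALLOWED_COMMANDS[argv[0]]:
            raise ValueError(f"flag {arg!r} is not permitted")
    # Callers should pass this list to subprocess.run(argv, shell=False).
    return argv
```

For example, `parse_tool_command("ls -l /tmp")` yields an argument list, while output such as `ls; curl ...` or `rm -rf /` is rejected before anything runs.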

Interaction is a permission problem with extra steps. Gateways and credentials get over-scoped or misconfigured, tokens leak, and authorisations can be replayed. Inter-agent chat spreads the harm: one compromised agent convinces another to accept a poisoned instruction or to fetch a tainted skill. Humans in the loop aren’t a safety net when consent fatigue sets in.
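
Replayed authorisations are the easiest of these to illustrate. A minimal sketch, assuming a hypothetical `ApprovalIssuer` that the gateway would own: each approval is signed over one exact action, expires quickly and can be redeemed once. The class name, the 60-second lifetime and the in-memory nonce set are illustrative choices, not details from the survey.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional


class ApprovalIssuer:
    """Issues single-use, short-lived approvals bound to one exact action."""

    def __init__(self, key: bytes, ttl_seconds: int = 60):
        self._key = key
        self._ttl = ttl_seconds
        self._used_nonces: set[str] = set()

    def _sign(self, action: str, nonce: str, expires: int) -> str:
        # JSON encoding keeps the field boundaries unambiguous.
        message = json.dumps([action, nonce, expires]).encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def issue(self, action: str, now: Optional[float] = None) -> tuple[str, int, str]:
        now = time.time() if now is None else now
        nonce = secrets.token_hex(16)
        expires = int(now) + self._ttl
        return nonce, expires, self._sign(action, nonce, expires)

    def redeem(self, action: str, nonce: str, expires: int, signature: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expected = self._sign(action, nonce, expires)
        if not hmac.compare_digest(expected, signature):
            return False  # forged, or issued for a different action
        if now > expires or nonce in self._used_nonces:
            return False  # expired or already used (replay)
        self._used_nonces.add(nonce)
        return True
```

Binding the signature to the action string means an approval granted for one operation cannot be reused for another, and the nonce set stops the same approval from being redeemed twice.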

Cascades are the punchline. A tiny memory corruption, a planner nudge or a single bad skill can loop, persist across sessions and fan out through multi-agent workflows, draining resources or causing wider outages.
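
Loop amplification at least has a cheap brake. The sketch below assumes a hypothetical `LoopGuard` that the executor would consult before every tool call; the budget of 20 calls and the limit of 3 repeats are arbitrary illustrative defaults. It catches exact repeats only, so a loop that varies its arguments would need the overall budget to stop it.

```python
from collections import Counter


class LoopGuard:
    """Stops an agent run once it exceeds a call budget or repeats itself."""

    def __init__(self, max_calls: int = 20, max_repeats: int = 3):
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self._seen: Counter = Counter()
        self._total = 0

    def check(self, tool: str, args: tuple) -> None:
        """Record one tool call; raise RuntimeError if a limit is exceeded."""
        self._total += 1
        self._seen[(tool, args)] += 1
        if self._total > self.max_calls:
            raise RuntimeError("tool-call budget exhausted")
        if self._seen[(tool, args)] > self.max_repeats:
            raise RuntimeError(f"repeated call to {tool!r} looks like a loop")
```

A guard like this is scoped to one run; persistence across sessions, which the survey flags, would need the counters stored somewhere the agent itself cannot rewrite.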

The defence section reads like a greatest hits album: treat external content as untrusted, verify and sanitise memory writes and retrievals, monitor behaviour drift, check provenance and pin dependencies, sandbox tools, keep tokens least-privilege, and ask users twice for dangerous actions. Sensible, but the paper admits most mitigations are partial and evaluations rely on hand-built scenarios. No standard benchmarks, no shared state model, and not much empirical grounding.

The novelty here isn’t that agents are insecure. It’s that persistent memory and high-privilege autonomy make small, idiotic failures durable and contagious. Until we get formal lifecycle models, permission hygiene that sticks, and repeatable tests, an “agent” is just an overpowered automation script with a short attention span and long-term consequences.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

Authors: Yuntao Wang, Jianle Ba, Han Liu, Yanghe Pan, Jintao Wei, Zhou Su, Tom H. Luan, and Linkang Du
The rapid evolution of large language model (LLM)-driven autonomous agents has given rise to OpenClaw, a new class of open-source agent frameworks that operate as continuously running, skill-augmented systems with persistent memory, multi-channel interaction, and high degrees of autonomy. Such capabilities enable OpenClaw agents to autonomously execute complex, multi-step tasks and interact seamlessly with external applications, but simultaneously introduce a substantially enlarged attack surface. In particular, the combination of high-privilege operations and persistent memory exposes OpenClaw agents to various emerging threats, including skill poisoning, cognitive manipulation, multi-agent cascading failures, and supply-chain vulnerabilities. In this survey, we present a comprehensive study of the security landscape of OpenClaw agents. We first examine the general architecture and key characteristics that distinguish OpenClaw agents from traditional AI agent systems. We categorize existing security and privacy threats into a layered framework and analyze how vulnerabilities arise during agent reasoning, action execution, and external interaction. Representative defense mechanisms are also reviewed to draw the current defense landscape. Finally, several unresolved issues related to the reliability and trustworthiness of OpenClaw ecosystems are discussed.

🔍 ShortSpan Analysis of the Paper

Problem

This survey examines security risks introduced by OpenClaw, an open-source class of continuously running, LLM-driven autonomous agents that combine persistent memory, multi-channel interaction and direct access to system resources. These capabilities enable complex, long-horizon automation but enlarge the attack surface: high-privilege operations plus stored context open avenues for skill poisoning, prompt-based goal hijacking, memory corruption, supply-chain compromise, inter-agent propagation and unexpected local code execution. Understanding these threats is important because failures can translate LLM-level mistakes into host-level compromise, data exfiltration or cascading multi-agent outages.

Approach

The paper surveys OpenClaw architecture and operational workflow, then organises threats into a three-layer taxonomy: cognition (reasoning and memory), execution (tool/skill invocation and code execution), and interaction (communications, identity and user approval). It reviews representative attack modes discovered in practice and in prior studies, and summarises mitigation strategies. The survey highlights how persistent memory, skill marketplaces and planner-executor coupling create novel vectors absent from conventional single-shot LLM applications.

Key Findings

  • Cognition-layer attacks can hijack agent goals or contaminate memory: prompt injection, structured instruction injection and persistent memory poisoning can induce long-term behavioural drift or latent backdoors that reappear on retrieval.
  • Execution-layer risks arise when benign tools or skills are misused or compromised: sequential tool chains, malicious or impersonated skills, unpinned dependencies and remote script fetching can lead to data exfiltration, arbitrary command execution or persistent malware deployment.
  • Interaction-layer weaknesses enable privilege escalation and propagation: gateway misconfiguration, credential leakage, excessive permissions, insecure inter-agent communication and human trust exploitation can allow attackers to reuse tokens, replay authorisations and cause other agents or users to accept harmful instructions.
  • Cascading failures amplify damage: small corruptions in memory, planning or skills can persist across sessions and fan out via planner-executor coupling or loop amplification, causing large-scale behavioural, availability or resource-exhaustion incidents.
  • Defence landscape is multi-faceted but incomplete: recommended measures include treating external content as untrusted, sanitising and verifying memory writes and retrievals, runtime drift monitoring, provenance checks and dependency pinning, sandboxed execution, behaviour-sequence detection, least-privilege bindings for tokens, and secondary user verification for sensitive actions.
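
Of the measures listed above, provenance checks and pinning are the most mechanical. A minimal sketch, assuming a hypothetical `pinned` manifest that maps skill names to SHA-256 digests recorded at review time; the function name and manifest shape are illustrative and not part of any OpenClaw interface.

```python
import hashlib
import hmac


def verify_skill(name: str, payload: bytes, pinned: dict[str, str]) -> bool:
    """Return True only if the payload's SHA-256 matches the pinned digest."""
    expected = pinned.get(name)
    if expected is None:
        return False  # unknown skills are rejected rather than trusted by default
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected)
```

A digest check of this kind only proves the skill is the one that was reviewed. It says nothing about whether that review was any good, which is why the survey pairs it with sandboxing and least-privilege tokens.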

Limitations

The survey consolidates existing work but notes gaps and limitations: current evaluations often rely on manually constructed scenarios, making comparison across defences difficult; many mitigations remain conceptual or partial; and the review focuses on OpenClaw-style designs so findings may not generalise to all agent frameworks. The paper emphasises the need for standardised benchmarks, formal models of agent state and lifecycle, and broader empirical studies.

Implications

Offensive implications are significant. Attackers can embed malicious instructions into external content to hijack goals, poison long-term memory to create persistent backdoors, publish or manipulate skills to install payloads or steal secrets, craft sequences of benign tool calls to exfiltrate data, inject outputs that are forwarded to shells to trigger local command execution, exploit gateway or credential misconfigurations to escalate privileges, and provoke resource-draining loops or multi-agent cascades. Human trust and consent fatigue can be leveraged to obtain persistent permissions. These vectors enable both short-term compromise and long-lived persistence across agent ecosystems.

