ClawWorm shows LLM agents self-propagate across peers
Agents
Agent frameworks have moved from cute demos to long-running processes with tool access, shared marketplaces, and cross-platform chat. This paper examines what happens when you give that ecosystem the trust model of a group chat. The result, ClawWorm, is a self-replicating worm that needs one message to lodge itself in an agent, persist across restarts, run code, and then forward itself to the next host.
The target is OpenClaw, an open-source agent platform with over 40,000 active instances, persistent configurations, broad tool privileges, and multi-platform messaging. The authors built a three-stage attack: first, induce the agent to write malicious content into its core configuration; second, rely on startup behaviour that executes that content on initialisation; third, have the infected agent share the full payload with every new peer it meets.
What actually happens
The study evaluates three entry points. An attacker-controlled URL hosting a startup template. A malicious community skill whose setup step edits core config. And direct copy-paste of a code block with a built-in verification to avoid being paraphrased away. Payloads ranged from a single shell recon command to a resource hog to a command-and-control pattern that fetches instructions from a URL.
Across 180 trials on an isolated testbed with unmodified OpenClaw and a single Large Language Model (LLM) backend, full infection succeeded 0.85 of the time. Once an agent’s configuration was modified, propagation to peers occurred in every observed case. Persistence was effectively permanent once written, surviving restarts in all trials, with re-execution on reboot at 0.80 due to occasional LLM startup flakiness. The skill supply-chain vector was most effective at 0.95 success; the other two vectors landed at 0.80. The worm spread up to five hops with a per-hop conditional infection rate of 0.91; chains eventually broke as successive paraphrasing degraded the payload.
Two operational details matter. First, unconditional trust in local persistent config is a gift to attackers. If startup semantics execute or interpret whatever landed last session, you have autorun by design. Second, agents forward messages and artefacts to peers without binding provenance or policy. In human terms: if I can convince one agent to treat my text as a template, it will happily evangelise it to everyone else.
What this maps to in real estates
On endpoints, the core risk is writable and executable persistence. Agent processes that can write to their own config directories and then execute on startup look a lot like classic autorun keys and unattended cron. On model-serving nodes and GPU workers, broad tool privileges turn into shell access, process launch, and network egress. In data pipelines, treating URL fetch as a harmless helper enables C2-style follow-on. In inter-agent messaging, the absence of provenance and content policies lets untrusted context mutate future behaviour. In supply chains, community skills with setup hooks are just packages with install scripts by another name.
What to do now
This is not a theory-only paper; the authors show end-to-end spread with high success, but they tested one LLM backend in a controlled lab. That said, the failure modes align with familiar patterns, and the defences are not exotic.
- Configuration integrity and startup hardening: make core config append-only to the agent at runtime, verify signatures on startup, and separate configuration from executable templates. Treat any startup-time template expansion as high risk.
- Tooling and URL discipline: implement zero-trust for tool calls, including URL retrieval. Require explicit policy for shell, filesystem, and network egress. Remove default shells where possible and sandbox the rest.
- Supply-chain and messaging controls: sandbox third-party skills, require signatures, and restrict setup-time mutations. Bind message provenance and refuse to act on content that attempts to modify persistent state.
Operationally, log and alert on config writes, startup-time executions, and cross-agent message bursts. If an agent process writes to its own boot path and then starts inviting its friends to do the same, that is your 3am page. Use scoped, short-lived credentials so a compromised agent cannot drag secrets along for the ride. Segregate agent nodes from control planes and schedulers; a worm should not reach your orchestrator because a chat message got cute.
The broader point is simple. Multi-agent systems are production software now. If you let LLMs decide what to save, what to run, and what to forward, you inherit the oldest problems in endpoint and supply-chain security. The paper’s measurements quantify the blast radius. The mitigations are the same ones we already know, applied where the hype would prefer we did not.
Additional analysis of the original ArXiv paper
📋 Original Paper Title and Abstract
ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems
🔍 ShortSpan Analysis of the Paper
Problem
This paper studies the feasibility and mechanics of a self‑replicating worm operating across long‑running LLM agent ecosystems. It focuses on OpenClaw, an open‑source agent framework with over 40,000 active instances, persistent local configurations, broad tool‑execution privileges and multi‑platform messaging. The work demonstrates how a single adversarial message can establish persistent compromise, trigger execution on restart and autonomously propagate to peers, exposing structural trust weaknesses in contemporary agent architectures.
Approach
The authors design ClawWorm, a three‑phase worm that (1) induces a victim agent to write a malicious payload into its core configuration (persistence), (2) relies on unconditional startup semantics to execute the payload on session initialisation (execution), and (3) causes the infected agent to disseminate the full payload to newly encountered peers (propagation). They implement three vectors of infection: A) attacker‑controlled URL hosting a Session Startup template, B) a malicious skill published to the community marketplace (ClawHub) whose setup modifies core configuration, and C) direct code‑block replication with a verification step to counter paraphrasing. Three payload types were used: P1 a single shell reconnaissance command, P2 a resource‑consuming shell process, and P3 a command‑and‑control pattern where the agent fetches attacker‑controlled instructions from a URL. Experiments ran on an isolated testbed with unmodified OpenClaw, Minimax M2.5 backend, a 3x3 factorial design, and 180 independent trials (n=20 per condition).
Key Findings
- High overall success: aggregate attack success rate 0.85 across 180 trials, meaning persistence, execution and autonomous propagation all completed in the majority of attempts.
- Guaranteed conditional propagation: whenever an agent’s configuration was successfully modified, it propagated the payload to peers in every observed case (conditional propagation rate 1.00, 166/166).
- Multi‑hop spread: sustained autonomous propagation observed over up to 5 hops with a per‑hop conditional infection rate of 0.91 and mean chain length 4.0; later hops suffered chain breaks due to semantic degradation from successive LLM paraphrasing.
- Vector and payload differences: Vector B (skill supply chain) achieved the highest per‑vector ASR of 0.95; Vectors A and C each achieved 0.80. Payload ASRs were P1 0.88, P2 0.92 and P3 0.75. P3 showed high persistence and propagation (0.97) but lower execution due to its multi‑step URL fetch and parse chain.
- Permanent persistence once written: after injection into the core configuration file the payload persisted at a rate of 1.00 across restart cycles; re‑execution occurred at a rate of 0.80, reflecting occasional LLM non‑compliance during startup processing.
Limitations
Experiments used a single LLM backend (Minimax M2.5) and default OpenClaw deployments; results are reported from n=20 trials per condition, which introduces sampling uncertainty. The study does not explore the impact of alternate design choices or other LLMs. All tests were performed in isolated private networks with no impact on production systems.
Why It Matters
ClawWorm exposes structural vulnerabilities common to agent frameworks: flat context trust where provenance is not preserved, unconditional execution of persistent configuration, LLM‑authorised tool access without independent policy, and weak supply‑chain controls for third‑party extensions. The authors propose defence strategies targeted at these trust boundaries: context privilege isolation, configuration integrity verification, zero‑trust tool execution (including treating URL retrieval as sensitive), and supply‑chain hardening with sandboxing and signatures. Because the attack requires no server compromise, credentials or model weights, the findings underscore urgent security priorities for multi‑agent ecosystems and the need for defence‑in‑depth designs as agents gain real‑world privileges.