Watcher-led defence hardens OpenClaw autonomous agents

Published: Thu, Mar 26, 2026 • By Elise Veyron
ClawKeeper proposes a three-layer defence for OpenClaw agents: policy-injected skills, hardened plugins, and a decoupled Watcher that intervenes in real time. In tests across seven threat categories, it achieves 85–90% defence success and outperforms fragmented baselines. The design raises practical privacy and portability trade-offs but offers clearer oversight routes.

Autonomous agents are moving from demos to daily tooling, and the security stakes are rising with them. When a large language model (LLM) can call tools, read local files, and run shell commands, small mistakes turn into real incidents. OpenClaw is one such agent runtime, and a new framework called ClawKeeper sets out to reduce the blast radius with a layered defence-in-depth design.

Three layers of control

ClawKeeper combines three complementary protections. First, skill-based protection injects structured security policies into the agent’s instruction context. The idea is simple: tell the agent what it must not do, with environment-specific constraints and cross-platform boundaries, and keep those rules present throughout the interaction. This layer also supports scheduled scans and summaries of interactions to help spot drift.
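To make the instruction-layer idea concrete, here is a minimal sketch of policy injection. It is not ClawKeeper's code: the PolicyRule type, the rule IDs, the rule wording and the inject_policy helper are all hypothetical, and the sketch assumes the agent's instruction context is a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """One hand-written constraint. The type and its fields are hypothetical."""
    rule_id: str
    text: str


def render_policy(rules: list[PolicyRule]) -> str:
    """Render the rules as a Markdown block for the instruction context."""
    lines = ["## Security policy (applies for the whole session)"]
    lines += [f"- [{r.rule_id}] {r.text}" for r in rules]
    return "\n".join(lines)


def inject_policy(system_prompt: str, rules: list[PolicyRule]) -> str:
    """Prepend the policy to the prompt; do nothing if it is already present."""
    policy = render_policy(rules)
    if policy in system_prompt:
        return system_prompt
    return f"{policy}\n\n{system_prompt}"


# Illustrative rules only; real deployments would write environment-specific ones.
rules = [
    PolicyRule("FS-1", "Never read or write files outside the project workspace."),
    PolicyRule("NET-1", "Never send file contents to hosts the user has not approved."),
]
prompt = inject_policy("You are a coding assistant.", rules)
```

Re-running inject_policy on an already protected prompt leaves it unchanged, which is one simple way to keep the rules present on every turn without duplicating them. The rules still only work if the model chooses to follow them.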

Second, plugin-based protection hardens the runtime itself. It enforces configuration protection, performs proactive threat detection, and monitors behaviour continuously. It can create cryptographic backups of critical files and run retrospective scans of behaviour, which is useful when you need to understand what happened after the fact.
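One building block behind configuration protection is knowing when a critical file has changed. The sketch below shows hash-based change detection with SHA-256 from Python's standard library. It is an illustration under assumptions, not ClawKeeper's backup mechanism: the function names and the agent-config.json file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(paths: list[Path]) -> dict[str, str]:
    """Record a digest per critical file as the trusted baseline."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest differs from the baseline."""
    changed = []
    for name, digest in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != digest:
            changed.append(name)
    return sorted(changed)


# Demo on a throwaway directory; the file name and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "agent-config.json"
    cfg.write_text('{"allow_shell": false}')
    baseline = snapshot([cfg])
    clean = changed_files(baseline)      # nothing has changed yet
    cfg.write_text('{"allow_shell": true}')
    tampered = changed_files(baseline)   # the edited file is reported
```

A real plugin would also need to store the baseline somewhere the agent cannot rewrite, otherwise an attacker who edits the file can edit the digest too.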

Third, watcher-based protection introduces a decoupled supervisor that runs as a separate OpenClaw instance. It receives streamed session data over a WebSocket, analyses execution in real time, and can halt or pause risky operations, request human confirmation, and update its own safety knowledge over time. Because it sits outside the agent’s internal logic, it is harder to disable and easier to audit.
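The Watcher's control loop can be pictured as a function from streamed events to verdicts. The sketch below is a simplification under stated assumptions: it replaces the WebSocket with a plain iterable of JSON strings, and the event fields (tool, command), the pattern lists and the Verdict names are all hypothetical rather than taken from ClawKeeper.

```python
import json
from enum import Enum
from typing import Iterable, Iterator


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask a human
    HALT = "halt"         # stop the session


# Hypothetical rules; a real Watcher would use far richer analysis.
HALT_PATTERNS = ("rm -rf /", "curl | sh")
CONFIRM_TOOLS = {"shell", "file_write"}


def assess(event: dict) -> Verdict:
    """Classify one session event using the toy rules above."""
    command = event.get("command", "")
    if any(pattern in command for pattern in HALT_PATTERNS):
        return Verdict.HALT
    if event.get("tool") in CONFIRM_TOOLS:
        return Verdict.CONFIRM
    return Verdict.ALLOW


def watch(stream: Iterable[str]) -> Iterator[tuple[dict, Verdict]]:
    """Yield a verdict per event and stop consuming after the first HALT."""
    for raw in stream:
        event = json.loads(raw)
        verdict = assess(event)
        yield event, verdict
        if verdict is Verdict.HALT:
            return


# Stand-in for messages that would arrive over the WebSocket.
session = [
    json.dumps({"tool": "file_read", "command": "cat README.md"}),
    json.dumps({"tool": "shell", "command": "pip install requests"}),
    json.dumps({"tool": "shell", "command": "rm -rf / --no-preserve-root"}),
    json.dumps({"tool": "file_read", "command": "cat notes.txt"}),
]
verdicts = [verdict for _, verdict in watch(session)]
```

Here the fourth event is never assessed because the third one halts the session. The point of the design is that this loop lives in a separate process, so a compromised agent cannot simply skip it.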

Measured gains, known trade-offs

The authors evaluate ClawKeeper on a 140-instance benchmark spanning seven threat categories, including sensitive data leakage, privilege abuse, and malicious third-party skill execution. Across categories, ClawKeeper reports defence success rates of about 85 to 90 percent, improving on open-source baselines by roughly 15 to 45 percentage points per category. No single baseline covered more than three categories, which underscores how fragmented current defences are. Among the three layers, the Watcher provides the strongest safety guarantees due to its architectural separation. It also shows a self-evolving capability, with defence success rising from about 90 to about 95 percent after processing 100 novel adversarial cases and updating its monitoring skills and in-context memory.

The limitations are practical and worth dwelling on. Skill-based measures depend on the quality of hand-written rules and the model’s willingness to follow them; both can be subverted. Plugin-based protections are tightly coupled to OpenClaw, which reduces portability and places maintenance pressure on static rules. The Watcher needs a persistent communication path and, in cloud setups, streams session data off-device, which raises privacy and governance questions. Running a separate supervisor also costs compute, and there is some deployment complexity even with installation tooling. The authors release their code, which should help teams evaluate these trade-offs in their own environments.

For security operations, the broader pattern matters as much as the numbers. ClawKeeper’s tri-layer design maps cleanly onto familiar control families: policy at the instruction layer, hardening and monitoring in the runtime, and an out-of-process enforcer with a kill switch. The separation of duties is doing heavy lifting here. When the oversight component is independent, you get real-time intervention and clearer lines for audit and incident response. That is especially pertinent for supply-chain risk in agent ecosystems, where third-party skills can become untrusted code paths.

From a policy angle, this architecture points toward workable governance norms for agents: keep high-risk actions observable by a separate control plane, ensure configurable human-in-the-loop checkpoints, and make monitoring evidence exportable for external scrutiny. The Watcher model could form the backbone of minimum safeguards that regulators and procurers expect for autonomous agent deployments without mandating a specific vendor implementation.
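One way to make monitoring evidence credible to an outside reviewer is a hash-chained log, where each entry commits to the one before it. The paper does not say ClawKeeper does this; the sketch below is an assumption about how exportable evidence could be made tamper-evident, and every name and record field in it is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record that is chained to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"action": "shell", "verdict": "confirm"})
append_entry(log, {"action": "file_write", "verdict": "halt"})
intact = verify(log)

log[0]["record"]["verdict"] = "allow"  # simulate after-the-fact tampering
still_valid = verify(log)
```

An auditor who holds only the final hash can detect edits to any earlier entry, which fits the article's point that evidence should be checkable by someone other than the operator.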

Open questions remain. How should teams balance the Watcher’s visibility with data minimisation expectations in cloud environments? What standard interfaces would let a Watcher supervise heterogeneous agent runtimes, not just OpenClaw? And how do we test and certify the Watcher’s own behaviour as threats evolve? ClawKeeper does not answer all of this, but it moves the conversation from static prompts to operational controls that can intervene when it counts.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Authors: Songyang Liu, Chaozhuo Li, Chenxu Wang, Jinyu Hou, Zejian Chen, Litian Zhang, Zheng Liu, Qiwei Ye, Yiming Hei, Xi Zhang, and Zhongyuan Wang
OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) Skill-based protection operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) Plugin-based protection serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) Watcher-based protection introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.

🔍 ShortSpan Analysis of the Paper

Problem

The paper examines security risks in OpenClaw, an open-source autonomous agent runtime that can invoke tools, run shell commands and access local files. Those privileges turn model mistakes and malicious inputs into system-level threats such as prompt injection, sensitive data leakage, privilege escalation, dangerous command execution and malicious third‑party skill installation. Existing mitigations are fragmented, typically address only isolated stages of the agent lifecycle, and suffer from trade‑offs between safety and utility, reactive detection, and static rule sets that cannot co‑evolve with agents.

Approach

ClawKeeper is a defence‑in‑depth framework that integrates three complementary protection layers: (1) skill‑based protection injects structured security policies (defined as Markdown plus lightweight scripts) into the agent’s instruction context and supports scheduled scans and interaction summarisation; (2) plugin‑based protection is an internal runtime enforcer providing threat detection, hardening, configuration protection, continuous monitoring, cryptographic backups of critical files and retrospective behavioural scanning; (3) watcher‑based protection is a decoupled supervisory agent implemented as a separate OpenClaw instance that receives streamed session data over a WebSocket, diagnoses execution trajectories in real time and can pause or halt unsafe actions, request human confirmation and update its safety knowledge over time. The authors evaluate ClawKeeper on a 140‑instance benchmark covering seven threat categories and compare it to several open‑source baselines. The implementation is released publicly.

Key Findings

  • ClawKeeper achieves substantially higher defence success rates (85–90%) across seven threat categories than existing open‑source baselines, improving per‑category performance by roughly 15 to 45 percentage points.
  • No single prior baseline covered more than three of the seven categories, showing severe coverage fragmentation; ClawKeeper’s unified three‑layer design yields broader and more robust coverage.
  • The Watcher provides the highest safety guarantees among the three paradigms because of architectural separation: it resists adversarial uninstallation, enables real‑time intervention and supports external auditability.
  • The Watcher also exhibits a measurable self‑evolving capability: its defence success rate improves from about 90% to about 95% after processing 100 novel adversarial cases by updating monitoring skills and in‑context memory.

Limitations

Skill‑based measures depend on the quality of manual rules and the model’s compliance and are vulnerable to adversarial manipulation. Plugin‑based protection is tightly coupled to OpenClaw, reducing portability and relying on largely static rules that need continual maintenance. The Watcher requires a persistent communication interface and, in cloud deployments, transmits session information off‑device, introducing privacy trade‑offs; running a separate Watcher also adds computational cost. Some practical deployment complexity remains despite provided installation tooling.

Why It Matters

ClawKeeper demonstrates that combining instruction‑level policies, in‑runtime hardening and an independent supervisory agent produces broader and stronger security for autonomous agents than isolated defences. The decoupled Watcher paradigm in particular offers a generalisable model for real‑time oversight, continuous learning of new threats and external verifiability, making it a promising building block for safer agent ecosystems where preventing data exfiltration, privilege abuse and malicious skill supply‑chain attacks is critical.

