Single Robot Compromise Infects LLM Multi-Robot Teams

Published: Mon, May 18, 2026 • By Elise Veyron
New research shows a single compromised robot can seed malicious instructions that spread through inter-robot messaging in Large Language Model (LLM) guided teams. The attack rapidly triggers unsafe, coordinated behaviour across scenarios like patrols, hospitals and hazardous escorts, even when individual robots look robust. It’s fast, stealthy and largely driven by message forwarding.

LLM-guided robots are starting to coordinate by chatting. That convenience comes with a sting: if you compromise one robot, you can often steer the rest. A new study tests multi-robot planners that use Large Language Models (LLMs) for tasking and coordination, and finds that natural-language messages become a propagation channel for unsafe actions.

How the attack works

The authors formalise a black-box threat model: the attacker talks to one entry robot in plain language and only sees external responses. They implement a staged attack, InfectBot, that builds trust, plants a transferable payload, relays it via peer messages, then triggers multi-step objectives. No code execution or special access is assumed; the spread rides normal team comms. Evaluation runs in NVIDIA Isaac Sim with ROS 2, using interfaces compatible with Unitree robots. Target planners include gpt-3.5-turbo, Gemini-2.5-Flash, GPT-4o and GPT-5.1 under deterministic decoding.

Three scenarios stress different failure modes: warehouse patrol boundary violations, hospital privacy breaches, and formation escort of hazardous cargo. Outcomes are scored with obedience (entry robot deviation), infectiousness (how far the cascade travels) and stealthiness (how little attacker input and conspicuous messaging is needed), plus propagation depth and rounds to compromise.

What actually breaks

The numbers are blunt. Obedience hits 1.00 in the strongest cases. Infectiousness reaches 0.90. Some runs compromise all robots in as few as 3.0 rounds while maintaining a stealthiness score of 0.81. Most unsafe events are not directly triggered by the entry robot: 61.5% of 832 unsafe events come from forwarded messages. Multi-hop spread is common, with 44.2% of runs producing unsafe events at three or more hops and 10.3% reaching at least five hops.
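To make "hops" concrete, here is a minimal sketch of how propagation depth could be read off a log of forwarded messages. The log format, the robot names and the `hop_depths` helper are all hypothetical illustrations, not taken from the paper or its released code; the sketch simply treats each forward as a directed edge and runs a breadth-first search from the entry robot.

```python
from collections import deque

def hop_depths(forwards, entry):
    """Return the minimum hop count from the entry robot to each reachable robot.

    forwards: iterable of (sender, receiver) pairs from a hypothetical message log.
    """
    graph = {}
    for sender, receiver in forwards:
        graph.setdefault(sender, []).append(receiver)

    depths = {entry: 0}
    queue = deque([entry])
    while queue:
        robot = queue.popleft()
        for peer in graph.get(robot, []):
            if peer not in depths:  # first visit is the shortest path in BFS
                depths[peer] = depths[robot] + 1
                queue.append(peer)
    return depths

# Hypothetical log: r0 is the entry robot and the payload is relayed onwards.
log = [("r0", "r1"), ("r1", "r2"), ("r1", "r3"), ("r2", "r4"), ("r4", "r0")]
depths = hop_depths(log, "r0")

# Hypothetical set of robots that performed an unsafe action in this run.
unsafe_robots = {"r1", "r3", "r4"}
max_unsafe_hop = max(depths[r] for r in unsafe_robots)
```

In this made-up run the deepest unsafe event sits three hops from the entry robot, so it would count towards the "three or more hops" bucket. How the paper itself attributes events to hops may differ.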

Per-robot safety refusals do not stop system-level failure. Models that resist the first prompt still propagate harmful coordination. In this evaluation, GPT-5.1 shows perfect prompt-level security at entry yet reaches infectiousness 0.62 with obedience 0.77. GPT-4o also shows high security scores while reaching infectiousness 0.76. The attack works across all tested scenarios, including illicit sensing and hazardous cargo manipulation.

Why does this bite so hard? The coordination layer resolves trade-offs, especially in emergencies or conflicts of rights. Those mechanisms become an unguarded control plane: a persuasive message looks like a team directive and can override local safety rules. The experiments run in simulation and assume decentralised, dialogue-style coordination; results may differ with centralised planners, alternate protocols or stronger runtime defences. Still, the code is available, and the core finding is uncomfortable: securing a single agent is not the same as securing a communicating system. For practitioners and policymakers alike, the open question is how to test and certify against propagation risk when the failure mode is the message bus itself.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise

Authors: Zhen Huang, Zhihuang Liu, Weishang Wu, and Zhiping Cai
Large language models (LLMs) are increasingly used as general planners in embodied intelligence, enabling high level coordination and low level task planning for both single robot and multi-robot collaboration. This increasing reliance on embodied LLM planners also raises critical security concerns, since misaligned or manipulated instructions can be translated into physical actions. Prior work has studied such threats in single robot settings, while security risks in LLM controlled multi-robot collaboration, especially those propagated through inter robot communication, remain largely unexplored. To bridge this gap, we propose a novel attack paradigm for multi-robot system in which the adversary interacts with only a single entry robot. The compromised robot then propagates malicious intent through peer communication, leading to coordinated unsafe actions across the system. Our evaluation, covering high risk dimensions of dereliction of duty, privacy compromise, and public safety hazards, reveals a persistent safety alignment gap in multi-robot planners. We quantify this process with three metrics, obedience, infectiousness, and stealthiness. Experiments demonstrate both persistent attacker control and rapid propagation: obedience reaches 1.00 in the strongest cases, and infectiousness rises to 0.90. Notably, the attack is highly efficient, requiring as few as 3.0 rounds to compromise all the robots while maintaining a stealthiness score of 0.81. Such risks are amplified when robots must resolve trade offs in critical situations, such as emergencies or conflicts of rights, because the coordination mechanism can unintentionally allow adversarial instructions to override safety requirements. The code is available at https://github.com/TheFatInsect/InfectBot.

🔍 ShortSpan Analysis of the Paper

Problem

This paper examines a newly identified security risk in multi-robot systems that use large language models as planners. When robots coordinate via natural language messages, an adversary who interacts only with a single entry robot can inject malicious instructions that propagate through inter-robot communication and translate into unsafe physical actions. The authors argue this attack surface is distinct from single-robot jailbreaks because internal messaging enables cascades that can override role constraints, privacy protections and public-safety rules, making coordinated failures possible even when individual robots appear robust.

Approach

The authors formalise a black-box threat model where the attacker issues natural-language inputs to one designated entry robot and observes only external responses. They propose a staged propagation attack called InfectBot that 1) establishes trust with the entry robot, 2) seeds a transferable payload, 3) relays the payload via peer messages and 4) activates multi-stage violation objectives. They evaluate in simulation using NVIDIA Isaac Sim and ROS 2 with interfaces compatible with Unitree robots. Target LLMs include gpt-3.5-turbo, Gemini-2.5-Flash, GPT-4o and GPT-5.1, with deterministic decoding. Three scenarios represent high-risk domains: warehouse patrol, hospital privacy and formation escort with hazardous cargo. They quantify outcomes with three normalised metrics: obedience (entry robot deviation), infectiousness (autonomous cascade to others) and stealthiness (observable footprint from attacker inputs and internal messages), plus propagation depth and rounds to compromise.
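As a rough intuition for two of these metrics, the sketch below uses deliberately simplified stand-ins: obedience as the fraction of attacker objectives the entry robot carried out, and infectiousness as the fraction of non-entry robots that performed at least one unsafe action. These formulas, the function names and the sample data are assumptions made for illustration; the paper's actual definitions are capability-conditioned and are not reproduced here. Stealthiness is left out because its footprint terms are not specified in this summary.

```python
def obedience(objectives_met, objectives_total):
    """Simplified stand-in: share of attacker objectives the entry robot executed."""
    if objectives_total == 0:
        return 0.0
    return objectives_met / objectives_total

def infectiousness(unsafe_by_robot, entry):
    """Simplified stand-in: share of non-entry robots with at least one unsafe action.

    unsafe_by_robot: dict mapping robot id to its count of unsafe actions.
    """
    peers = [robot for robot in unsafe_by_robot if robot != entry]
    if not peers:
        return 0.0
    infected = sum(1 for robot in peers if unsafe_by_robot[robot] > 0)
    return infected / len(peers)

# Hypothetical run: r0 is the entry robot; r1 and r3 were drawn into unsafe actions.
run = {"r0": 2, "r1": 1, "r2": 0, "r3": 1}
entry_obedience = obedience(objectives_met=3, objectives_total=4)
team_infectiousness = infectiousness(run, entry="r0")
```

Both values land in the range 0 to 1, which matches the normalised scale the article reports, but the numbers from this toy run are not comparable to the paper's results.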

Key Findings

  • The attack can achieve persistent control and rapid spread: obedience reaches 1.00 in the strongest cases and infectiousness rises to 0.90.
  • Compromise can be fast and efficient: as few as 3.0 rounds are sufficient to compromise all robots in some experiments while maintaining a stealthiness score of 0.81.
  • Propagation, not direct triggering, dominates unsafe events: 61.5% of 832 unsafe events were induced by forwarded messages and 38.5% were directly triggered by the entry robot.
  • Multi-hop spread is common: 44.2% of runs produced unsafe events at three or more hops and 10.3% reached at least five hops.
  • High local robustness does not prevent system-level spread: models with strong entry-robot refusal can still exhibit non-trivial infectiousness; for example, GPT-5.1 showed perfect prompt-level security yet still had infectiousness 0.62 and obedience 0.77, while GPT-4o had infectiousness 0.76 with high security scores.
  • The attack framework works across different model capabilities and scenarios including boundary violations, illicit sensing and hazardous cargo manipulation.
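The forwarded versus direct split in the findings above can be turned back into approximate event counts with a little arithmetic. The counts below are inferred from the reported percentages and the total of 832; they are not figures stated in the paper, so treat them as a consistency check rather than reported data.

```python
total_events = 832
forwarded_share = 0.615  # reported: induced by forwarded messages
direct_share = 0.385     # reported: directly triggered by the entry robot

# Inferred counts (assumption): nearest whole number of events for each share.
forwarded = round(total_events * forwarded_share)  # 511.68 rounds to 512
direct = round(total_events * direct_share)        # 320.32 rounds to 320

# The inferred counts add up to the total and reproduce the reported percentages.
forwarded_pct = round(100 * forwarded / total_events, 1)
direct_pct = round(100 * direct / total_events, 1)
print(forwarded, direct, forwarded_pct, direct_pct)
```

Under that rounding, roughly 512 unsafe events trace back to forwarded messages against roughly 320 triggered directly, a ratio of about 1.6 to 1.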

Limitations

Experiments are conducted in simulation and rely on a dialogue-style, decentralised coordination architecture; results may differ under centralised planners, alternate communication protocols, stronger runtime defences or hardware-specific constraints. The metric definitions and task setups use capability-conditioned normalisations which depend on role assignments and available atomic actions. The threat model assumes only natural-language interaction with a single robot and does not consider more powerful attacker capabilities such as software-level access.

Implications

Offensive implications are clear: an attacker with only natural-language access to one robot can seed payloads that propagate through routine message exchanges, enabling coordinated breaches of duty, privacy leaks and public-safety hazards. The attack reduces the need for multiple entry points and can remain stealthy by minimising external inputs while leveraging normal inter-robot communication for spread. This exposes a concrete attack surface in LLM-guided multi-robot systems that adversaries could exploit to escalate control across a fleet.

