Zombie Agents Hijack LLM Memory Across Sessions

Agents

Published: Thu, Feb 19, 2026 • By Dr. Marcus Halden

New research shows self‑evolving Large Language Model (LLM) agents can be persistently hijacked via poisoned long‑term memory. A black‑box attacker seeds web content during a benign task; the agent stores it and later treats it as instruction. Tests on sliding‑window and RAG memory show strong persistence, modest guardrail impact, and realistic harms.

Many Large Language Model (LLM) agents now keep long‑term memory to handle multi‑session work. It helps with continuity and long‑horizon tasks. It also opens a door. This paper shows how an attacker can leave behind a small instruction that the agent dutifully writes to its memory, then later treats as policy. The authors call the resulting compromise a Zombie Agent, which is a pleasingly blunt name for a gnarly problem.

What the study tests

The threat model is strict: the attacker cannot touch model weights, tools, or memory directly. They can only publish web content the agent might read while doing a normal task. The attack has two phases. Infection happens when the agent browses a poisoned page and, through its usual update process, stores the payload as long‑term memory. Trigger happens in a later, unrelated session when that memory is retrieved and nudges the agent into unauthorised tool use or data exfiltration.

The work targets two common memory designs. For sliding‑window memory, where older entries fall off a finite context, the payload uses recursive renewal so the agent keeps rewriting the instruction forward to avoid eviction. For retrieval‑augmented generation (RAG) memory backed by a vector database, the payload spreads using semantic aliasing and embedding pollution so that many queries pull it back in despite relevance filtering.

Evaluation uses two commercial LLMs tuned for tool use and realistic web‑style tasks. The authors compare their approach with standard indirect prompt injections. They measure persistence over time and the ability to induce unauthorised actions while retaining benign task quality. This is where the work shines: it ties mechanism‑specific strategies to measurable outcomes, not anecdotes.

Findings and limits

The Zombie Agent methods beat baseline injections. On sliding‑window agents, recursive renewal kept the malicious content present throughout the experiment, with a 100% retention rate while baselines quickly disappeared once the window filled. On RAG agents, the approach stored roughly 2.5 times more payload copies than baselines and saturated top‑K retrievals, for example returning about 23 malicious entries at K=50. That uplift translated into higher attack success.

System tweaks helped but did not close the hole. Raw‑history memory updates delivered the highest attack success at around 77%. Mitigation‑like update modes pushed success down to the low double digits in some settings, but attacks still landed. Prompt‑based guardrails, including sandwich and instructional styles, shaved roughly 10–15 percentage points off success rates, yet left them well above 60% in tested configurations. Case studies in healthcare and e‑commerce make the risks concrete: covert leakage of patient identifiers and automated fraudulent purchases or credential theft driven by persisted payloads.

The study is careful about scope. It assumes a fixed toolset and a black‑box adversary. Results come from two models and specific memory and retrieval configurations, averaged over three runs. Defences explored are mostly prompt‑level. That is reasonable for isolating the memory effect, but it leaves open how robust the findings are under stricter system controls or different operational policies.

Why it matters is clear: once untrusted content becomes trusted memory, per‑session filtering is not enough. Memory itself becomes part of the trusted computing base. In practice, that points to provenance and signing for memory writes, sanitisation and tamper‑evident logging of memory updates, decay or forgetting policies to limit long‑lived implants, and cross‑session anomaly and policy checks on tool‑invoking behaviour. The next wave of agent security will need to watch not just what goes into the prompt today, but what quietly lingers and reappears tomorrow.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

Authors: Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, and Jin Song Dong

Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance on long-horizon tasks but creates a security risk: untrusted external content observed during a benign session can be stored as memory and later treated as instruction. We study this risk and formalize a persistent attack we call a Zombie Agent, where an attacker covertly implants a payload that survives across sessions, effectively turning the agent into a puppet of the attacker. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content. The attack has two phases. During infection, the agent reads a poisoned source while completing a benign task and writes the payload into long-term memory through its normal update process. During trigger, the payload is retrieved or carried forward and causes unauthorized tool behavior. We design mechanism-specific persistence strategies for common memory implementations, including sliding-window and retrieval-augmented memory, to resist truncation and relevance filtering. We evaluate the attack on representative agent setups and tasks, measuring both persistence over time and the ability to induce unauthorized actions while preserving benign task quality. Our results show that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies a new persistent threat to self-evolving LLM agents that store and reuse long-term memory across sessions. Untrusted external content seen during a benign session can be written into that memory and later treated as instruction, enabling a persistent compromise the authors call a Zombie Agent: an agent that appears normal but retains a covert payload that can be triggered in later sessions to perform unauthorised tool actions or exfiltrate data.

Approach

The authors formalise a black-box, two-phase attack comprising Infection and Trigger phases. During Infection, an agent browsing attacker-controlled web content ingests a poisoned observation which the agent's normal memory-evolution function commits to long-term storage. During Trigger, a later, unrelated session retrieves the poisoned entry and the payload causes unauthorised actions. The work targets two common memory designs: sliding-window (finite FIFO context) and retrieval-augmented generation (RAG) with a vector database. Mechanism-specific persistence strategies are proposed: recursive self-replication to avoid FIFO eviction, and semantic aliasing plus embedding pollution to maximise retrieval across semantically unrelated queries. The threat model assumes a strict black-box attacker who can only publish external content and cannot modify model weights, memory directly, or toolsets. The attack is evaluated on two commercial LLMs optimised for tool use, using a baited Exposure Phase and a Trigger Phase on realistic web-style queries; metrics include Attack Success Rate, context retention for sliding windows, and injection/recall metrics for RAG. Several baseline indirect prompt-injection strategies are compared.

Key Findings

The Zombie Agent framework significantly outperforms standard indirect prompt-injection baselines on both sliding-window and RAG agents, achieving sustained attack success across many trigger rounds.
For sliding-window agents, recursive renewal preserved the malicious payload through truncation, yielding a 100% retention rate across the experiment where baseline injections rapidly vanished once the window filled.
For RAG agents the method produced aggressive embedding pollution, storing roughly 2.5 times more payload copies than baselines (example counts reported) and saturating top-K retrievals (e.g., retrieving about 23 malicious entries at K=50), which substantially increased recall and attack success.
Evolving memory update strategies reduce but do not eliminate risk: raw-history updates gave the highest attack success (~77%), while mitigation-like update modes reduced success to the low double digits in some settings, yet non-trivial attack execution remained possible.
Prompt-based guardrails (Sandwich, Instructional, Spotlight) reduced success only modestly (a drop of roughly 10–15 percentage points), leaving attack success well above 60% in tested configurations.
Qualitative case studies in healthcare and e‑commerce show realistic harms: covert exfiltration of patient identifiers and automated fraudulent purchases or credential theft driven by persisted payloads.

Limitations

The threat model assumes a black-box attacker limited to publishing external content and a fixed toolset. Experiments were performed on two specific commercial models and on particular memory and retrieval configurations with results averaged over three runs; generalisability to other models, tool permissions or adaptive defenders was not exhaustively evaluated. Tested defences were primarily prompt-based and may not represent all possible system-level controls.

Why It Matters

Memory evolution expands the attack surface: once malicious content is consolidated into trusted memory it can bypass per-session input filters and act as an insider threat. Practical security implications include the need to treat memory as part of the trusted computing base, attach provenance and signatures to updates, sanitise and log memory writes, apply decay or forgetting policies, and add cross-session anomaly and policy checks on tool-invoking behaviour. Without such memory-level controls, self-evolving agents remain vulnerable to long-lived, covert compromise.

Attribution Original paper on arXiv