ShortSpan.ai logo

Environment-injected memory poisoning trips LLM web agents

Agents
Published: Mon, Apr 06, 2026 • By Elise Veyron
Environment-injected memory poisoning trips LLM web agents
New research shows LLM web agents can be poisoned just by viewing manipulated content. A single exposure seeds long-term memory that later misguides tasks across sites and sessions, bypassing permission checks. Experiments report up to 32.5% success and an eightfold jump under stress. More capable models are not safer.

Web agents built on Large Language Models (LLMs) now browse, buy, and file forms. They remember what they see so they can move faster next time. That memory is also a durable attack surface.

The study introduces Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP). The move is simple and nasty: show the agent a booby-trapped page once, let it observe a crafted snippet, and it will file that away in long-term memory. Later, on a different site and in a new session, the agent recalls the planted cue and acts on it. No one ever writes to the agent’s storage directly. There is no shared memory to abuse. The contamination flows purely from what the agent sees.

How eTAMP lands the punch

The attacker controls or edits content the agent will naturally encounter, such as a manipulated product page. The agent treats the observation as useful context and stores it. Because recall is framed as helpful background, the poisoned memory bypasses permission gates that guard explicit memory writes. When the agent tackles a new task elsewhere, the earlier memory triggers and steers behaviour. The compromise is cross-session and cross-site by design.

On (Visual)WebArena, the authors report substantial attack success rates: up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Capability did not buy safety. GPT-5.2 performed tasks well yet still showed notable vulnerability.

The sharpest twist is what the authors call Frustration Exploitation. When the environment feels unreliable to the agent, susceptibility spikes. Dropped clicks or garbled text make the model more likely to cling to cached memories. Under these stressors, attack success rose by as much as eight times. That is an ugly feedback loop for any agent that has to cope with flaky web UIs.

Why does this work? It hits the seam where observation becomes memory. Most defences focus on who can write to memory or whether users share it. Here, the adversary never asks permission. They let the agent volunteer the write based on its own perception. Once in, the memory persists and crosses domain boundaries later.

With AI browsers such as OpenClaw, ChatGPT Atlas, and Perplexity Comet on the horizon, the web itself becomes a long-lived control channel for agents. The open questions now look bigger than a single patch: what should count as trustworthy memory provenance, how should agents surface the source of recalled facts, and how do we measure robustness when capability alone does not predict it?


Related Articles

Related Research on arXiv

Get the Weekly AI Security Digest

Top research and analysis delivered to your inbox every week. No spam, unsubscribe anytime.