Environment-injected memory poisoning trips LLM web agents
Agents
Web agents built on Large Language Models (LLMs) now browse, buy, and file forms. They remember what they see so they can move faster next time. That memory is also a durable attack surface.
The study introduces Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP). The move is simple and nasty: show the agent a booby-trapped page once, let it observe a crafted snippet, and it will file that away in long-term memory. Later, on a different site and in a new session, the agent recalls the planted cue and acts on it. No one ever writes to the agent’s storage directly. There is no shared memory to abuse. The contamination flows purely from what the agent sees.
How eTAMP lands the punch
The attacker controls or edits content the agent will naturally encounter, such as a manipulated product page. The agent treats the observation as useful context and stores it. Because recall is framed as helpful background, the poisoned memory bypasses permission gates that guard explicit memory writes. When the agent tackles a new task elsewhere, the earlier memory triggers and steers behaviour. The compromise is cross-session and cross-site by design.
On (Visual)WebArena, the authors report substantial attack success rates: up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Capability did not buy safety. GPT-5.2 performed tasks well yet still showed notable vulnerability.
The sharpest twist is what the authors call Frustration Exploitation. When the environment feels unreliable to the agent, susceptibility spikes. Dropped clicks or garbled text make the model more likely to cling to cached memories. Under these stressors, attack success rose by as much as eight times. That is an ugly feedback loop for any agent that has to cope with flaky web UIs.
Why does this work? It hits the seam where observation becomes memory. Most defences focus on who can write to memory or whether users share it. Here, the adversary never asks permission. They let the agent volunteer the write based on its own perception. Once in, the memory persists and crosses domain boundaries later.
With AI browsers such as OpenClaw, ChatGPT Atlas, and Perplexity Comet on the horizon, the web itself becomes a long-lived control channel for agents. The open questions now look bigger than a single patch: what should count as trustworthy memory provenance, how should agents surface the source of recalled facts, and how do we measure robustness when capability alone does not predict it?