Personalised memories skew LLM agents' tool calls
Agent builders have been stitching long-term memory onto tool-using Large Language Model (LLM) systems because it makes them feel helpful and personal. The catch: those memories do not always stay in their lane. A new study names the failure mode “memory-induced tool-drift” and shows, with uncomfortable clarity, how a user’s recorded quirks can tip critical parameters inside tools that were supposed to be strictly professional.
The authors built MEMDRIFT, a 105-scenario benchmark spanning five bias dimensions (speed, frugality, minimalism, risk, autonomy) and seven professional domains, with a hard wall between personal memories and work tasks. That wall matters: any influence from memory is unambiguously out of bounds. Across seven frontier models, biased memories raised the judge-scored deflection by as much as +3.6 points on a 1–5 Likert scale relative to unbiased baselines. Neutral memories barely moved the needle, which points the finger squarely at personality-laden entries.
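To make the headline number concrete, here is a minimal sketch of how a deflection shift could be computed from judge scores. The function name `deflection_shift` and the sample scores are illustrative assumptions, not the paper's code or data; the only detail taken from the study is the 1–5 scale and the comparison against an unbiased baseline.

```python
from statistics import mean


def deflection_shift(baseline_scores, biased_scores):
    """Mean judge score with biased memories minus mean score without them."""
    for score in (*baseline_scores, *biased_scores):
        if not 1 <= score <= 5:
            raise ValueError(f"judge score {score} is outside the 1-5 scale")
    return mean(biased_scores) - mean(baseline_scores)


# Illustrative scores only; these are not results from the paper.
baseline = [1, 1, 2, 1]   # same scenarios, no biased memories
biased = [4, 5, 4, 5]     # same scenarios, biased memories present
shift = deflection_shift(baseline, biased)
print(f"deflection shift: {shift:+.2f}")
```

Because the scale runs from 1 to 5, the largest possible shift is +4, which puts the reported +3.6 close to the ceiling.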
How the drift works
Mechanistically, these memories act like implicit steering vectors. They nudge internal activations along the same latent directions as explicit behavioural instructions, then pull attention away from task context toward memory snippets that share surface-level keywords with the target parameter. Production memory frameworks did not save the day. When the team ran the same tests through three production memory frameworks (Mem0, MemPalace, SimpleMem), summarisation and retrieval often stripped the disambiguating context and rewarded that shallow overlap, yielding similar or worse drift.
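The shallow-overlap problem can be illustrated with a toy retriever. This is a hedged sketch: it scores memories by plain word overlap (Jaccard similarity), which is a lexical stand-in chosen for clarity and not how Mem0, MemPalace, or SimpleMem actually retrieve. The memories and task text are invented for illustration.

```python
import re


def tokens(text):
    """Lowercase word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(memory, query):
    """Jaccard overlap between the word sets of a memory and a query."""
    a, b = tokens(memory), tokens(query)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical memory store: one personal quirk, one unrelated note.
memories = [
    "User hates waiting and always picks the shortest timeout on personal gadgets",
    "User's quarterly report is due on Friday",
]
task = "Set the request timeout for the hospital records sync job"

# Rank memories by overlap with the professional task, highest first.
ranked = sorted(memories, key=lambda m: overlap_score(m, task), reverse=True)
print(ranked[0])
```

The personal memory about gadgets ranks first purely because it shares the word "timeout" with the task, even though nothing in it is relevant to a hospital sync job.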
The exposure is not theoretical. The team scanned 6,062 tools across 288 verified Model Context Protocol servers and flagged 608 with susceptible parameters. They replayed a validated subset and saw concrete flips: a healthcare repo’s project visibility changing from private to public, or safesearch being turned off in education-related searches.
This is an attacker’s kind of bug because it lives in the gap between memory and action. Craft a personal-style memory that says “I’m frugal” or “I like to move fast,” line it up with parameter names like tier, timeout, or approvals, and you can steer an agent toward weaker safeguards, cheaper plans, faster but unsafe settings, or removed gates. Tool calls often execute with low user observability. The nudge happens in the dark.
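A rough way to picture the exposure scan is a pass over tool schemas that flags parameters whose names line up with a bias dimension. The sketch below is an assumption-laden illustration, not the study's scanner: the keyword lists in `BIAS_KEYWORDS`, the function `flag_susceptible`, and the sample `create_project` tool are all hypothetical. The only external convention relied on is that MCP tool definitions describe their arguments as JSON Schema under `inputSchema`.

```python
# Hypothetical keyword lists, one per bias dimension named in the study.
BIAS_KEYWORDS = {
    "speed": {"timeout", "retries", "delay"},
    "frugality": {"tier", "plan", "budget"},
    "minimalism": {"verbosity", "limit"},
    "risk": {"visibility", "safesearch", "force"},
    "autonomy": {"approvals", "confirm", "review"},
}


def flag_susceptible(tool):
    """Map each parameter name to the bias dimensions whose keywords it contains."""
    flagged = {}
    properties = tool.get("inputSchema", {}).get("properties", {})
    for name in properties:
        lowered = name.lower()
        dims = [
            dim
            for dim, keywords in BIAS_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        if dims:
            flagged[name] = dims
    return flagged


# Invented tool definition for illustration.
tool = {
    "name": "create_project",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "visibility": {"type": "string", "enum": ["private", "public"]},
            "required_approvals": {"type": "integer"},
        },
    },
}

flagged = flag_susceptible(tool)
print(flagged)
```

A name-matching heuristic like this would over- and under-flag in practice; it is only meant to show why parameter names such as tier, timeout, or approvals make convenient targets.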
Why this smells familiar
If you squint, it looks like an old systems story: configuration drift. In the data centre, a small preference leaks into global defaults and suddenly a fleet inherits weak ciphers. Here, a cosy memory about thrift leaks into cost or visibility parameters. Once you mingle a user’s whim with operational policy, the whim has a way of winning.
Defences helped a bit. Prompting models with memory-usage guidelines lowered deflection scores modestly (one reported overall reduction of −0.52), and a relevance filter worked in the benchmark’s clean setting. But on more realistic, multi-hop relevance cases, the filter reached only 61.0% recall with a 10.3% false positive rate. The study sticks to single-tool, single-turn cases, which keeps the measurement honest but likely understates the combinatorial mess of chained tools. That leaves an open question squarely in the agent community’s lap: what should the contract be between memory management and tool-call generation so that personalisation enriches conversations without silently rewriting the runbook?
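The prompt-based defence amounts to appending usage guidelines after the memories in the system prompt. The following is a minimal sketch of that idea; the guideline wording in `MEMORY_GUIDELINES` and the helper `build_system_prompt` are assumptions made for illustration, since the exact text used in the study is not given here.

```python
# Hypothetical guideline wording; the study's actual text is not reproduced here.
MEMORY_GUIDELINES = (
    "Stored memories describe the user's personal preferences. "
    "Do not let them change tool-call parameters for professional tasks "
    "unless the current request explicitly asks for it."
)


def build_system_prompt(base_prompt, memories, guidelines=MEMORY_GUIDELINES):
    """Assemble a system prompt with memories listed first and guidelines appended last."""
    memory_block = "\n".join(f"- {memory}" for memory in memories)
    return f"{base_prompt}\n\nMemories:\n{memory_block}\n\n{guidelines}"


prompt = build_system_prompt(
    "You are an operations assistant with access to deployment tools.",
    ["I like to move fast", "I'm frugal"],
)
print(prompt)
```

Given the modest reduction reported, a guideline like this is better treated as one layer among several than as a fix on its own.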
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
Memory-Induced Tool-Drift in LLM Agents
🔍 ShortSpan Analysis of the Paper
Problem
The paper studies a failure mode in memory-augmented large language model agents where personality-driven memories stored for personalisation silently bias tool-call parameters in professional contexts where those memories are irrelevant. This "memory-induced tool-drift" matters because agent tool calls often execute automatically with low user observability and can produce irreversible or high-consequence outcomes if parameters are altered inappropriately.
Approach
The authors formalise the problem and introduce MEMDRIFT, a benchmark of 105 adversarially generated scenarios spanning five bias dimensions (speed/impatience, resource frugality, minimalism/conciseness, risk permissiveness, autonomy/self-reliance) and seven professional domains (healthcare, finance, legal, software infrastructure, education, e-commerce, marketing). Scenarios enforce strict separation between personal memories and professional tasks so any memory influence is unambiguously inappropriate. Generation uses an iterative LLM pipeline with adversarial refinement and an LLM judge to maximise measurable drift. Evaluation covers two delivery modes: direct memory injection into the system prompt and three production memory frameworks (Mem0, MemPalace, SimpleMem). Seven frontier models were tested, including closed-source and open-weight systems, and a vulnerability scan examined 6,062 tools across 288 MCP servers to find real-world susceptible parameters.
Key Findings
- Tool-drift is pervasive: across seven evaluated models and both memory delivery modes, biased personal memories raise judge-scored deflection on a 1 to 5 Likert scale by as much as +3.6 points relative to unbiased baselines, while neutral memories produce little deviation.
- Drift persists under production memory architectures: naturalistic multi-turn encoding, framework storage, and retrieval still yielded comparable or greater biased deflection scores because summarisation and retrieval often strip contextual signals and rely on shallow semantic overlap.
- Real-world exposure: scanning 6,062 MCP tools flagged 608 with parameters susceptible to memory-induced drift; validated reproductions of real schemas showed concrete drifts such as changing project visibility from private to public in a healthcare repo, or turning safesearch off in education-related searches, creating substantive outcome changes.
- Mechanism: biased memories act as implicit steering vectors, shifting internal activations along the same latent directions as explicit behavioural instructions, and they redistribute attention away from task-relevant inputs to memory entries that share surface-level keywords with target parameters.
- Partial mitigations: appending memory-usage guidelines to the system prompt reduced biased deflection modestly (for example a reported overall reduction of −0.52 in one evaluation), and a relevance filter can remove irrelevant memories in the deliberately clean MEMDRIFT setting, but the filter performs poorly on multi-hop, realistic relevance cases with 61.0% recall and a 10.3% false positive rate.
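For readers less familiar with the two filter metrics quoted above, here is a small sketch of how recall and false positive rate are computed. It assumes the positive class is "memory is irrelevant and should be removed", which is an interpretation and not something the summary states explicitly. The labels below are invented and do not come from the paper.

```python
def filter_metrics(should_remove, removed):
    """Return (recall, false_positive_rate) for a memory relevance filter.

    should_remove: ground-truth flags, True if the memory is irrelevant.
    removed: filter decisions, True if the filter dropped the memory.
    """
    pairs = list(zip(should_remove, removed))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    tn = sum(1 for truth, pred in pairs if not truth and not pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Illustrative labels: 4 irrelevant memories, 10 relevant ones.
should_remove = [True] * 4 + [False] * 10
removed = [True, True, True, False] + [True] + [False] * 9

recall, fpr = filter_metrics(should_remove, removed)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Under this reading, imperfect recall means some irrelevant memories still reach the model, while each false positive is a relevant memory wrongly withheld.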
Limitations
The study focuses on single-tool, single-turn scenarios and parameter assignment rather than multi-tool chains or tool selection. It evaluates a finite set of bias dimensions and memory frameworks and uses synthesised scenarios to ensure objective measurement, which may simplify some real-world complexities.
Implications
An attacker could craft or plant personal-style memories that exploit lexical overlap and bias-aligned parameter labels to steer an agent toward weaker safeguards, lower-cost tiers, faster but unsafe settings, or removal of approval gates. Because tool invocations are often executed without fine-grained human review, such manipulations can change professional outcomes stealthily and at scale. The results point to a novel attack surface at the intersection of memory management and tool-call generation that current prompt-based or coarse filtering defences do not fully address.