
Personalised memories skew LLM agents' tool calls

Published: Mon, May 25, 2026 • By Theo Solander
New research shows that biased personal memories in Large Language Model (LLM) agents can quietly steer tool-call parameters in professional tasks. Across seven models, drift scores rose by up to +3.6 on a 1–5 scale. A scan of 6,062 real tools flagged 608 as susceptible, with concrete, high-impact parameter changes reproduced.

Agent builders have been stitching long-term memory onto tool-using Large Language Model (LLM) systems because it makes them feel helpful and personal. The catch: those memories do not always stay in their lane. A new study names the failure mode “memory-induced tool-drift” and shows, with uncomfortable clarity, how a user’s recorded quirks can tip critical parameters inside tools that were supposed to be strictly professional.

The authors built MEMDRIFT, a 105-scenario benchmark spanning five bias dimensions (speed, frugality, minimalism, risk, autonomy) and seven domains, with a hard wall between personal memories and work tasks. That wall matters: any influence from memory is unambiguously out of bounds. Across seven frontier models, biased memories raised the deflection score, a judge-scored measure of how far tool-call parameters deviate from an unbiased baseline, by as much as 3.6 points on a 1–5 scale. Neutral memories barely moved the needle, which points the finger squarely at personality-laden entries.

How the drift works

Mechanistically, these memories act like implicit steering vectors. They nudge internal activations along the same latent directions as explicit behavioural instructions, then pull attention away from task context toward memory snippets with surface-level keyword overlap to the target parameter. Production memory frameworks did not save the day. When the team ran the same tests through three real memory architectures, summarisation and retrieval often stripped the disambiguating context and rewarded that shallow overlap, yielding similar or worse drift.
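To make the "shallow overlap" point concrete, here is a minimal sketch of how a retrieval step that scores on shared words can rank an unrelated personal memory above the actual task context. It is an illustration only: the Jaccard token overlap stands in for whatever similarity a real memory framework uses, and the parameter description and both snippets are made up for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; underscores split, so 'approval_required' yields 'approval'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query: str, candidate: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    q, c = tokens(query), tokens(candidate)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


# Hypothetical target parameter and context snippets (not taken from the paper).
param = "approval_required: whether a human approval is required before release"
snippets = {
    "memory": "User hates waiting for approval and skips it when booking holidays",
    "task": "Deploy the billing service to production per the change policy",
}

# Rank snippets by lexical similarity to the parameter description.
ranked = sorted(snippets, key=lambda k: overlap(param, snippets[k]), reverse=True)
print(ranked)
```

The holiday-booking memory wins purely because it shares the word "approval" with the parameter, even though it has nothing to do with a production deployment.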

The exposure is not theoretical. The team scanned 6,062 tools across 288 verified Model Context Protocol servers and flagged 608 with susceptible parameters. They replayed a validated subset and saw concrete flips: a healthcare repo’s project visibility changing from private to public, or safesearch being turned off in education-related searches.
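The article does not describe how the scan was implemented, so the following is only a rough sketch of what a first-pass check over tool schemas could look like. It assumes each tool exposes a JSON Schema under `inputSchema`, as MCP tool definitions do; the keyword lists, the `flag_susceptible` helper, and the example tool are hypothetical.

```python
# Hypothetical keyword lists, one per bias dimension named in the benchmark.
BIAS_KEYWORDS = {
    "speed": {"timeout", "retries", "skip", "fast"},
    "frugality": {"tier", "plan", "budget", "cost"},
    "minimalism": {"verbose", "detail", "limit"},
    "risk": {"visibility", "safesearch", "public", "force"},
    "autonomy": {"approval", "confirm", "review"},
}


def flag_susceptible(tool: dict) -> list[tuple[str, str]]:
    """Return (parameter, bias dimension) pairs whose name or description contains a keyword."""
    flagged = []
    props = tool.get("inputSchema", {}).get("properties", {})
    for name, spec in props.items():
        text = f"{name} {spec.get('description', '')}".lower()
        for dimension, words in BIAS_KEYWORDS.items():
            # Plain substring match: crude, and likely to over-flag on real schemas.
            if any(word in text for word in words):
                flagged.append((name, dimension))
    return flagged


# Made-up tool definition echoing the private/public example above.
tool = {
    "name": "create_project",
    "description": "Create a repository project",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Project title"},
            "visibility": {
                "type": "string",
                "enum": ["private", "public"],
                "description": "Who can see the project",
            },
        },
    },
}

flags = flag_susceptible(tool)
print(flags)
```

A keyword pass like this can only nominate candidates; confirming drift still means replaying the tool with and without biased memories, as the authors did on their validated subset.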

This is an attacker’s kind of bug because it lives in the gap between memory and action. Craft a personal-style memory that says “I’m frugal” or “I like to move fast,” line it up with parameter names like tier, timeout, or approvals, and you can steer an agent toward weaker safeguards, cheaper plans, faster but unsafe settings, or removed gates. Tool calls often execute with low user observability. The nudge happens in the dark.

Why this smells familiar

If you squint, it looks like an old systems story: configuration drift. In the data centre, a small preference leaks into global defaults and suddenly a fleet inherits weak ciphers. Here, a cosy memory about thrift leaks into cost or visibility parameters. Once you mingle a user’s whim with operational policy, the whim has a way of winning.

Defences helped a bit. Prompting models with memory-usage guidelines shaved scores (one reported overall reduction of −0.52), and a relevance filter worked in the benchmark’s clean setting. But on more realistic, multi-hop relevance cases, the filter managed only 61.0% recall with a 10.3% false positive rate. The study sticks to single-tool, single-turn cases, which keeps the measurement honest but likely understates the combinatorial mess of chained tools. That leaves an open question squarely in the agent community’s lap: what should the contract be between memory management and tool-call generation so that personalisation enriches conversations without silently rewriting the runbook?
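One possible shape for such a contract, offered as a sketch rather than anything the study proposes, is a differential check: generate the tool call once with memory and once without, then refuse to let memory change a guarded parameter. The `drifted_params` helper and the guarded set are assumptions; the parameter names are borrowed from the examples in this article, and the two argument dictionaries are invented.

```python
from typing import Any

# Hypothetical set of parameters that memory should never be allowed to move.
GUARDED = frozenset({"visibility", "safesearch", "approvals", "tier", "timeout"})


def drifted_params(
    baseline: dict[str, Any],
    with_memory: dict[str, Any],
    guarded: frozenset[str] = GUARDED,
) -> dict[str, tuple[Any, Any]]:
    """Return guarded parameters whose value differs between the two calls.

    Each entry maps the parameter name to (baseline value, with-memory value).
    """
    return {
        key: (baseline.get(key), with_memory.get(key))
        for key in sorted(guarded)
        if baseline.get(key) != with_memory.get(key)
    }


# Invented arguments for the same task, generated without and with memory.
baseline_args = {"name": "patient-portal", "visibility": "private"}
memory_args = {"name": "patient-portal", "visibility": "public"}

drift = drifted_params(baseline_args, memory_args)
print(drift)
```

The obvious cost is a second model call per tool invocation, and an exact-match diff is much blunter than the judge-scored deflection the benchmark uses, so treat it as a tripwire rather than a measurement.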

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Memory-Induced Tool-Drift in LLM Agents

Authors: Mahavir Dabas, Jihyun Jeong, Ming Jin, and Ruoxi Jia
Modern LLM agents combine long-term memory for personalization with tool-calling interfaces for taking actions in the world – a combination underpinning contemporary production systems. We study a previously unexamined failure of this combination: when personality-driven biases stored in memory (cost-consciousness, impatience, risk tolerance, etc.) silently affect tool calls in contexts where they are not applicable. We call this memory-induced tool-drift and operationalize it through MEMDRIFT, a benchmark of 105 scenarios spanning five bias dimensions and seven professional domains, generated through an automated adversarial pipeline. Across seven frontier models – including those with extended reasoning – biased memories raise deflection scores (a judge-scored measure of parameter deviation from unbiased baselines) by up to +3.6 points on a 1–5 scale. Tool-drift persists when memory management is handled by three production memory architectures. The phenomenon affects real-world tools: scanning 6,062 tools across 288 verified MCP servers, we flag 608 with susceptible parameters and confirm tool-drift on a validated subset. Mechanistically, biased memories act as implicit steering vectors, pushing activations along the same latent directions as explicit behavioral instructions. They also redistribute attention from task-relevant context toward memory entries with surface-level keyword overlap to the target parameter. Standard defenses – prompt-based relevance instructions and memory filters – reduce drift but do not eliminate it. As agents take increasingly consequential actions on a user's behalf, memory-induced tool-drift represents a systematic vulnerability that current safeguards do not address, motivating dedicated defenses at the intersection of memory management and tool-call generation.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies a failure mode in memory-augmented large language model agents where personality-driven memories stored for personalisation silently bias tool-call parameters in professional contexts where those memories are irrelevant. This "memory-induced tool-drift" matters because agent tool calls often execute automatically with low user observability and can produce irreversible or high-consequence outcomes if parameters are altered inappropriately.

Approach

The authors formalise the problem and introduce MEMDRIFT, a benchmark of 105 adversarially generated scenarios spanning five bias dimensions (speed/impatience, resource frugality, minimalism/conciseness, risk permissiveness, autonomy/self-reliance) and seven professional domains (healthcare, finance, legal, software infrastructure, education, e-commerce, marketing). Scenarios enforce strict separation between personal memories and professional tasks so any memory influence is unambiguously inappropriate. Generation uses an iterative LLM pipeline with adversarial refinement and an LLM judge to maximise measurable drift. Evaluation covers two delivery modes: direct memory injection into the system prompt and three production memory frameworks (Mem0, MemPalace, SimpleMem). Seven frontier models were tested, including closed-source and open-weight systems, and a vulnerability scan examined 6,062 tools across 288 MCP servers to find real-world susceptible parameters.

Key Findings

  • Tool-drift is pervasive: across seven evaluated models and both memory delivery modes, biased personal memories raise judge-scored deflection on a 1 to 5 Likert scale by as much as +3.6 points relative to unbiased baselines, while neutral memories produce little deviation.
  • Drift persists under production memory architectures: naturalistic multi-turn encoding, framework storage, and retrieval still yielded comparable or greater biased deflection scores because summarisation and retrieval often strip contextual signals and rely on shallow semantic overlap.
  • Real-world exposure: scanning 6,062 MCP tools flagged 608 with parameters susceptible to memory-induced drift; validated reproductions of real schemas showed concrete drifts such as changing project visibility from private to public in a healthcare repo, or turning safesearch off in education-related searches, creating substantive outcome changes.
  • Mechanism: biased memories act as implicit steering vectors, shifting internal activations along the same latent directions as explicit behavioural instructions, and they redistribute attention away from task-relevant inputs to memory entries that share surface-level keywords with target parameters.
  • Partial mitigations: appending memory-usage guidelines to the system prompt reduced biased deflection modestly (for example a reported overall reduction of −0.52 in one evaluation), and a relevance filter can remove irrelevant memories in the deliberately clean MEMDRIFT setting, but the filter performs poorly on multi-hop, realistic relevance cases with 61.0% recall and a 10.3% false positive rate.
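For readers who want to pin down what the two filter numbers measure, the sketch below computes recall and false positive rate for a memory filter. It assumes the "positive" class is an irrelevant memory that ought to be removed, which the summary above does not spell out, and the six-item labelled set is a toy example, not data from the paper.

```python
def filter_metrics(should_remove: list[bool], removed: list[bool]) -> tuple[float, float]:
    """Return (recall, false positive rate) for a memory filter.

    should_remove[i] is True when memory i is irrelevant to the task;
    removed[i] is True when the filter actually dropped it.
    """
    pairs = list(zip(should_remove, removed))
    tp = sum(1 for label, hit in pairs if label and hit)          # irrelevant, dropped
    fn = sum(1 for label, hit in pairs if label and not hit)      # irrelevant, kept
    fp = sum(1 for label, hit in pairs if not label and hit)      # relevant, dropped
    tn = sum(1 for label, hit in pairs if not label and not hit)  # relevant, kept
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


# Toy labels: four irrelevant memories, two relevant ones.
should_remove = [True, True, True, True, False, False]
removed = [True, True, False, True, False, True]

recall, fpr = filter_metrics(should_remove, removed)
print(recall, fpr)
```

Under that reading, 61.0% recall would mean roughly 39% of irrelevant memories still reach the model, while a 10.3% false positive rate would mean about one in ten legitimately useful memories gets thrown away.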

Limitations

The study focuses on single-tool, single-turn scenarios and parameter assignment rather than multi-tool chains or tool selection. It evaluates a finite set of bias dimensions and memory frameworks and uses synthesised scenarios to ensure objective measurement, which may simplify some real-world complexities.

Implications

An attacker could craft or plant personal-style memories that exploit lexical overlap and bias-aligned parameter labels to steer an agent toward weaker safeguards, lower-cost tiers, faster but unsafe settings, or removal of approval gates. Because tool invocations are often executed without fine-grained human review, such manipulations can change professional outcomes stealthily and at scale. The results point to a novel attack surface at the intersection of memory management and tool-call generation that current prompt-based or coarse filtering defences do not fully address.

