
Researchers Expose Cache Attacks Against Diffusion Models

Attacks
Published: Fri, Aug 29, 2025 • By Natalie Kestrel
New research shows that approximate caching used to speed diffusion image models can leak data and let attackers steal prompts, run covert channels, and inject logos into other users' outputs. The work demonstrates attacks across models and datasets and warns that service-side caching can break user isolation for days.

A new paper from researchers at the University of Waterloo demonstrates a surprising and practical weakness in how some services speed up diffusion image generation. By reusing intermediate cached states, approximate caches such as Adobe's NIRVANA can be probed remotely to create covert channels, recover user prompts, and even poison cached content so that future outputs render attacker logos. The attacks work across the tested models, including FLUX and Stable Diffusion 3, and can persist for days.

There are two sides to this story. On one hand, approximate caching is a necessary engineering trick: it saves GPU time, reduces latency, and lowers costs for users. On the other, this very optimization can break the isolation between customers and create new vectors for data leakage and manipulation. The research shows those dangers are not just theoretical: the attacks achieve high accuracy in cache-based communication, prompt recovery, and logo injection across public datasets.

I lean toward a practical middle path. Alarmism helps sell headlines but paralyzes useful fixes. At the same time, shrugging and rolling out caches everywhere is irresponsible. The right response is targeted engineering and policy: treat cached intermediates as sensitive, not disposable. Regulators and cloud providers should demand threat modeling for caching features, and vendors should stop treating cache hits as harmless.

Pragmatic guidance: partition caches per user or tenant, add cryptographic binding and provenance metadata, log and audit cache accesses, enforce strict cache invalidation windows, and add anomaly detection for probing patterns. Vendors should publish threat models and mitigation checklists. This is fixable if companies stop pretending speed is an acceptable trade for silence on security. For further context see the University of Waterloo paper and public descriptions of NIRVANA-style caches and NIST AI security guidance.
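As a rough illustration of the cache partitioning point above, the sketch below keys cache lookups with a per-tenant secret so that one tenant's probes cannot hit another tenant's entries. The function and key names are hypothetical rather than taken from any vendor's cache, and a real deployment would still need key management, provenance metadata, and auditing on top of this.

```python
import hmac
import hashlib

def derive_cache_key(tenant_secret: bytes, prompt_digest: bytes) -> str:
    """Bind a cache entry to a single tenant by keying the lookup hash.

    Without the tenant-specific secret, similar prompts from different
    tenants map to different cache namespaces, so cross-tenant probing
    of cached intermediates yields no hits.
    """
    return hmac.new(tenant_secret, prompt_digest, hashlib.sha256).hexdigest()

# Hypothetical usage: each tenant's secret comes from a key-management service.
key_a = derive_cache_key(b"secret-for-tenant-a", b"digest-of-prompt-embedding")
key_b = derive_cache_key(b"secret-for-tenant-b", b"digest-of-prompt-embedding")
assert key_a != key_b  # same prompt content, disjoint cache namespaces
```

One caveat with this design choice: keying the cache per tenant sacrifices cross-user hit rates, which is exactly the trade-off between speed and isolation the paper highlights.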

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Breaking Diffusion with Cache: Exploiting Approximate Caches in Diffusion Models

Authors: Desen Sun, Shuncheng Jie, and Sihang Liu
Diffusion models are a powerful class of generative models that produce content, such as images, from user prompts, but they are computationally intensive. To mitigate this cost, recent academic and industry work has adopted approximate caching, which reuses intermediate states from similar prompts in a cache. While efficient, this optimization introduces new security risks by breaking isolation among users. This work aims to comprehensively assess new security vulnerabilities arising from approximate caching. First, we demonstrate a remote covert channel established with the cache, where a sender injects prompts with special keywords into the cache and a receiver can recover that even after days, to exchange information. Second, we introduce a prompt stealing attack using the cache, where an attacker can recover existing cached prompts based on cache hit prompts. Finally, we introduce a poisoning attack that embeds the attacker's logos into the previously stolen prompt, to render them in future user prompts that hit the cache. These attacks are all performed remotely through the serving system, which indicates severe security vulnerabilities in approximate caching.

🔍 ShortSpan Analysis of the Paper

Authors

Desen Sun, Shuncheng Jie, and Sihang Liu, University of Waterloo, Canada.

Problem

Diffusion models are powerful but computationally intensive; approximate caching reuses intermediate states from similar prompts to speed up generation. This optimisation can break user isolation by enabling cross-user interaction through the cache, creating security vulnerabilities in AI service architectures. The work investigates three remote attack vectors that arise from approximate caching: a remote covert channel, a prompt stealing attack, and a poisoning attack that embeds attacker content into future user outputs.

Approach

The study replicates the state-of-the-art approximate cache system NIRVANA from Adobe and deploys two diffusion models, FLUX and Stable Diffusion 3 (SD3), on cloud GPUs. It uses the real-world prompt datasets DiffusionDB and Lexica. The central idea is that approximate caches reuse intermediate results for similar prompts; hits speed up generation and produce outputs with preserved structure, enabling timing- and content-based analysis to infer information. Attack primitives rely on comparing latency (Attack Primitive 1) and image similarity (Attack Primitive 2) to identify cache hits and link them to cached prompts. The evaluation considers cache configurations with up to 50 percent of denoising steps cached and threshold-based hit detection, with metric analyses across latency distributions, SSIM, and CLIP-based similarities.
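To make the two attack primitives concrete, the following sketch shows how an attacker-side probe might separate cache hits from misses by latency, and how outputs generated from the same cached intermediate could be compared. The function names, the 0.8 margin, and the pixel-correlation proxy are illustrative assumptions; the paper calibrates thresholds from measured latency distributions and uses SSIM and CLIP-based similarity.

```python
import time
import numpy as np

def probe_latency(generate, prompt: str) -> float:
    """Time one generation request; a cache hit skips denoising steps and returns faster."""
    start = time.perf_counter()
    generate(prompt)
    return time.perf_counter() - start

def likely_cache_hit(latency: float, baseline_miss_latency: float, margin: float = 0.8) -> bool:
    """Attack Primitive 1 (sketch): flag a hit when latency falls well below the miss baseline."""
    return latency < baseline_miss_latency * margin

def output_similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Attack Primitive 2 (sketch): outputs built from the same cached intermediate share
    global structure; normalised pixel correlation stands in here for the SSIM and
    CLIP-based measures used in the paper."""
    a = (img_a - img_a.mean()) / (img_a.std() + 1e-8)
    b = (img_b - img_b.mean()) / (img_b.std() + 1e-8)
    return float((a * b).mean())
```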

Key Findings

  • Remote covert channel: A sender injects prompts with special keywords and a marker into the approximate cache, while a receiver probes the cache to detect sender activity. Cache hits and misses are encoded as bits (a sketch of this encoding appears after this list), with the channel achieving 97.8 percent overall accuracy for FLUX and 95.8 percent for SD3. The sender’s prompts can remain cached for over 44 hours, enabling days-long covert communication through the serving system.
  • Prompt stealing attack, CacheTransparency: Attackers probe the approximate cache and use timing and image similarity to cluster prompts that hit the same cache, then recover the victim prompt. Recovered prompts achieved average semantic similarity in the range 0.75 to 0.81 with original prompts, substantially higher than a naive probing baseline. The recovered images maintained high similarity, with PSNR values above 25 and SSIM scores around 0.75 to 0.85 depending on model and dataset. Across DiffusionDB and Lexica, the approach recovered prompts that closely match victims and produced outputs that are nearly indistinguishable from the originals in many cases.
  • Cache pollution attack: An attacker injects logos into the cache by embedding logo content into stolen prompts, causing future user prompts that hit the poisoned cache to render the attacker’s logos without them being specified in the prompt. The approach increases hit rates and render quality of logos, with higher success on FLUX than on SD3. Across logos including Nike, McDonald’s, Apple, Chanel, Triangle and Blue Moon, the poisoning achieved higher hit and render rates than naive direct injection, with success depending on logo complexity and model strength.
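A minimal sketch of the bit encoding referenced in the covert channel finding might look like the following. The sender and receiver helpers, the agreed list of seed prompts, and the latency margin are assumptions for illustration; the paper's channel additionally relies on special keywords and a marker and reports 97.8 and 95.8 percent accuracy on FLUX and SD3 respectively.

```python
from typing import Callable, List

def send_bits(bits: List[int], seed_prompts: List[str], submit: Callable[[str], None]) -> None:
    """Sender (sketch): for each 1-bit, submit the agreed prompt so it enters the cache;
    for a 0-bit, submit nothing, leaving that slot as a future cache miss."""
    for bit, prompt in zip(bits, seed_prompts):
        if bit:
            submit(prompt)

def receive_bits(seed_prompts: List[str],
                 measure_latency: Callable[[str], float],
                 miss_latency: float,
                 margin: float = 0.8) -> List[int]:
    """Receiver (sketch): probe each agreed prompt and decode a fast response
    (cache hit) as 1 and a slow response (cache miss) as 0."""
    return [1 if measure_latency(prompt) < miss_latency * margin else 0
            for prompt in seed_prompts]
```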

Limitations

Results are obtained under controlled configurations using specific caches, models, and datasets; performance and feasibility depend on the cache policy and replacement strategies tested. The attacks rely on timing and content based detection that may be affected by network conditions and model specifics. False positives in hit detection can reduce attack efficacy, and large numbers of probing prompts are required for accurate recovery. Generalisability to other diffusion models, caching schemes, or operational deployments is not claimed. The work evaluates two diffusion models (FLUX and SD3) and two datasets (DiffusionDB and Lexica) with fixed cache sizes and configurations, so outcomes may differ under alternative setups.

Why It Matters

The findings reveal that approximate caching used to accelerate diffusion model services can undermine user isolation, enabling remote covert channels, prompt theft, and content poisoning via the cache. The risks persist across days and through the serving system, indicating real security gaps in AI service architectures. Potential consequences include cross user data leakage, exfiltration of cached prompts, and injection of manipulated prompts or logos into outputs without direct model access. Mitigations include strict per user or per tenant cache isolation, cryptographic binding and provenance of cached items, tamper evident logging, access controls, cache invalidation policies, and anomaly or audit mechanisms. Societal and security implications involve reduced privacy and trust in AI services, covert information leakage, and scalable output manipulation that could affect surveillance and branding.
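One of the listed mitigations, anomaly detection for probing patterns, could be approximated service-side by flagging clients that submit many near-duplicate prompts within a short window, since that access pattern is what the probing attacks depend on. The class below is a hedged sketch: the window length, the similarity measure, and the thresholds are illustrative choices, not drawn from the paper.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher
import time

class ProbingDetector:
    """Flag clients whose recent prompts are unusually similar to one another,
    a pattern more consistent with cache probing than with normal use."""

    def __init__(self, window_seconds: float = 600.0,
                 similarity_threshold: float = 0.9,
                 max_similar_recent: int = 20):
        self.window_seconds = window_seconds
        self.similarity_threshold = similarity_threshold
        self.max_similar_recent = max_similar_recent
        self.history = defaultdict(deque)  # client_id -> deque of (timestamp, prompt)

    def record(self, client_id: str, prompt: str) -> bool:
        """Record a prompt and return True if the client's recent pattern looks like probing."""
        now = time.time()
        window = self.history[client_id]
        window.append((now, prompt))
        # Drop prompts that have aged out of the sliding window.
        while window and now - window[0][0] > self.window_seconds:
            window.popleft()
        # Count earlier prompts in the window that are near-duplicates of the new one.
        similar_recent = sum(
            1 for _, earlier in list(window)[:-1]
            if SequenceMatcher(None, earlier, prompt).ratio() >= self.similarity_threshold
        )
        return similar_recent >= self.max_similar_recent
```

A flagged client need not be blocked outright; rate limiting or serving uncached generations to suspicious clients would blunt the timing side channel without penalising ordinary users.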

