VideoEraser Blocks Unwanted Concepts in Text-to-Video
Defenses
Text-to-video models are getting dangerously good at producing convincing content, and that includes the wrong kinds of videos: celebrity deepfakes, copyrighted styles, and explicit material. VideoEraser offers a practical, training-free guardrail. It suppresses targeted concepts at inference time in two steps: nudging prompt embeddings away from the concept and steering the denoising noise so the erased idea does not emerge.
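To picture where such a guard sits, here is a minimal sketch of a sampling loop with both adjustments bolted on. It is not the paper's code: text_encoder, denoiser and scheduler_step are stand-ins for a real T2V pipeline, and the projection and guidance formulas are deliberately simplified assumptions.

```python
# Illustrative sketch only: where inference-time concept erasure hooks into a
# text-to-video sampling loop. Every component below is a stand-in, not the
# paper's implementation, and the formulas are deliberately simplified.
import torch

def generate_with_erasure(text_encoder, denoiser, scheduler_step,
                          prompt_tokens, concept_tokens,
                          num_steps=50, frames=16, latent_shape=(4, 32, 32)):
    # Step 1: nudge prompt embeddings away from the unwanted concept.
    prompt_emb = text_encoder(prompt_tokens)             # (tokens, dim)
    concept_emb = text_encoder(concept_tokens).mean(0)   # crude concept direction
    direction = concept_emb / concept_emb.norm()
    prompt_emb = prompt_emb - (prompt_emb @ direction).unsqueeze(-1) * direction

    # Step 2: during denoising, steer the noise prediction away from the concept.
    latents = torch.randn(frames, *latent_shape)
    for t in reversed(range(num_steps)):
        eps_prompt = denoiser(latents, t, prompt_emb)
        eps_concept = denoiser(latents, t, concept_emb.unsqueeze(0))
        eps = eps_prompt + 1.0 * (eps_prompt - eps_concept)   # push away from the concept
        latents = scheduler_step(latents, eps, t)
    return latents

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end without a real model.
    dim = 8
    enc = lambda tokens: torch.randn(len(tokens), dim)
    den = lambda z, t, cond: torch.randn_like(z)
    step = lambda z, eps, t: z - 0.02 * eps
    out = generate_with_erasure(enc, den, step,
                                ["a", "harmless", "prompt"], ["unwanted", "concept"])
    print(out.shape)   # torch.Size([16, 4, 32, 32])
```

Nothing in this loop touches the model's weights, which is why the approach can be added to an existing pipeline without retraining.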
Why this matters: the paper reports an average 46 percent reduction in targeted content versus prior defences, with celebrity presence and pornography toxicity dropping by roughly half or more in controlled tests. That is a real benefit for operators who need a rapid mitigation they can bolt onto existing T2V pipelines without retraining expensive models.
The trade-offs are real and important. VideoEraser adds processing overhead (roughly 1.4 times slower before optimisation) and works best on concrete, well-defined concepts. Abstract attributes may only be partially removed, and a determined attacker can try prompt engineering or combine prompts to bypass the filter. The module itself becomes a new attack surface if it is swapped or tampered with in deployment.
What to do next: treat VideoEraser as a tool, not a panacea. Test it with your most likely bypass prompts, measure collateral damage to legitimate content, and quantify throughput impacts. Combine it with logging, human review for edge cases, and policy controls around model access. Maintain an update and red-team cadence so adversaries do not turn your safety layer into a suggestion box.
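One way to operationalise that advice is a small harness that replays bypass and benign prompts through the pipeline and records outcomes. The sketch below is hypothetical: generate, detect_concept and quality_score are placeholders for your own generation call, concept classifier and fidelity metric.

```python
# Hypothetical red-team harness around a T2V pipeline: checks erasure
# robustness on bypass prompts, collateral damage on benign prompts, and the
# latency cost of the guard. Every callable here is a placeholder.
import statistics
import time
from typing import Callable, Sequence

def evaluate_guardrail(generate: Callable[[str], object],
                       detect_concept: Callable[[object], bool],
                       quality_score: Callable[[object], float],
                       bypass_prompts: Sequence[str],
                       benign_prompts: Sequence[str]) -> dict:
    latencies, bypass_hits, benign_quality = [], 0, []

    for prompt in bypass_prompts:
        start = time.perf_counter()
        video = generate(prompt)
        latencies.append(time.perf_counter() - start)
        if detect_concept(video):              # erased concept still visible: bypass worked
            bypass_hits += 1

    for prompt in benign_prompts:
        benign_quality.append(quality_score(generate(prompt)))

    return {
        "bypass_success_rate": bypass_hits / max(len(bypass_prompts), 1),
        "mean_benign_quality": statistics.mean(benign_quality) if benign_quality else None,
        "mean_latency_s": statistics.mean(latencies) if latencies else None,
    }
```

Running the same harness with the erasure module disabled gives a baseline, so the roughly 1.4x overhead and any fidelity loss are measured on your own workloads rather than taken from the paper.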
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
🔍 ShortSpan Analysis of the Paper
Problem
Text-to-video diffusion models can reproduce unauthorised identities, artistic styles, explicit material and other harmful content because they are trained on large, often unfiltered datasets. This raises privacy, copyright and safety concerns and creates a need for practical methods that prevent generation of specific undesirable concepts without costly retraining.
Approach
The authors propose VideoEraser, a training-free, plug-and-play two-stage module that modifies prompt embeddings and denoising guidance at inference. Selective Prompt Embedding Adjustment (SPEA) detects trigger tokens and projects their embeddings away from the target concept subspace to suppress concept activation. Adversarial-Resilient Noise Guidance (ARNG) steers latent noise away from the erased concept while enforcing step-to-step and frame-to-frame consistency. VideoEraser requires no model updates and was evaluated on AnimateDiff and other T2V frameworks across object, artistic style, celebrity and explicit content erasure tasks, compared to baselines including SAFREE and negative prompting.
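To make the two stages more concrete, the sketch below shows the kind of operations SPEA and ARNG describe. The cosine-similarity trigger test, the projection rule and the smoothing factors are assumptions made for exposition, not the paper's equations.

```python
# Hedged illustration of SPEA- and ARNG-style operations. The similarity
# threshold, smoothing factors and the exact projection/guidance formulas are
# assumptions for exposition, not the paper's equations.
from typing import Optional, Tuple
import torch
import torch.nn.functional as F

def spea_adjust(token_embs: torch.Tensor, concept_emb: torch.Tensor,
                threshold: float = 0.3) -> torch.Tensor:
    """Flag likely trigger tokens by cosine similarity to the concept embedding
    and project only those tokens out of the concept direction."""
    direction = F.normalize(concept_emb, dim=-1)
    sims = F.normalize(token_embs, dim=-1) @ direction             # (tokens,)
    is_trigger = (sims > threshold).unsqueeze(-1)                  # (tokens, 1)
    component = (token_embs @ direction).unsqueeze(-1) * direction
    return torch.where(is_trigger, token_embs - component, token_embs)

def arng_guidance(eps_prompt: torch.Tensor, eps_concept: torch.Tensor,
                  prev_guidance: Optional[torch.Tensor] = None,
                  scale: float = 1.0, momentum: float = 0.7
                  ) -> Tuple[torch.Tensor, torch.Tensor]:
    """Steer the noise prediction away from the concept, smoothing the guidance
    across frames (clip-mean blend) and across denoising steps (moving average)."""
    guidance = eps_prompt - eps_concept                                   # away-from-concept term
    guidance = 0.5 * guidance + 0.5 * guidance.mean(dim=0, keepdim=True)  # frame-to-frame consistency
    if prev_guidance is not None:                                         # step-to-step consistency
        guidance = momentum * prev_guidance + (1 - momentum) * guidance
    return eps_prompt + scale * guidance, guidance
```

In a real pipeline, arng_guidance would be called once per denoising step, feeding its returned guidance term back in as prev_guidance at the next step, which is what provides step-to-step consistency on top of the per-frame blending.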
Key Findings
- Average suppression: VideoEraser reduces undesirable content by 46% on average across four tasks versus baselines.
- Object erasure: ACCe for targeted object classes fell by about 74% on average, while preserving generation of unrelated classes.
- Celebrity and style: Celebrity presence ACCe dropped by over 50% on average; targeted artistic styles were substantially weakened.
- Safety and robustness: Pornography toxicity score fell by 61% and adversarial attack success rate dropped by over 40% on average.
- Fidelity and generalisability: VideoEraser attained top or near‑top fidelity scores and worked across multiple T2V models; ablations show both SPEA and ARNG contribute to efficacy and robustness.
Limitations
VideoEraser adds computational overhead (about 1.4x processing time prior to optimisation). It is more effective for well-defined, concrete concepts (for example celebrities) than for broader or abstract concepts (for example some artistic attributes or nudity), which may only be partially removed. The authors acknowledge the potential for more sophisticated circumvention methods and call for further work.
Why It Matters
VideoEraser offers a practical, deployable defence for content moderation in T2V pipelines that can reduce privacy breaches, deepfakes, copyright infringement and NSFW generation without retraining. It strengthens an operator’s toolbox for safer video generation, but must be tested against adaptive bypass attempts and improved for abstract concept erasure to avoid unintended collateral effects.