CodecAttack slips past audio LLM compression defences

Published: Wed, May 20, 2026 • By Theo Solander
New research shows adversaries can hide targeted prompts for Audio Large Language Models inside the latent space of neural audio codecs. CodecAttack survives Opus, MP3 and AAC-LC compression with high success rates, outclassing waveform attacks. The work undermines compression as a defence and exposes practical risks for voice agents and detection pipelines.

We have told ourselves a comforting story: lossy codecs scrub away adversarial noise before it reaches an Audio Large Language Model (LLM). CodecAttack spoils that plot. Instead of nudging the waveform and hoping the artefacts survive Opus or MP3, the attack edits the neural codec’s own latent space, then rides through the very channel that keeps those features intact.

How it works

The authors optimise a bounded perturbation inside a neural audio codec, EnCodec, rather than on raw samples. During optimisation they pass the audio through Opus at multiple bitrates using straight-through Expectation-over-Transformation, so the perturbation learns to survive 16, 24, 32, 64 and 128 kbps. The victim model is left untouched. The attacker needs white-box access only to compute gradients offline; delivery is a standard audio file.
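
The optimisation loop described above can be illustrated with a deliberately tiny toy. This is a sketch under loud assumptions, not the authors' code: a rounding quantiser stands in for Opus at different bitrates, an identity map stands in for the EnCodec decoder, and a squared error on a linear readout stands in for the victim model's loss. All function names are hypothetical. What it does show is the two ingredients named in the text: the gradient is averaged over several channel settings (Expectation-over-Transformation), and the non-differentiable channel is treated as the identity on the backward pass (straight-through), with the perturbation clipped to a bound eps.

```python
import math

def quantise(values, step):
    """Toy lossy channel: snap each value to a grid of width `step`.
    Coarser steps play the role of lower bitrates (an assumption)."""
    return [math.floor(v / step + 0.5) * step for v in values]

def loss_and_grad(received, weights, target):
    """Squared error of a linear readout, standing in for the victim's loss.
    Returns the loss and its gradient with respect to `received`."""
    err = sum(w * y for w, y in zip(weights, received)) - target
    return err * err, [2.0 * err * w for w in weights]

def eot_loss(latent, delta, weights, target, steps):
    """Average loss over every channel setting in `steps`."""
    total = 0.0
    for step in steps:
        received = quantise([z + d for z, d in zip(latent, delta)], step)
        total += loss_and_grad(received, weights, target)[0]
    return total / len(steps)

def optimise(latent, weights, target, steps, eps=1.0, lr=0.05, iters=200):
    """Projected gradient descent on a bounded latent perturbation."""
    delta = [0.0] * len(latent)
    for _ in range(iters):
        avg_grad = [0.0] * len(latent)
        for step in steps:
            received = quantise([z + d for z, d in zip(latent, delta)], step)
            _, grad = loss_and_grad(received, weights, target)
            # Straight-through: pretend d(received)/d(delta) is the identity,
            # so the gradient at the channel output is used for delta directly.
            for i, g in enumerate(grad):
                avg_grad[i] += g / len(steps)
        # Gradient step, then projection back into the [-eps, eps] box.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, avg_grad)]
    return delta

latent = [0.2, -0.1, 0.4, 0.0]
weights = [1.0, 1.0, 1.0, 1.0]
steps = [0.5, 0.25, 0.1]  # coarse to fine "bitrates"
delta = optimise(latent, weights, target=2.0, steps=steps)
print(eot_loss(latent, [0.0] * 4, weights, 2.0, steps),
      eot_loss(latent, delta, weights, 2.0, steps))
```

Averaging across the coarse and fine settings is what stops the perturbation from fitting a single channel; in the paper's setting, the same role is played by the five Opus bitrates.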

Why this matters: many real deployments re-encode audio at least once. If compression were a free filter, attackers would be out of luck. The numbers say otherwise. CodecAttack posts an average 85.5 percent target-substring success on Opus at moderate bitrates and reaches 88 percent at Opus 128 kbps. A matched waveform baseline trained with the same hardening never clears 26 percent at any bitrate. Trained on Opus, the attack transfers without retraining to held-out codecs, reaching up to 100 percent on MP3 and 84 percent on AAC-LC in some music tasks; speech carriers fare worse under AAC quantisation.
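
For readers unfamiliar with the metric, target-substring success simply asks how often the model's output contains the attacker's chosen phrase. A minimal sketch follows, assuming case-insensitive matching with whitespace collapsed; the exact matching rule is not given in this summary, so treat that normalisation, the function name and the sample strings as assumptions.

```python
def target_substring_asr(outputs, target):
    """Fraction of model outputs that contain the target phrase.
    Matching is case-insensitive with runs of whitespace collapsed
    (an assumed normalisation, not taken from the paper)."""
    if not outputs:
        return 0.0
    needle = " ".join(target.lower().split())
    hits = sum(needle in " ".join(out.lower().split()) for out in outputs)
    return hits / len(outputs)

# Hypothetical victim-model outputs after the audio passed through a codec.
outputs = [
    "Transfer approved for account 42",
    "I cannot help with that",
    "transfer   APPROVED",
    "Sure, transfer approved.",
]
print(target_substring_asr(outputs, "transfer approved"))
```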

Why it survives

Compression spends bits where our ears care. The perturbation learns that map. A per-band analysis shows 88.4 percent of its energy sits below 4 kHz and a third below 400 Hz, right where codecs preserve detail; waveform attacks waste energy in higher bands that encoders discard. Multi-bitrate EoT is not optional: remove it and success collapses to zero at Opus 32 kbps and below.
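
A per-band figure such as "88.4 percent below 4 kHz" is a ratio of spectral energy below a cutoff to total spectral energy. The sketch below shows one way to compute such a ratio with a naive discrete Fourier transform using only the standard library. It is an illustration of the idea, not the paper's analysis code; the function name and the two-tone test signal are assumptions, and a real analysis would use an FFT on the actual perturbation.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, cutoff_hz):
    """Share of a real signal's spectral energy strictly below `cutoff_hz`.
    Uses a naive O(n^2) DFT over the one-sided spectrum."""
    n = len(samples)
    low = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Interior bins stand for a +/- frequency pair, so they count twice;
        # the DC bin and (for even n) the Nyquist bin count once.
        weight = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        energy = weight * abs(coeff) ** 2
        total += energy
        if k * sample_rate / n < cutoff_hz:
            low += energy
    return low / total if total else 0.0

# Toy "perturbation": a strong 200 Hz tone plus a weaker 3 kHz tone.
rate, n = 8000, 400
signal = [math.sin(2 * math.pi * 200 * t / rate)
          + 0.5 * math.sin(2 * math.pi * 3000 * t / rate) for t in range(n)]
print(round(band_energy_fraction(signal, rate, 400), 3))
```

With amplitudes of 1.0 and 0.5, the low tone carries four times the energy of the high one, so about four fifths of the total sits below 400 Hz in this toy signal.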

The study evaluates three scenarios that look uncomfortably familiar to practitioners: financial voice agents, interview screening, and music-industry content detection. Reliable injection saturates around eight-word targets and degrades past roughly twenty words. At the main operating point the speech remains intelligible, even if quality takes a perceptual knock.

This all has a whiff of the old phone-phreaking trick: the blue box sent a 2600 Hz tone because that is exactly what the network preserved. Or think JPEG steganography hiding in low-frequency DCT coefficients. If the channel loves it, attackers live there. The open questions now are codec-aware training, cross-codec evaluation as standard, and what a robust front end for audio LLMs actually looks like.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Codec-Robust Attacks on Audio LLMs

Authors: Jaechul Roh, Jean-Philippe Monteuuis, Jonathan Petit, and Amir Houmansadr
Prior attacks on Audio Large Language Models (Audio LLMs) demonstrated that carefully crafted waveform-domain perturbations can force targeted adversarial outputs. As a defense mechanism against these attacks, real-world codec compression preprocessing has been studied to both detect and remove the perturbations. Yet no existing attack has demonstrated robustness against these compressions. We introduce CodecAttack, which optimizes a perturbation in a neural audio codec's continuous latent space rather than directly perturbing the audio waveform. We show that the codec's compression channel, which discards waveform perturbations, transmits perturbations crafted in its own latent space. To further harden the attack across real-world compression channels, we apply multi-bitrate straight-through Expectation-over-Transformation (EoT), all without modifying the target model. Across three realistic Audio LLM deployment scenarios and three target models, CodecAttack achieves an average 85.5% target-substring attack success rate (ASR) on Opus at moderate bitrates, while the waveform baseline trained with identical EoT hardening does not exceed 26% at any bitrate. The attack transfers to held-out codecs, reaching up to 100% ASR on MP3 and 84% on AAC-LC without retraining. A per-band energy analysis shows that the latent perturbation concentrates below 4kHz, exactly where codecs allocate the most bits, while the waveform baseline spreads into higher frequencies that codecs discard. These results demonstrate that lossy compression is not a reliable defense against adversarial audio and that codec-aware attacks pose a practical threat to deployed Audio LLM systems.

🔍 ShortSpan Analysis of the Paper

Problem

The paper examines whether lossy audio compression, commonly used in messaging, VoIP and streaming pipelines, defends Audio Large Language Models against targeted adversarial audio. Prior work showed that codecs remove waveform perturbations, which is why compression has been studied as a defence. Because many real deployments pass audio through at least one codec, an attack that survives compression would expose many real-world voice-agent systems to covert manipulation. The study asks whether adversaries can craft perturbations that survive codec-mediated delivery without altering the victim model.

Approach

The authors introduce CodecAttack, which optimises a bounded perturbation in the continuous latent space of a neural audio codec (EnCodec) rather than directly on the waveform. The perturbation is decoded to waveform and hardened against real-world compression by applying multi-bitrate straight-through Expectation-over-Transformation (EoT) over Opus bitrates {16,24,32,64,128} kbps during optimisation. The attacker is external and does not modify the victim model; white-box access is assumed only for offline gradient computation. Evaluation covers three deployment scenarios—financial voice agents, interview screening, and music-industry content detection—three victim models, and a codec grid including held-out MP3 and AAC-LC at multiple bitrates. A matched waveform-baseline attack uses the same EoT and matched SNR to isolate the effect of the perturbation domain.
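
The matched-SNR control is what makes the comparison fair: both attacks add the same amount of noise energy relative to the carrier, so only the perturbation domain differs. A minimal sketch of the arithmetic follows, assuming the usual definition of signal-to-noise ratio in decibels, 10·log10(signal energy / perturbation energy). The helper names and sample values are hypothetical, and the summary does not say how the authors implement the matching.

```python
import math

def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB, treating (perturbed - clean) as noise."""
    signal = sum(c * c for c in clean)
    noise = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)

def scale_to_snr(clean, perturbation, target_db):
    """Rescale `perturbation` so that clean + perturbation hits `target_db`."""
    signal = sum(c * c for c in clean)
    noise = sum(d * d for d in perturbation)
    gain = math.sqrt(signal / (noise * 10.0 ** (target_db / 10.0)))
    return [gain * d for d in perturbation]

clean = [1.0, -1.0, 0.5, -0.5]
raw = [0.3, 0.1, -0.2, 0.05]          # hypothetical baseline perturbation
matched = scale_to_snr(clean, raw, target_db=20.0)
print(snr_db(clean, [c + d for c, d in zip(clean, matched)]))
```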

Key Findings

  • High attack success after compression: CodecAttack achieves an average 85.5% target-substring attack success rate on Opus at moderate bitrates; specific results include 88% ASR at Opus 128 kbps and 80–90% ASR across Opus/MP3 in many settings.
  • Waveform attacks fail under identical hardening: a waveform-domain baseline trained with identical EoT and matched SNR never exceeds 26% ASR at any bitrate, showing the latent domain is the decisive factor for codec robustness.
  • Cross-codec transfer: attacks trained on Opus transfer without retraining to held-out MP3 (up to 100% ASR reported in some music tasks) and achieve substantial ASR on AAC-LC for music carriers (up to 84% in some cases), though attacks on speech carriers fare worse under AAC quantisation.
  • Spectral mechanism: the latent perturbation concentrates energy in low frequencies—88.4% below 4 kHz and 33% below 400 Hz—matching where codecs allocate most bits; waveform perturbations spread into higher bands that codecs discard.
  • Multi-bitrate EoT is necessary: removing EoT collapses ASR to 0% at Opus ≤32 kbps, demonstrating the need to train across bitrates for survival.
  • Capacity and quality trade-offs: reliable injection saturates for targets up to about 8 words and degrades past ≈20 words; at the primary operating point (ε=1.0) speech intelligibility remains high (STOI≈0.90) despite perceptual quality loss.

Limitations

The attack requires white-box access to compute gradients for a chosen codec and is optimised per victim model, so perturbations do not reliably transfer between models without reoptimisation. Defence evaluations are not exhaustive; proposed mitigations are left for future work.

Implications

An external adversary can embed targeted commands into benign-sounding audio that survive typical re-encoding and reach unmodified Audio LLMs, enabling unauthorised actions such as banking authorisation bypass, falsified hiring recommendations, or evasion of content-detection pipelines. The results imply that waveform-focused compression defences are insufficient and that attackers can exploit codec-preserved latent subspaces to mount practical, covert injections across diverse codecs and bitrates.

