CodecAttack slips past audio LLM compression defences
We have told ourselves a comforting story: lossy codecs scrub away adversarial noise before it reaches an Audio Large Language Model (LLM). CodecAttack spoils that plot. Instead of nudging the waveform and hoping the artefacts survive Opus or MP3, the attack edits the neural codec’s own latent space, then rides through the very channel that keeps those features intact.
How it works
The authors optimise a bounded perturbation inside a neural audio codec, EnCodec, rather than on raw samples. During optimisation they pass the audio through Opus at multiple bitrates using straight-through Expectation-over-Transformation, so the perturbation learns to survive 16, 24, 32, 64 and 128 kbps. The victim model is left untouched. The attacker needs white-box access only to compute gradients offline; delivery is a standard audio file.
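To make the optimisation loop concrete, here is a minimal toy sketch of straight-through Expectation-over-Transformation on a latent perturbation. It is not the authors' code. The linear `decoder`, the linear `victim_w`, and the quantiser `toy_codec` are hypothetical stand-ins for EnCodec, the Audio LLM and Opus, and the quantisation steps stand in loosely for bitrates. The forward pass runs the non-differentiable codec; the backward pass pretends the codec is the identity.

```python
import numpy as np

def toy_codec(x, step):
    """Hypothetical stand-in for a lossy codec: uniform quantisation.
    A coarser step loosely mimics a lower bitrate; real Opus is far more complex."""
    return np.round(x / step) * step

def attack_error(decoder, victim_w, z, delta, target, steps):
    """Mean absolute gap between the toy victim's output and the target,
    averaged over all codec settings."""
    x = decoder @ (z + delta)
    return float(np.mean([abs(victim_w @ toy_codec(x, s) - target) for s in steps]))

def latent_eot_attack(decoder, victim_w, z, target, steps, eps=1.0, lr=0.2, iters=300):
    """Optimise a bounded latent perturbation with straight-through EoT."""
    delta = np.zeros_like(z)
    for _ in range(iters):
        x = decoder @ (z + delta)              # decode latent to a toy "waveform"
        grad = np.zeros_like(z)
        for s in steps:                        # expectation over codec settings
            y = toy_codec(x, s)                # forward: real, non-differentiable codec
            err = victim_w @ y - target        # scalar residual of the linear toy victim
            # backward: straight-through, i.e. treat d(codec)/dx as the identity
            grad += decoder.T @ (2.0 * err * victim_w)
        # averaged gradient step, then projection back into the eps-box
        delta = np.clip(delta - lr * grad / len(steps), -eps, eps)
    return delta

rng = np.random.default_rng(0)
decoder = rng.normal(size=(32, 8)) / np.sqrt(32)   # toy linear decoder (assumption)
victim_w = rng.normal(size=32) / np.sqrt(32)       # toy linear victim (assumption)
z = rng.normal(size=8)                             # carrier latent
steps = (0.02, 0.05, 0.1)                          # toy "bitrates"
g = decoder.T @ victim_w
target = victim_w @ decoder @ z + 0.5 * np.abs(g).sum()  # a target reachable within eps

delta = latent_eot_attack(decoder, victim_w, z, target, steps)
before = attack_error(decoder, victim_w, z, np.zeros_like(z), target, steps)
after = attack_error(decoder, victim_w, z, delta, target, steps)
print(f"error before: {before:.3f}, after: {after:.3f}")
```

Averaging the gradient over every codec setting in each iteration is one simple way to approximate the expectation; sampling a single setting per step would be another.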
Why this matters: many real deployments re-encode audio at least once. If compression were a free filter, attackers would be out of luck. The numbers say otherwise. CodecAttack posts an average 85.5 percent target-substring success on Opus at moderate bitrates and reaches 88 percent at Opus 128 kbps. A matched waveform baseline trained with the same hardening never clears 26 percent at any bitrate. Trained on Opus, the attack transfers without retraining to held-out codecs, reaching up to 100 percent on MP3 and 84 percent on AAC-LC in some music tasks; speech carriers fare worse under AAC quantisation.
Why it survives
Compression spends bits where our ears care. The perturbation learns that map. A per-band analysis shows 88.4 percent of its energy sits below 4 kHz and a third below 400 Hz, right where codecs preserve detail; waveform attacks waste energy in higher bands that encoders discard. Multi-bitrate EoT is not optional: remove it and success collapses to zero at Opus 32 kbps and below.
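A per-band figure such as "88.4 percent below 4 kHz" can be approximated by summing spectral power under a cutoff. The sketch below is one plausible way to compute such a fraction with an FFT; the function name `band_energy_fraction` is hypothetical and the paper's exact analysis method may differ.

```python
import numpy as np

def band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of the signal's spectral energy that lies below cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    power = np.abs(spectrum) ** 2
    total = power.sum()
    if total == 0:
        return 0.0                      # silent input: no energy in any band
    return float(power[freqs < cutoff_hz].sum() / total)

# Synthetic perturbation: three equal-amplitude tones at 200 Hz, 1 kHz and 6 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate            # exactly one second
perturbation = sum(np.sin(2 * np.pi * f * t) for f in (200, 1000, 6000))

below_400 = band_energy_fraction(perturbation, sample_rate, 400)
below_4k = band_energy_fraction(perturbation, sample_rate, 4000)
```

For a real attack the input would be the difference between the adversarial and clean waveforms rather than a synthetic signal.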
The study evaluates three scenarios that look uncomfortably familiar to practitioners: financial voice agents, interview screening, and music-industry content detection. Injection is reliable for targets of up to about eight words and degrades past roughly twenty. At the main operating point the speech remains intelligible, even if quality takes a perceptual knock.
This all has a whiff of the old phone-phreaking trick: the blue box sent a 2600 Hz tone because that is exactly what the network preserved. Or think JPEG steganography hiding in low-frequency DCT coefficients. If the channel loves it, attackers live there. The open questions now are codec-aware training, cross-codec evaluation as standard, and what a robust front end for audio LLMs actually looks like.
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
Codec-Robust Attacks on Audio LLMs
🔍 ShortSpan Analysis of the Paper
Problem
The paper examines whether lossy audio compression, which is common in messaging, VoIP and streaming pipelines, defends Audio Large Language Models against targeted adversarial audio. Prior work showed that codecs remove waveform perturbations. Because deployments typically include at least one codec, an attack that survives compression would expose many real-world voice-agent systems to covert manipulation. The study asks whether adversaries can craft perturbations that survive codec-mediated delivery without altering the victim model.
Approach
The authors introduce CodecAttack, which optimises a bounded perturbation in the continuous latent space of a neural audio codec (EnCodec) rather than directly on the waveform. The perturbation is decoded to waveform and hardened against real-world compression by applying multi-bitrate straight-through Expectation-over-Transformation (EoT) over Opus bitrates {16,24,32,64,128} kbps during optimisation. The attacker is external and does not modify the victim model; white-box access is assumed only for offline gradient computation. Evaluation covers three deployment scenarios—financial voice agents, interview screening, and music-industry content detection—three victim models, and a codec grid including held-out MP3 and AAC-LC at multiple bitrates. A matched waveform-baseline attack uses the same EoT and matched SNR to isolate the effect of the perturbation domain.
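Matching SNR means the waveform baseline and the latent attack add the same amount of noise power relative to the carrier, so any difference in success comes from where the perturbation lives. The sketch below shows one plausible way to rescale a perturbation to a target SNR; the helper names `snr_db` and `scale_to_snr` are hypothetical, and the paper's exact matching procedure is not given here.

```python
import numpy as np

def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB, treating (perturbed - clean) as the noise."""
    noise = perturbed - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

def scale_to_snr(clean, perturbation, target_snr_db):
    """Rescale a perturbation so that clean + perturbation hits target_snr_db."""
    p_signal = np.sum(clean ** 2)
    p_noise = np.sum(perturbation ** 2)
    # Choose gain so that p_signal / (gain^2 * p_noise) = 10^(target / 10).
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return perturbation * gain

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220 * t)            # toy carrier
raw_perturbation = rng.normal(size=clean.shape)

matched = scale_to_snr(clean, raw_perturbation, target_snr_db=20.0)
achieved = snr_db(clean, clean + matched)
```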
Key Findings
- High attack success after compression: CodecAttack achieves an average 85.5% target-substring attack success rate (ASR) on Opus at moderate bitrates; specific results include 88% ASR at Opus 128 kbps and 80–90% ASR across Opus/MP3 in many settings.
- Waveform attacks fail under identical hardening: a waveform-domain baseline trained with identical EoT and matched SNR never exceeds 26% ASR at any bitrate, showing the latent domain is the decisive factor for codec robustness.
- Cross-codec transfer: attacks trained on Opus transfer without retraining to held-out MP3 (up to 100% ASR reported in some music tasks) and achieve substantial ASR on AAC-LC for music carriers (up to 84% in some cases), though attacks on speech carriers degrade more under AAC quantisation.
- Spectral mechanism: the latent perturbation concentrates energy in low frequencies—88.4% below 4 kHz and 33% below 400 Hz—matching where codecs allocate most bits; waveform perturbations spread into higher bands that codecs discard.
- Multi-bitrate EoT is necessary: removing EoT collapses ASR to 0% at Opus ≤32 kbps, demonstrating the need to train across bitrates for survival.
- Capacity and quality trade-offs: injection is reliable for targets of up to about 8 words and degrades past ≈20 words; at the primary operating point (ε=1.0) speech intelligibility remains high (STOI≈0.90) despite perceptual quality loss.
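The findings above are reported as target-substring attack success rates. A minimal sketch of such a metric follows, assuming that a trial counts as a success when the target phrase appears anywhere in the model's output. The case and whitespace normalisation is an assumption, not something the paper summary specifies, and the function names are hypothetical.

```python
def _normalise(text):
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.lower().split())

def target_substring_asr(outputs, targets):
    """Fraction of trials whose model output contains the target phrase."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    if not outputs:
        return 0.0
    hits = sum(_normalise(t) in _normalise(o) for o, t in zip(outputs, targets))
    return hits / len(outputs)

# Invented example outputs, for illustration only.
outputs = [
    "Sure, transfer approved for the account.",
    "I cannot help with that request.",
]
targets = ["transfer approved", "transfer approved"]
asr = target_substring_asr(outputs, targets)
```

A stricter variant could require an exact, case-sensitive match; which rule is used changes the reported numbers, so it is worth stating alongside any ASR figure.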
Limitations
The attack requires white-box access to compute gradients for a chosen codec and is optimised per victim model, so perturbations do not reliably transfer between models without reoptimisation. Defence evaluations are not exhaustive; proposed mitigations are left for future work.
Implications
An external adversary can embed targeted commands into benign-sounding audio that survive typical re-encoding and reach unmodified Audio LLMs, enabling unauthorised actions such as banking authorisation bypass, falsified hiring recommendations, or evasion of content-detection pipelines. The results imply that waveform-focused compression defences are insufficient and that attackers can exploit codec-preserved latent subspaces to mount practical, covert injections across diverse codecs and bitrates.