Researchers Expose Few-Query Attacks on Multi-Task AI
A new paper shows that attackers no longer need white-box access or thousands of probes to break multi-task AI. The method, called CEMA, trains lightweight substitute models using a few queries and auxiliary texts, then crafts adversarial inputs that transfer across tasks. In plain terms, an attacker can nudge a translation, summary, or image prompt off course with only a few dozen to a few hundred interactions.
Definitions matter. Black-box means you only see outputs, not internals. A query is one input you send to an API. Adversarial examples are inputs designed to make the model produce wrong or unexpected outputs. Transferability refers to attacks that work across different tasks or models.
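To make those definitions concrete, here is a minimal, hypothetical sketch of the kind of character-level perturbation attacks such as TextBugger apply: swap a few characters for visually similar ones so a human reads the same sentence while the model's tokenisation, and often its output, changes. The substitution table and budget below are illustrative assumptions, not the paper's method.

```python
# Illustrative only: a TextBugger-style visual character swap.
# The substitution table is a made-up example; real attacks search
# perturbations guided by feedback from the victim model.
VISUAL_SWAPS = {"o": "0", "l": "1", "a": "@", "e": "3"}

def perturb(text: str, budget: int = 2) -> str:
    """Replace up to `budget` characters with look-alikes."""
    out, used = [], 0
    for ch in text:
        if used < budget and ch in VISUAL_SWAPS:
            out.append(VISUAL_SWAPS[ch])
            used += 1
        else:
            out.append(ch)
    return "".join(out)

print(perturb("please translate this sentence"))
# -> "p13ase translate this sentence": readable to a human,
#    but tokenised very differently by a model
```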
Why this matters: many organizations assume commercial APIs are safe by obscurity. CEMA proves obscurity is a weak defense. The attack hits real services such as translation APIs and large language models, so misinformation, content corruption, and downstream automation failures become realistic threats for production systems.
There are trade-offs. CEMA needs auxiliary data and training several substitute models, which costs time and compute. Defenses like adversarial training, language modifiers, and rate limits help, but none are silver bullets. This is a caution against performative compliance: writing a policy without testing it against adaptive attacks is theatre, not safety.
Policy and governance should map to measurable controls: query-rate limits, robust logging and retention, anomaly detection on outputs, mandated red-team exercises, and incident reporting for model failures. Short-term actions you can take this quarter: enforce strict rate limits and input validation, run few-shot adversarial tests against your endpoints, enable detailed logging, and tune anomaly alerts. Later investments: adversarial training for multi-task models, provenance and content fingerprints, third-party audits, and engagement with regulators to shape practical requirements that drive real security rather than checkboxes.
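As a sketch of what tuning anomaly alerts can mean in practice: few-query attacks tend to submit bursts of near-duplicate inputs while they probe a model, so flagging clients whose recent queries are unusually similar is a cheap first-line signal. The window size, similarity threshold, and alert ratio below are assumptions to tune against your own traffic, not values from the paper.

```python
# Minimal sketch: flag clients whose recent queries are near-duplicates,
# a pattern typical of black-box adversarial probing. All thresholds are
# illustrative assumptions; calibrate them on real traffic.
from collections import defaultdict, deque
from difflib import SequenceMatcher

WINDOW = 20          # recent queries kept per client (assumed)
SIM_THRESHOLD = 0.9  # near-duplicate similarity (assumed)
ALERT_RATIO = 0.5    # fraction of window that must match to alert (assumed)

history = defaultdict(lambda: deque(maxlen=WINDOW))

def record_and_check(client_id: str, query: str) -> bool:
    """Return True if this client's recent traffic looks like probing."""
    recent = history[client_id]
    similar = sum(
        1 for past in recent
        if SequenceMatcher(None, past, query).ratio() >= SIM_THRESHOLD
    )
    recent.append(query)
    return len(recent) > 5 and similar >= ALERT_RATIO * len(recent)

# Example: a burst of near-identical probes trips the alert.
probe = "translate: the cat sat on the mat"
for i in range(12):
    if record_and_check("client-42", f"{probe} variant {i}"):
        print("anomaly: possible adversarial probing from client-42")
        break
```

This catches only the crudest probing; a determined attacker can space queries out or paraphrase them, which is why the longer-term investments above still matter.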
Additional analysis of the original arXiv paper
📋 Original Paper Title
Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
🔍 ShortSpan Analysis of the Paper
Problem
This paper studies black-box multi-task adversarial text attacks under realistic constraints: limited query access, heterogeneous task types (classification, translation, summarisation, text-to-image), and inaccessible internal model features. Existing methods typically require many queries or white-box access and focus on single-task settings, making them less effective against commercial APIs and large language models in practice.
Approach
The authors propose Cluster and Ensemble Multi-task Text Adversarial Attack (CEMA). CEMA trains a deep-level substitute model using auxiliary texts and victim-model outputs vectorised by a pre-trained encoder, then applies binary clustering to produce two deep-level labels. The substitute model converts the multi-task attack into a text-classification attack and enables transfer-based generation of adversarial candidates using multiple attack algorithms (HotFlip, FD, TextBugger). Candidates that meet a cosine-similarity threshold (epsilon = 0.8) are retained, and the final example is chosen by retraining multiple substitute models and selecting the candidate that misleads the most of them. Experiments use the SST5 and Emotion datasets, 100 unlabelled auxiliary texts, and six substitute models trained on a 24 GB NVIDIA RTX 3090 (roughly 4 minutes and 418 MB per model), with query budgets capped for fair comparison.
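The pipeline is easier to see in code. The sketch below mirrors the shape of the approach, not the authors' implementation: a TF-IDF vectoriser stands in for the pre-trained encoder, k-means with k=2 supplies the binary deep-level labels, logistic-regression classifiers stand in for the substitute models, and random character swaps stand in for HotFlip/FD/TextBugger candidate generation. Only the cosine-similarity filter at epsilon = 0.8 and the majority vote over substitutes follow the paper's description directly.

```python
# Sketch of the CEMA pipeline shape (not the authors' code). Stand-ins:
# TF-IDF for the pre-trained encoder, logistic regression for the
# substitute models, random character swaps for HotFlip/FD/TextBugger.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

aux_texts = ["the film was wonderful", "a dull and tedious plot",
             "great acting throughout", "the worst movie this year"] * 25
victim_outputs = aux_texts  # in reality: victim responses to aux queries

# 1. Vectorise victim outputs; binary clustering yields deep-level labels.
enc = TfidfVectorizer().fit(aux_texts + victim_outputs)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(
    enc.transform(victim_outputs))

# 2. Train several substitute classifiers on (auxiliary text, deep label),
#    turning the multi-task attack into a text-classification attack.
substitutes = [
    LogisticRegression(max_iter=1000, C=c).fit(enc.transform(aux_texts), labels)
    for c in (0.1, 1.0, 10.0)
]

def candidates(text, n=20):
    """Toy candidate generator: random single-character swaps."""
    outs = []
    for _ in range(n):
        i = random.randrange(len(text))
        outs.append(text[:i] + random.choice("abcdefghij") + text[i + 1:])
    return outs

def attack(text, eps=0.8):
    """Keep candidates within cosine similarity eps of the original;
    pick the one that fools the most substitutes (majority vote)."""
    base_vec = enc.transform([text])
    base_label = substitutes[0].predict(base_vec)[0]
    best, best_fooled = text, -1
    for cand in candidates(text):
        vec = enc.transform([cand])
        if cosine_similarity(base_vec, vec)[0, 0] < eps:
            continue  # too dissimilar to the original: discard
        fooled = sum(s.predict(vec)[0] != base_label for s in substitutes)
        if fooled > best_fooled:
            best, best_fooled = cand, fooled
    return best

print(attack("the film was wonderful"))
```

The point of the ensemble step is that a candidate fooling several independently trained substitutes is more likely to transfer to the unseen victim than one tuned to a single substitute.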
Key Findings
- CEMA achieves strong transfer attacks with as few as 100 queries, outperforming baselines on classification and translation tasks.
- On LLM victims CEMA attains a best attack success rate (ASR) of 38.63%; on classification tasks ASR exceeds 59% and reaches 80.80% in some cases; in the top results, translation BLEU scores drop below 0.16.
- CEMA scales to six downstream tasks (ASR > 60%, BLEU < 0.3) and generalises to commercial APIs and models including Baidu Translate, Alibaba Translate, Google Translate, ChatGPT-4o, Claude 3.5, and Stable Diffusion V2; on a broader task set it reports BLEU = 0.29, RDP = 47%, and CDP = 56%.
Limitations
CEMA requires auxiliary data and training multiple substitute models, which increases time, computation, and storage costs. Defence experiments show that mitigations (language modifiers, adversarial training) reduce but do not eliminate attack success. The paper does not report further constraints or a complete robustness evaluation.
Why It Matters
CEMA demonstrates that practical black-box multi-task systems, including commercial APIs and large models, are vulnerable to transfer attacks that need only a small query budget and unlabelled auxiliary data. This raises real-world security concerns for multi-task deployments, automated translation, summarisation, and text-to-image pipelines, and underlines the urgent need for robust defences and query-limited detection strategies.