ShortSpan.ai logo

Compound attack steers RAG via injection and poisoning

Agents
Published: Fri, Mar 27, 2026 • By Adrian Calder
Compound attack steers RAG via injection and poisoning
Researchers present PIDP-Attack, a compound method that steers Retrieval-Augmented Generation (RAG) systems by combining prompt injection on user queries with a few poisoned passages in the index. It outperforms PoisonedRAG by 4–16% across three QA datasets and eight LLMs while keeping retrieval precision high, making manipulations harder to spot.

Retrieval-Augmented Generation (RAG) was meant to tame Large Language Models (LLMs) by bolting on fresher facts. A new paper argues that same bolt-on opens the door to a more practical subversion: a compound attack that does not need to know the user’s question to steer the answer.

How the attack works

The authors’ PIDP-Attack has two moves. First, insert a limited number of poisoned passages into the retrieval store. These look relevant but carry adversarial content or instructions. Second, at inference time, quietly append malicious characters to the user’s query. That small injection biases the LLM to heed the poisoned content or follow attacker directions, even when the original query is unknown.

In tests on Natural Questions, HotpotQA and MS-MARCO, and across eight different LLMs, PIDP-Attack beats the PoisonedRAG baseline. Attack success rises by 4% to 16% while retrieval precision stays high. In other words, the system still fetches passages that appear on-topic, so the outputs look legitimate even as they are being steered.

Why it matters

If an attacker can touch both the query path and the retrieval index, they can manipulate a wide range of answers without tailoring payloads to each question. That is a higher bar than a single vulnerability, but not an absurd one in environments with shared middleware or loosely governed content feeds. The payoff is broader control with minimal per-query effort.

This also undercuts a comfortable metric. High retrieval precision does not mean safe outcomes if the index itself is booby-trapped with relevant-looking poison. Dashboards can stay green while the model drifts into attacker-defined behaviour.

There are limits. The results are from three QA benchmarks and eight models under lab conditions. The threat model assumes the attacker can both append to queries at inference and write to the retrieval store. Feasible in some setups, harder in tightly locked ones. The paper does not cover production cloud deployments.

The commercial read is plain. If you ship RAG-enabled features, your risk hinges on how easy it is to tamper with queries and to seed your index. If either path is open, you have a credible manipulation route that evades simple filters. If both are shut, the finding is mostly academic for now. Open questions: how small a poison set suffices at scale, how robust the trick is under aggressive sanitisation, and whether source provenance can close the gap without wrecking recall. Watch this space.


Related Articles

Related Research on arXiv

Get the Weekly AI Security Digest

Top research and analysis delivered to your inbox every week. No spam, unsubscribe anytime.