Compound attack steers RAG via injection and poisoning
Agents
Retrieval-Augmented Generation (RAG) was meant to tame Large Language Models (LLMs) by bolting on fresher facts. A new paper argues that same bolt-on opens the door to a more practical subversion: a compound attack that does not need to know the user’s question to steer the answer.
How the attack works
The authors’ PIDP-Attack has two moves. First, insert a limited number of poisoned passages into the retrieval store. These look relevant but carry adversarial content or instructions. Second, at inference time, quietly append malicious characters to the user’s query. That small injection biases the LLM to heed the poisoned content or follow attacker directions, even when the original query is unknown.
In tests on Natural Questions, HotpotQA and MS-MARCO, and across eight different LLMs, PIDP-Attack beats the PoisonedRAG baseline. Attack success rises by 4% to 16% while retrieval precision stays high. In other words, the system still fetches passages that appear on-topic, so the outputs look legitimate even as they are being steered.
Why it matters
If an attacker can touch both the query path and the retrieval index, they can manipulate a wide range of answers without tailoring payloads to each question. That is a higher bar than a single vulnerability, but not an absurd one in environments with shared middleware or loosely governed content feeds. The payoff is broader control with minimal per-query effort.
This also undercuts a comfortable metric. High retrieval precision does not mean safe outcomes if the index itself is booby-trapped with relevant-looking poison. Dashboards can stay green while the model drifts into attacker-defined behaviour.
There are limits. The results are from three QA benchmarks and eight models under lab conditions. The threat model assumes the attacker can both append to queries at inference and write to the retrieval store. Feasible in some setups, harder in tightly locked ones. The paper does not cover production cloud deployments.
The commercial read is plain. If you ship RAG-enabled features, your risk hinges on how easy it is to tamper with queries and to seed your index. If either path is open, you have a credible manipulation route that evades simple filters. If both are shut, the finding is mostly academic for now. Open questions: how small a poison set suffices at scale, how robust the trick is under aggressive sanitisation, and whether source provenance can close the gap without wrecking recall. Watch this space.