ShortSpan.ai logo

Compound attack hijacks RAG with prompt injection

Agents
Published: Fri, Mar 27, 2026 • By Rowan Vale
Compound attack hijacks RAG with prompt injection
New work on PIDP-Attack shows a compound threat to Retrieval-Augmented Generation that fuses prompt injection with database poisoning. By appending a fixed suffix to queries and seeding a few poisoned passages, it steers answers without knowing user intent, improving success 4–16% over prior poisoning and often exceeding 90% on many models.

Many teams use Retrieval-Augmented Generation (RAG) to ground Large Language Models (LLMs) in fresher, external documents. It helps with outdated knowledge and hallucinations, but it also opens two attack paths: the query itself and the corpus you retrieve from. New research on PIDP-Attack shows how hitting both paths at once is far more effective than either alone.

Prompt injection means adding instructions to a user prompt that override normal behaviour. Database poisoning means planting malicious documents in the retrieval corpus so they get pulled into context. The paper combines these into a single, query-agnostic attack that does not need to know what the user will ask.

How PIDP-Attack works

Offline, the attacker inserts a small number of poisoned passages into the corpus. Each starts with a chosen target question and text that points to a specific incorrect answer. Online, the attacker appends a fixed injection suffix to arbitrary user queries. That suffix encodes the same target question plus light instructions that nudge the model to prioritise it. The suffix increases the chance the retriever fetches the poisoned passages and biases the generator to output the attacker’s answer.

The setup assumes black-box access to a standard RAG pipeline, plus two modest capabilities: the ability to insert a few documents into the corpus and to modify or append to queries in transit. No access to model internals is required.

What the results show

Across three QA benchmarks (Natural Questions, HotpotQA, MS-MARCO) and eight instruction-following LLMs, PIDP-Attack outperforms single-surface baselines, including PoisonedRAG, GGPP, prompt-only injections, and query-agnostic poisoning. On Natural Questions, it lifts attack success rates by 4% to 16% over PoisonedRAG and shows consistent gains on MS-MARCO of roughly 5% to 12%. The authors report an average attack success rate of 98.125% across evaluated settings, with near-100% success on many models for HotpotQA, all while maintaining high retrieval precision. In short, the compound approach is both effective and stealthy.

Budget and context matter. On Natural Questions and HotpotQA, as few as two poisoned passages can exceed 95% success for certain models. MS-MARCO is noisier, and pushing above 90% often needs up to five poisoned passages. Increasing the context window can cut both ways: more top-k increases the chance of retrieving poisons but also dilutes their influence. The paper shows one model’s success dropping from 97% at k=5 to 82% at k=10.

Why this matters: many production QA and support bots already rely on cached, curated, or user-submitted knowledge. A small, stealthy compound attack can steer answers to arbitrary queries without advance knowledge of user intent, expanding risk beyond traditional prompt injection or poisoning alone.

Defence is an end-to-end job. The study’s limitations point to practical controls that blunt the attack surface.

  • Lock down the query path: authenticate clients, strip or reject anomalous suffixes, and isolate system instructions from user content.
  • Harden ingestion: require provenance for corpus updates, gate writes behind review, and block instruction-like content in knowledge bases.
  • Add runtime checks: scan retrieved chunks for instruction patterns or repeated target questions, and downgrade or exclude suspicious passages.

There are caveats. The attack relies on being able to tamper with queries in transit and to add documents to the corpus without strict authentication. Deployments that enforce provenance, sanitise inputs, and implement strong model refusal behaviour reduce success rates. The evaluation uses offline benchmarks rather than live systems. Still, the core insight is solid: when retrieval and generation are both in play, assume attackers will combine vectors. Future work on poisoning-resistant retrievers, stricter source validation, and cross-source verification looks like the right path.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Authors: Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang, Yongxin Guo, and Xiaoying Tang
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge and the tendency to generate hallucinations. To address these limitations, Retrieval-Augmented Generation (RAG) systems have been introduced, enhancing LLMs with external, up-to-date knowledge sources. Despite their advantages, RAG systems remain vulnerable to adversarial attacks, with data poisoning emerging as a prominent threat. Existing poisoning-based attacks typically require prior knowledge of the user's specific queries, limiting their flexibility and real-world applicability. In this work, we propose PIDP-Attack, a novel compound attack that integrates prompt injection with database poisoning in RAG. By appending malicious characters to queries at inference time and injecting a limited number of poisoned passages into the retrieval database, our method can effectively manipulate LLM response to arbitrary query without prior knowledge of the user's actual query. Experimental evaluations across three benchmark datasets (Natural Questions, HotpotQA, MS-MARCO) and eight LLMs demonstrate that PIDP-Attack consistently outperforms the original PoisonedRAG. Specifically, our method improves attack success rates by 4% to 16% on open-domain QA tasks while maintaining high retrieval precision, proving that the compound attack strategy is both necessary and highly effective.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies a composite adversary that targets Retrieval-Augmented Generation systems by combining prompt injection with database poisoning. RAG systems aim to ground large language models with external documents but increase the attack surface to include both the query pathway and the retrieval corpus. Prior poisoning attacks typically require knowledge of the victim's exact queries, which limits real‑world applicability. The authors ask whether a compound, query‑agnostic approach can reliably steer RAG outputs toward an attacker‑chosen incorrect answer without prior knowledge of user queries.

Approach

The proposed PIDP-Attack operates in two stages. Offline, the attacker injects a small set of poisoned passages into the retrieval corpus; each poisoned passage begins with the attacker’s chosen target question followed by supporting text that promotes a specific incorrect answer. Online, the attacker appends a fixed injection suffix to arbitrary user queries; the suffix encodes the attacker’s target question and lightweight override instructions. The injected query both steers retrieval toward the poisoned passages and biases the generator to prioritise answering the attacker's target. The method assumes black-box access to a RAG pipeline (embedding retriever plus LLM generator) and modest attacker capabilities: the ability to insert a few passages into the corpus and to modify or append to queries in transit. Evaluation uses three QA benchmarks (Natural Questions, HotpotQA, MS-MARCO), multiple instruction‑following LLMs, and standard embedding retrievers with default budgets n=5 poisoned passages and top-k=5 retrieved contexts unless varied in ablations.

Key Findings

  • Compound attack effectiveness: PIDP-Attack consistently outperforms single-surface baselines, including PoisonedRAG, GGPP, prompt‑only injections and query‑agnostic corpus poisoning. The compound method improves attack success rates by 4% to 16% on Natural Questions across seven of eight models and yields consistent gains on MS-MARCO (approximately +5% to +12%).
  • High overall success: The authors report an average attack success rate of 98.125% across evaluated settings and show that PIDP often attains ASR greater than 90% across many instruction‑following models. On HotpotQA PIDP matches or exceeds baselines and reaches near‑100% success on most models.
  • Budget and context sensitivity: Small poison budgets suffice on some corpora—n=2 already achieves >95% ASR for certain models on Natural Questions and HotpotQA—whereas MS-MARCO is more retrieval‑noisy and requires larger budgets (up to n=5 to exceed 90% ASR). Increasing the context budget (top‑k) can both increase poisoned recall and dilute poisoned influence; for example one model’s ASR falls from 97% at k=5 to 82% at k=10.

Limitations

PIDP-Attack depends on deployment conditions: it requires query strings to be modifiable in transit and corpus ingestion to accept unauthenticated passages. Effectiveness varies with retriever noise and model behaviour; retrieval‑limited, generation‑limited and dilution‑limited regimes produce failures. Models with strong refusal behaviour or deployments that isolate system instructions, strip anomalous suffixes, or enforce strict provenance for corpus updates significantly reduce attack success. The evaluation is conducted on benchmark corpora and offline experiments rather than live production systems.

Why It Matters

The work exposes a practical threat to services that ground LLM outputs on external corpora, including QA services and support systems. A small, stealthy compound attack can steer answers to arbitrary queries without knowing user intent in advance, amplifying risk relative to single‑vector attacks. Defences should be end‑to‑end: sanitise and authenticate query pathways, enforce provenance and ingestion controls for corpus updates, audit retrieved contexts for instruction‑like anomalies, and consider model‑level guardrails. The paper highlights detection hooks and defensive directions for hardening RAG pipelines.


Related Articles

Related Research on arXiv

Get the Weekly AI Security Digest

Top research and analysis delivered to your inbox every week. No spam, unsubscribe anytime.