Google outlines continuous defences for indirect prompt injection
Enterprise
Indirect prompt injection is simple to describe and annoying to stop. An attacker hides instructions in data or tools that a Large Language Model (LLM) touches while answering a user. The model follows the hidden instructions, sometimes without any direct user input. Google’s latest write-up focuses on how Workspace with Gemini tries to stay ahead of this class of attack.
The headline is continuity. Google treats indirect prompt injection as an operational risk, not a bug to patch once. The company runs multiple discovery channels in parallel: human red teams simulate realistic abuse; automated red teaming generates and iterates on attacks at scale; a public AI Vulnerability Rewards Program brings in external researchers; and open-source intelligence monitoring pulls in newly disclosed techniques. That mix recognises the problem changes faster than any one team can cover.
Findings flow into a central vulnerability catalogue. Each issue is reproduced, de-duplicated, classified by technique and impact, and assigned to owners. That governance step sounds dull, but it is what turns scattered reports into engineering work that ships.
From there, the pipeline leans hard on synthetic data. Using Simula, Google expands each curated attack into variants to build broader training and test sets. The team reports a 75 percent increase in synthetic data generation, which feeds three layers of controls and the model itself.
Layered controls in practice
First, deterministic defences cover fast fixes: user confirmations, URL sanitisation, and tool-chaining policies enforced by a central Policy Engine. When a new pattern crops up, configuration updates and even regex takedowns can reduce exposure faster than a model refresh cycle.
Next, machine learning defences are retrained on partitioned synthetic data, with separate training and validation sets to keep evaluation repeatable. LLM-based defences are tuned by prompt engineering against agreed metrics. This keeps the guardrails aligned to the latest attack variants without assuming one prompt will hold up forever.
There is also model hardening. Gemini is trained to better spot and ignore harmful embedded instructions while still following the user’s intent. Using the fresh synthetic patterns, Google aims to reduce attack success rates without slowing routine operations.
To check if any of this works, Google runs end-to-end simulations across Workspace apps such as Gmail and Docs. Each defence change is measured with a before-and-after run, using standardised assets. That gives product teams concrete numbers on whether a prompt tweak, a policy change, or a model retrain actually moves the needle.
What enterprises can copy
- Stand up a vulnerability catalogue with clear ownership, technique and impact mapping, and strict reproducibility before remediation starts.
- Split defences into fast policy controls and slower ML or LLM updates so you can push point fixes while models catch up.
- Invest in synthetic variant generation and an end-to-end evaluation harness that runs the same attacks before and after each change.
This is not a magic shield. New attack styles will slip through, and relying on quick configuration changes is a pragmatic stopgap, not a cure. But the approach shows how to turn indirect prompt injection from an amorphous worry into a steady operational discipline. If you run AI across many data sources and tools, the pattern here is worth adopting: keep discovering, keep cataloguing, keep generating variants, and measure every change in the places users actually work.