Constrain agents, protect data, blunt prompt injection
Agents built on Large Language Models (LLMs) are creeping from demos into service desks, finance ops and developer tooling. The attacks people keep hitting are prompt injection and social engineering: getting the model to ignore its instructions, do the wrong thing, or leak what it knows. A recent piece of work on ChatGPT-style agents takes a pragmatic line: reduce what the agent can do, and reduce what it can see.
Rather than another training recipe, the work focuses on two controls at the workflow level. First, constrain risky actions during a session so adversarial prompts cannot trigger harmful operations. Second, protect sensitive data inside the agent workflow with explicit rules on access, use and return. The aim is simple: fewer avenues for manipulation, and a smaller blast radius when something slips through.
Stop arguing with prompts; change the environment
This is the right priority. Security teams do not debate with malware; they remove its permissions. Treat agents the same. If an agent never gets the ability to send arbitrary emails, delete files or hit internal APIs without mediation, prompt injection has little to work with. Constraining capability beats hoping that a cleverly worded system prompt will survive contact with a wily attacker.
In practice, that means allowlisting tools and parameters, scoping them to the task, and keeping high-risk actions behind clear gates. A finance agent can draft a payment but not release it; a support agent can propose a data pull but not run an unrestricted query. Yes, this reduces autonomy. It also reduces incidents, and enterprises already live by least privilege everywhere else.
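One way to make that concrete is a policy table the orchestrator consults before any tool call. The sketch below is illustrative, not from the work being discussed: the names (`ToolPolicy`, `invoke`, the finance tools) are invented, and the validators stand in for whatever parameter scoping the business requires. The point is that denial is the default and the approval gate lives outside the model.

```python
# Hypothetical allowlist-and-gate sketch for agent tool calls.
# Tools absent from POLICIES simply cannot be invoked; high-risk
# tools additionally require an out-of-band approval flag that no
# prompt can set. All names here are illustrative.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolPolicy:
    # Map each permitted parameter to a validator; unknown params are rejected.
    allowed_params: dict[str, Callable[[Any], bool]]
    requires_approval: bool = False

POLICIES = {
    # A finance agent may draft a payment but not release it unattended.
    "draft_payment": ToolPolicy(
        allowed_params={
            "amount": lambda v: isinstance(v, (int, float)) and 0 < v <= 10_000,
            "currency": lambda v: v in {"GBP", "EUR", "USD"},
        },
    ),
    "release_payment": ToolPolicy(
        allowed_params={"payment_id": lambda v: isinstance(v, str)},
        requires_approval=True,  # gate: needs a human sign-off
    ),
}

def invoke(tool: str, args: dict, approved: bool = False) -> dict:
    policy = POLICIES.get(tool)
    if policy is None:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    for name, value in args.items():
        check = policy.allowed_params.get(name)
        if check is None or not check(value):
            raise PermissionError(f"parameter {name!r} rejected for {tool!r}")
    if policy.requires_approval and not approved:
        raise PermissionError(f"{tool!r} requires out-of-band approval")
    return {"tool": tool, "status": "executed"}  # stand-in for the real call
```

Note that an injected prompt cannot widen this surface: it can only ask for tools and parameters the policy table already permits.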
The same applies to data. Most leaks happen because we shove too much context into the agent and hope it behaves. Protecting sensitive data inside the workflow means minimising what is loaded, isolating it from unrelated steps, and being explicit about what the agent is allowed to return. You cannot exfiltrate what you never exposed.
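The same idea can be enforced at the workflow boundary with two small filters: one that strips what a step does not need before it reaches the model, and one that allowlists what the agent may hand back. This is a minimal sketch with invented field names, not a prescribed schema.

```python
# Hypothetical data-minimisation sketch. SENSITIVE fields never reach
# the model even if a step asks for them, and the caller only receives
# fields on the return allowlist. Field names are invented.
SENSITIVE = {"ssn", "card_number", "home_address"}

def minimise_context(record: dict, needed: set[str]) -> dict:
    """Load only the fields this step needs; drop sensitive ones outright."""
    return {k: v for k, v in record.items()
            if k in needed and k not in SENSITIVE}

def filter_response(agent_output: dict, returnable: set[str]) -> dict:
    """Allowlist what the agent is permitted to return to the caller."""
    return {k: v for k, v in agent_output.items() if k in returnable}

customer = {"name": "Ada", "plan": "pro",
            "ssn": "000-00-0000", "card_number": "4111000000000000"}
# Even though the step requests "ssn", the filter drops it.
context = minimise_context(customer, needed={"name", "plan", "ssn"})
reply = filter_response({"answer": "Plan upgraded", "raw_record": customer},
                        returnable={"answer"})
```

Here `context` contains only `name` and `plan`, and `reply` only `answer`: what was never exposed cannot be exfiltrated.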
The boring architecture that actually scales
The work is light on numbers. There is no detailed attack model, coverage analysis or performance trade-off. Mechanisms for enforcing these constraints are not specified. That is a gap, but it does not blunt the underlying argument. Architecture beats aspiration. If your safety case depends on the model always following instructions, you do not have a safety case.
For practitioners, the message is almost boring, which is why it matters. Make agent behaviour boring by design. Treat the agent as an untrusted user with a sharply defined role. Build the guardrails into the workflow, constrain its actions to what the business expects, and keep sensitive data on a short leash. The result will be less impressive demos and more trustworthy systems.
The open questions now are about enforcement detail and residual risk. How do we guarantee constraints survive orchestration bugs and tool migrations? What telemetry proves the guardrails are working? Those are operational problems security teams know how to solve. The industry should stop chasing prompt incantations and start doing the unglamorous engineering this work points to.
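On the telemetry question, the simplest starting point is a structured audit trail of every allow/deny decision, so dashboards and alerts have evidence that the guardrails actually fire. The event schema below is an assumption for illustration, not a standard.

```python
# Hypothetical audit-trail sketch: each policy decision becomes a
# structured JSON event. Counting denies (and alerting on their
# absence) is one cheap signal that guardrails are live. The schema
# is illustrative.
import json
import time

AUDIT_LOG: list[str] = []

def record_decision(tool: str, decision: str, reason: str) -> None:
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "tool": tool,
        "decision": decision,   # "allow" or "deny"
        "reason": reason,
    }))

record_decision("draft_payment", "allow", "params within policy")
record_decision("release_payment", "deny", "approval gate not satisfied")

events = [json.loads(e) for e in AUDIT_LOG]
denies = [e for e in events if e["decision"] == "deny"]
```

A guardrail that never logs a deny is either perfect or dead; telemetry like this is how you tell the difference.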